NOTE: Log messages are printed in the output of the code box that instantiates the object, not necessarily under the box that triggered them. If a log message from a previous execution of the notebook does not match up with the markdown notes, please re-run the code and look at the output of the first box. I don’t know how to fix that, sorry!

Submit your samples and validate them against a checklist

This notebook is an extension of the first notebook; here, we will focus on another aspect of a BioSamples submission: validation against a checklist.

Checklists are the name given to the validation rules for samples. Currently, BioSamples is mostly tailored to 2 types of checklists:

- BioSamples checklists: Accessioned as “BSDXXX”
- ENA checklists: Accessioned as “ERCXXX”. Full list, with explanation on mandatory/required fields, can be found here
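If you juggle both kinds of accessions, a tiny helper can tell them apart by prefix. This is only a sketch: `checklist_family` is a hypothetical function, not part of biobroker, and it assumes the two prefixes mentioned above are the only ones you care about.

```python
# Hypothetical helper (not part of biobroker): guess the checklist family
# from the accession prefix, based on the two families listed above.
def checklist_family(accession: str) -> str:
    accession = accession.strip().upper()
    if accession.startswith("ERC"):
        return "ENA"
    if accession.startswith("BSD"):
        return "BioSamples"
    return "unknown"


print(checklist_family("ERC000017"))  # prints "ENA"
```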

For the first steps, we will do the same as in the previous notebook: set up the required components and create a sample.

[3]:
## Import everything we need
from biobroker.authenticator import WebinAuthenticator # Biosamples uses the WebinAuthenticator
from biobroker.api import BsdApi # BioSamples Database (BSD) API
from biobroker.metadata_entity import Biosample # The metadata entity
from biobroker.input_processor import TsvInputProcessor # An input processor
from biobroker.output_processor import XlsxOutputProcessor # An output processor
import os

## Generate sample
sample_tsv = [
    ["name", "collected_at", "organism", "release"],
    ["sumple", "noon", "Homo sapiens", "2024-07-10"]
]

writable_sample = "\n".join(["\t".join(row) for row in sample_tsv])
with open("simple_sample_sumple.tsv", "w") as f:
    f.write(writable_sample)

path = "simple_sample_sumple.tsv" # This is the file we created previously

## Set up the required entities

input_processor = TsvInputProcessor(input_data=path)
my_sample = input_processor.process(Biosample)

os.environ['API_ENVIRONMENT'] = "dev" # There are multiple ways to set up environment variables

username = "" # Your username goes here
password = "" # Your password goes here
authenticator = WebinAuthenticator(username=username, password=password)

api = BsdApi(authenticator=authenticator)

Choosing a checklist and validating

After everything is set up, we need to choose one of the checklists for validation. Let’s say we browsed the available checklists, and since our sample is of human skin, we want to align with the GSC MIxS Human Skin checklist (https://www.ebi.ac.uk/ena/browser/view/ERC000017).

In BioSamples, checklists are chosen by attaching the accession of the checklist as metadata to the sample. Let’s start by adding just the checklist and trying to submit, to see what happens:

[4]:
my_sample[0]['checklist'] = "ERC000017"
api.submit(my_sample)
---------------------------------------------------------------------------
BiosamplesValidationError                 Traceback (most recent call last)
Cell In[4], line 2
      1 my_sample[0]['checklist'] = "ERC000017"
----> 2 api.submit(my_sample)

File ~/PycharmProjects/biobroker/biobroker/api/api.py:51, in GenericApi.submit(self, entities, **kwargs)
     49     return self._submit_multiple(entities, kwargs)
     50 else:
---> 51     return [self._submit(entities[0], kwargs)]

File ~/PycharmProjects/biobroker/biobroker/api/api.py:172, in BsdApi._submit(self, entity, kwargs)
    170 r = self.authenticator.post(submit_url, payload=entity.entity)
    171 if r.status_code > 300:
--> 172     self._submit_errors(r)
    173 return Biosample(r.json())

File ~/PycharmProjects/biobroker/biobroker/api/api.py:390, in BsdApi._submit_errors(self, response)
    388 if response.status_code == 400:
    389     if "dataPath" in response.text:
--> 390         raise BiosamplesValidationError(response.text, self.logger)
    391     else:
    392         raise BiosamplesNoErrorMessageError(response.status_code, self.logger)

BiosamplesValidationError: Found following errors in sample validation:
        - /characteristics.project name: should have required property 'project name'
        - /characteristics.collection date: should have required property 'collection date'
        - /characteristics.geographic location (country and/or sea): should have required property 'geographic location (country and/or sea)'
        - /characteristics.geographic location (latitude): should have required property 'geographic location (latitude)'
        - /characteristics.geographic location (longitude): should have required property 'geographic location (longitude)'
        - /characteristics.broad-scale environmental context: should have required property 'broad-scale environmental context'
        - /characteristics.local environmental context: should have required property 'local environmental context'
        - /characteristics.environmental medium: should have required property 'environmental medium')

As we can see, a custom Python exception (BiosamplesValidationError) is raised when we try to do this; this is biobroker’s way of reporting checklist validation errors. It’s complaining about a bunch of missing required fields, so let’s fix that!
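With many samples, reading these messages by eye gets tedious. Here is a minimal sketch that pulls the missing field names out of the error text with a regular expression. It works on a plain string copied from the traceback above (shortened to three entries); `missing_fields` is a hypothetical helper, and whether `str()` of the exception gives you exactly this text is an assumption you should check.

```python
import re

# Error text copied from the traceback above, shortened to three entries.
error_text = """Found following errors in sample validation:
\t- /characteristics.project name: should have required property 'project name'
\t- /characteristics.collection date: should have required property 'collection date'
\t- /characteristics.environmental medium: should have required property 'environmental medium')"""


def missing_fields(message: str) -> list[str]:
    """Return the field names reported as missing required properties."""
    return re.findall(r"should have required property '([^']+)'", message)


print(missing_fields(error_text))
# ['project name', 'collection date', 'environmental medium']
```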

(Please note: For the purpose of this notebook, we will fix the sample directly in the code, but it would be much, MUCH easier to just fix the input TSV or XLSX file and load the samples again)
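For reference, fixing the input file could look roughly like the sketch below, which writes a new TSV that already contains the checklist fields. The file name is made up, the values are the ones used in this notebook, and it is an assumption that the input processor treats a `checklist` column the same way as setting it on the sample in code.

```python
import csv

# Hypothetical file name; the columns are the required fields reported by the checklist.
tsv_path = "checklist_sample_sumple.tsv"

row = {
    "name": "sumple",
    "organism": "Homo sapiens",
    "release": "2024-07-10",
    "checklist": "ERC000017",  # assumption: read the same way as my_sample[0]['checklist']
    "project name": "Your fake project",
    "collection date": "2024-09-01",
    "geographic location (country and/or sea)": "Mediterranean Sea",
    "geographic location (latitude)": "1.2234",
    "geographic location (longitude)": "7.21",
    "broad-scale environmental context": "United Kingdom weather",
    "local environmental context": "Mostly rainy",
    "environmental medium": "Please read my plant",
}

# Write one header line and one sample line, tab-separated.
with open(tsv_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(row), delimiter="\t")
    writer.writeheader()
    writer.writerow(row)
```

The resulting file could then be passed to `TsvInputProcessor(input_data=...)` as in the first cell.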

[5]:
my_sample[0]['project name'] = "Your fake project"
my_sample[0]['organism'] = "Homo sapiens"
my_sample[0]['collection date'] = "2024-09-01"
my_sample[0]['geographic location (country and/or sea)'] = "Mushroom kingdom"
my_sample[0]['geographic location (latitude)'] = 1.2234
my_sample[0]['geographic location (longitude)'] = 7.21
my_sample[0]['broad-scale environmental context'] = "United Kingdom weather"
my_sample[0]['local environmental context'] = "Mostly rainy"
my_sample[0]['environmental medium'] = "Please read my plant"

api.submit(my_sample)
---------------------------------------------------------------------------
BiosamplesValidationError                 Traceback (most recent call last)
Cell In[5], line 11
      8 my_sample[0]['local environmental context'] = "Mostly rainy"
      9 my_sample[0]['environmental medium'] = "Please read my plant"
---> 11 api.submit(my_sample)

File ~/PycharmProjects/biobroker/biobroker/api/api.py:51, in GenericApi.submit(self, entities, **kwargs)
     49     return self._submit_multiple(entities, kwargs)
     50 else:
---> 51     return [self._submit(entities[0], kwargs)]

File ~/PycharmProjects/biobroker/biobroker/api/api.py:172, in BsdApi._submit(self, entity, kwargs)
    170 r = self.authenticator.post(submit_url, payload=entity.entity)
    171 if r.status_code > 300:
--> 172     self._submit_errors(r)
    173 return Biosample(r.json())

File ~/PycharmProjects/biobroker/biobroker/api/api.py:390, in BsdApi._submit_errors(self, response)
    388 if response.status_code == 400:
    389     if "dataPath" in response.text:
--> 390         raise BiosamplesValidationError(response.text, self.logger)
    391     else:
    392         raise BiosamplesNoErrorMessageError(response.status_code, self.logger)

BiosamplesValidationError: Found following errors in sample validation:
        - geographic location (country and~1or sea)/0/text: should be equal to one of the allowed values: ["Afghanistan","Albania","Algeria","American Samoa","Andorra","Angola","Anguilla","Antarctica","Antigua and Barbuda","Arctic Ocean","Argentina","Armenia","Aruba","Ashmore and Cartier Islands","Atlantic Ocean","Australia","Austria","Azerbaijan","Bahamas","Bahrain","Baker Island","Baltic Sea","Bangladesh","Barbados","Bassas da India","Belarus","Belgium","Belize","Benin","Bermuda","Bhutan","Bolivia","Borneo","Bosnia and Herzegovina","Botswana","Bouvet Island","Brazil","British Virgin Islands","Brunei","Bulgaria","Burkina Faso","Burundi","Cambodia","Cameroon","Canada","Cape Verde","Cayman Islands","Central African Republic","Chad","Chile","China","Christmas Island","Clipperton Island","Cocos Islands","Colombia","Comoros","Cook Islands","Coral Sea Islands","Costa Rica","Cote d'Ivoire","Croatia","Cuba","Curacao","Cyprus","Czechia","Democratic Republic of the Congo","Denmark","Djibouti","Dominica","Dominican Republic","East Timor","Ecuador","Egypt","El Salvador","Equatorial Guinea","Eritrea","Estonia","Ethiopia","Europa Island","Falkland Islands (Islas Malvinas)","Faroe Islands","Fiji","Finland","France","French Guiana","French Polynesia","French Southern and Antarctic Lands","Gabon","Gambia","Gaza Strip","Georgia","Germany","Ghana","Gibraltar","Glorioso Islands","Greece","Greenland","Grenada","Guadeloupe","Guam","Guatemala","Guernsey","Guinea","Guinea-Bissau","Guyana","Haiti","Heard Island and McDonald Islands","Honduras","Hong Kong","Howland Island","Hungary","Iceland","India","Indian Ocean","Indonesia","Iran","Iraq","Ireland","Isle of Man","Israel","Italy","Jamaica","Jan Mayen","Japan","Jarvis Island","Jersey","Johnston Atoll","Jordan","Juan de Nova Island","Kazakhstan","Kenya","Kerguelen Archipelago","Kingman 
Reef","Kiribati","Kosovo","Kuwait","Kyrgyzstan","Laos","Latvia","Lebanon","Lesotho","Liberia","Libya","Liechtenstein","Lithuania","Luxembourg","Macau","Macedonia","Madagascar","Malawi","Malaysia","Maldives","Mali","Malta","Marshall Islands","Martinique","Mauritania","Mauritius","Mayotte","Mediterranean Sea","Mexico","Micronesia","Midway Islands","Moldova","Monaco","Mongolia","Montenegro","Montserrat","Morocco","Mozambique","Myanmar","Namibia","Nauru","Navassa Island","Nepal","Netherlands","New Caledonia","New Zealand","Nicaragua","Niger","Nigeria","Niue","Norfolk Island","North Korea","North Sea","Northern Mariana Islands","Norway","Oman","Pacific Ocean","Pakistan","Palau","Palmyra Atoll","Panama","Papua New Guinea","Paracel Islands","Paraguay","Peru","Philippines","Pitcairn Islands","Poland","Portugal","Puerto Rico","Qatar","Republic of the Congo","Reunion","Romania","Ross Sea","Russia","Rwanda","Saint Helena","Saint Kitts and Nevis","Saint Lucia","Saint Pierre and Miquelon","Saint Vincent and the Grenadines","Samoa","San Marino","Sao Tome and Principe","Saudi Arabia","Senegal","Serbia","Seychelles","Sierra Leone","Singapore","Sint Maarten","Slovakia","Slovenia","Solomon Islands","Somalia","South Africa","South Georgia and the South Sandwich Islands","South Korea","Southern Ocean","Spain","Spratly Islands","Sri Lanka","Sudan","Suriname","Svalbard","Swaziland","Sweden","Switzerland","Syria","Taiwan","Tajikistan","Tanzania","Tasman Sea","Thailand","Togo","Tokelau","Tonga","Trinidad and Tobago","Tromelin Island","Tunisia","Turkey","Turkmenistan","Turks and Caicos Islands","Tuvalu","USA","Uganda","Ukraine","United Arab Emirates","United Kingdom","Uruguay","Uzbekistan","Vanuatu","Venezuela","Viet Nam","Virgin Islands","Wake Island","Wallis and Futuna","West Bank","Western Sahara","Yemen","Zambia","Zimbabwe","missing: control sample","missing: data agreement established pre-2023","missing: endangered species","missing: human-identifiable","missing: lab stock","missing: 
sample group","missing: synthetic construct","missing: third party data","not applicable","not collected","not provided","restricted access"])

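What do you mean, Mushroom Kingdom is not an accepted country or sea? Ugh. The value has to be one of the allowed values listed in the error message, so let’s use one of those.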
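To catch this kind of thing before hitting the API, you could check the value locally against the allowed values. The sketch below uses only a small excerpt of the list from the error message above, and `check_location` is a hypothetical helper, not something biobroker provides. It uses `difflib` from the standard library to suggest close matches for typos.

```python
from difflib import get_close_matches

# Small excerpt of the allowed values from the error message above.
allowed_values = ["Mediterranean Sea", "Mexico", "Micronesia", "United Kingdom", "not provided"]


def check_location(value, allowed):
    """Return (value, []) if the value is allowed, else (None, suggestions)."""
    if value in allowed:
        return value, []
    # Suggest up to three similar allowed values; the list may be empty.
    return None, get_close_matches(value, allowed, n=3, cutoff=0.4)


accepted, suggestions = check_location("Mushroom kingdom", allowed_values)
if accepted is None:
    print(f"Not an allowed value. Did you mean one of: {suggestions}?")
```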

[6]:
my_sample[0]['geographic location (country and/or sea)'] = "Mediterranean Sea"

new_sample = api.submit(my_sample)

And it’s done!! Let’s see what it looks like:

[8]:
print(f"https://wwwdev.ebi.ac.uk/biosamples/samples/{new_sample[0]['accession']}")
https://wwwdev.ebi.ac.uk/biosamples/samples/SAMEA131421914