# Retrieving a sample
Let’s say that, following the previous tutorials, you have already submitted your samples. You’re happy and content, and life is good at your longevous 14 years of age.
However, a couple of months later, you don’t remember your samples’ accessions. You probably saved the output somewhere, but computers are tricky and they usually delete or hide the stuff you totally saved in a safe location.
Well, you don’t need to worry! BioSamples offers a search service based on either free text or attribute filtering, making it possible to retrieve your samples at any point, even if they are still private (as long as you retrieve them with the same account you submitted them with).
In this notebook, we will retrieve a couple of samples using both approaches, just to show how it’s done. As an example, we will use samples from the MICROBE consortium.
## Setting up
As always, we need to set up a couple of things. Not that many this time, though!
[2]:
from biobroker.authenticator import WebinAuthenticator # Biosamples uses the WebinAuthenticator
from biobroker.api import BsdApi # BioSamples Database (BSD) API
username = "" # Your username goes here
password = "" # Your password goes here
authenticator = WebinAuthenticator(username=username, password=password)
api = BsdApi(authenticator=authenticator)
2024-10-14 10:42:46,219 - BsdApi - INFO - Set up BSD API successfully: using base uri 'https://www.ebi.ac.uk/biosamples/samples'
Retrieving samples: 100% (226/226) 🀱 Time: 0:00:09
Retrieving samples: 100% (77/77) 🀱 Time: 0:00:02
2024-10-14 10:44:31,082 - BsdApi - INFO - Trying to retrieve sample with accession SAMEA115657829
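Typing your username and password directly into the cell works, but it is easy to forget them there before sharing the notebook. A minimal sketch of an alternative is shown below: read the credentials from environment variables and prompt for anything missing. The variable names `WEBIN_USERNAME` and `WEBIN_PASSWORD` and the helper `load_webin_credentials` are hypothetical choices for this example, not something biobroker reads on its own.

```python
import os
from getpass import getpass
from typing import Mapping, Tuple


def load_webin_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Hypothetical helper: read Webin credentials from the environment, prompting if missing.

    WEBIN_USERNAME / WEBIN_PASSWORD are assumed variable names chosen for this sketch.
    """
    username = env.get("WEBIN_USERNAME") or input("Webin username: ")
    # getpass does not echo the password while it is typed
    password = env.get("WEBIN_PASSWORD") or getpass("Webin password: ")
    return username, password


# Usage with the classes from the cell above:
# username, password = load_webin_credentials()
# authenticator = WebinAuthenticator(username=username, password=password)
```

Passing a mapping instead of always reading `os.environ` keeps the helper easy to test without touching your real environment.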
## Using attributes
We will start with attributes. Let’s say you remember setting certain attributes on your samples; in this case, for the MICROBE samples, I remember setting project name: MICROBE, biome: soil, and center: HMGU. Let’s search for them!
[3]:
attributes_to_search = {
'project name': 'MICROBE',
'biome': 'soil',
'center': 'HMGU'
}
my_samples = api.search_samples(attributes=attributes_to_search)
Attributes are always provided as key-value pairs. What happens behind the scenes is not that important, since the BsdApi object handles everything, but in short this dictionary is transformed into a query that is then sent to a BioSamples endpoint.
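For the curious, here is a rough idea of what such a query could look like. This is a sketch under assumptions, not biobroker’s actual code: it assumes the public BioSamples search syntax, where each attribute becomes a `filter=attr:<name>:<value>` parameter and free text goes into a `text` parameter. The helper `build_search_url` is hypothetical; the base URI is the one reported in the BsdApi log above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Base URI taken from the BsdApi log message in the setup cell
BASE_URI = "https://www.ebi.ac.uk/biosamples/samples"


def build_search_url(attributes=None, text=None):
    """Hypothetical helper: turn attribute filters and/or free text into a search URL.

    Assumes one 'filter' parameter per attribute using the 'attr:<name>:<value>' syntax;
    biobroker's real implementation may build the request differently.
    """
    params = []
    if text:
        params.append(("text", text))
    for name, value in (attributes or {}).items():
        params.append(("filter", f"attr:{name}:{value}"))
    # urlencode percent-encodes each value (spaces become '+', ':' becomes '%3A')
    return f"{BASE_URI}?{urlencode(params)}"


url = build_search_url({"project name": "MICROBE", "biome": "soil", "center": "HMGU"})
print(url)
# Decoding the query string gives back one filter per attribute
print(parse_qsl(urlsplit(url).query))
```

Using a list of tuples instead of a dict is what lets the `filter` parameter repeat once per attribute.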
Please note that, depending on the number of samples, the search may take a while. It is not displayed under this cell, but I added a cool progress bar for impatient people (mostly for me)!
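Why does a big search take longer? Presumably because the results come back in pages, and each page is a separate HTTP request. The sketch below assumes HAL-style pages of the form `{'_embedded': {'samples': [...]}, '_links': {'next': {'href': ...}}}`, with no `next` link on the last page; the helper `iter_samples` and the `fetch` callback are hypothetical, and the demo uses fake in-memory pages instead of real responses.

```python
from typing import Callable, Dict, Iterator


def iter_samples(first_url: str, fetch: Callable[[str], Dict]) -> Iterator[Dict]:
    """Hypothetical helper: yield every sample from a paged, HAL-style response.

    'fetch' is any function that takes a URL and returns the decoded JSON page
    (for example, a thin wrapper around an HTTP GET).
    """
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get("_embedded", {}).get("samples", [])
        # Follow the 'next' link until the last page, which is assumed to have none
        url = page.get("_links", {}).get("next", {}).get("href")


# Offline demo: two fake pages standing in for HTTP responses
fake_pages = {
    "page0": {"_embedded": {"samples": [{"accession": "SAMEA115657902"}]},
              "_links": {"next": {"href": "page1"}}},
    "page1": {"_embedded": {"samples": [{"accession": "SAMEA115657829"}]},
              "_links": {}},
}
accessions = [s["accession"] for s in iter_samples("page0", fake_pages.__getitem__)]
print(accessions)
```

A progress bar fits naturally on top of a loop like this, advancing once per yielded sample.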
Let’s see how many samples we got, and a teaser of the content!
[4]:
print(len(my_samples))
print(my_samples[0].entity)
226
{'characteristics': {'Effective(%)': [{'text': '98.9'}], 'Error(%)': [{'text': '0.01'}], 'GC(%)': [{'text': '64.58'}], 'Library_Flowcell_Lane': [{'text': 'MKDN240001513-1A_22GYCMLT3_L5'}], 'Q20(%)': [{'text': '97.75'}], 'Q30(%)': [{'text': '93.65'}], 'Raw data': [{'text': '5547457200'}], 'Raw reads': [{'text': '36983048'}], 'SRA accession': [{'text': 'ERS20120911'}], 'analysis date': [{'text': '2023-07-01T00:00:00Z'}], 'biome': [{'text': 'soil'}], 'biome.1': [{'text': 'soil'}], 'broad-scale environmental context': [{'text': 'temperate biome'}], 'center': [{'text': 'HMGU'}], 'checklist': [{'text': 'ERC000022'}], 'collection date': [{'text': '2023-05-01T00:00:00Z'}], 'cryoprotectant': [{'text': 'none'}], 'cultivation': [{'text': 'not provided'}], 'depth': [{'text': 'not provided'}], 'elevation': [{'text': 'not provided'}], 'environmental medium': [{'text': 'Bulk soil'}], 'freezing method': [{'text': 'not provided'}], 'geographic location (country and/or sea)': [{'text': 'Germany'}], 'geographic location (latitude)': [{'text': '51.07'}], 'geographic location (longitude)': [{'text': '10.47'}], 'geographic location (region and locality)': [{'text': 'Hainich'}], 'local environmental context': [{'text': 'Grassland soil'}], 'organism': [{'text': 'soil metagenome'}], 'plot_id': [{'text': 'HEG19'}], 'plot_id_BExIS': [{'text': 'H20606'}], 'preservation temperature': [{'text': 'not provided'}], 'project name': [{'text': 'MICROBE'}], 'sample_id_byYear': [{'text': 'HEG19_23'}], 'soil_type': [{'text': 'Stagnosol'}], 'time point': [{'text': 'T1'}], 'year': [{'text': '2023'}]}, 'name': 'metag_HEG19_2023', 'accession': 'SAMEA115657902', 'release': '2024-05-24T14:09:02Z', 'sraAccession': 'ERS20120911', 'webinSubmissionAccountId': 'Webin-67007', 'taxId': 410658, 'status': 'PRIVATE', 'update': '2024-05-24T13:16:50.340Z', 'submitted': '2024-05-24T13:16:50.340Z', 'submittedVia': 'JSON_API', 'create': '2024-05-24T13:16:50.340Z', '_links': {'self': {'href': 
'https://www.ebi.ac.uk/biosamples/samples/SAMEA115657902'}, 'curationDomain': {'href': 'https://www.ebi.ac.uk/biosamples/samples/SAMEA115657902{?curationdomain}', 'templated': True}, 'curationLinks': {'href': 'https://www.ebi.ac.uk/biosamples/samples/SAMEA115657902/curationlinks'}, 'curationLink': {'href': 'https://www.ebi.ac.uk/biosamples/samples/SAMEA115657902/curationlinks/{hash}', 'templated': True}, 'structuredData': {'href': 'https://www.ebi.ac.uk/biosamples/structureddata/SAMEA115657902'}}}
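The printed entity is a plain dictionary, so pulling out what you forgot is ordinary dictionary work. Below is a minimal sketch, working on a trimmed copy of the entity printed above: every characteristic in that output is a list of `{'text': ...}` dicts, and the hypothetical helper `flatten_characteristics` turns that into a simple name-to-value mapping.

```python
def flatten_characteristics(entity):
    """Hypothetical helper: map each characteristic name to its plain text value(s)."""
    flat = {}
    for name, values in entity.get("characteristics", {}).items():
        texts = [v["text"] for v in values]
        # Single-valued attributes become plain strings; multi-valued ones stay lists
        flat[name] = texts[0] if len(texts) == 1 else texts
    return flat


# A trimmed copy of the entity printed above
entity = {
    "characteristics": {
        "project name": [{"text": "MICROBE"}],
        "biome": [{"text": "soil"}],
        "plot_id": [{"text": "HEG19"}],
    },
    "name": "metag_HEG19_2023",
    "accession": "SAMEA115657902",
}
print(entity["accession"], flatten_characteristics(entity))

# With the real search results, collecting every accession is one comprehension away:
# accessions = [sample.entity["accession"] for sample in my_samples]
```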
## Using free text
Sometimes, unfortunately, you won’t remember which attributes you set on your samples to identify them (BAD SCIENTIST! BAD! No treats today).
For these cases, BioSamples also provides a free-text search. For more information on the advanced search tricks you can use to narrow it down, take a look at the BioSamples guide.
For this example, let’s say I remember writing AEG19_23 somewhere. Let’s make the query!
[5]:
query = 'AEG19_23'
my_samples_free_text = api.search_samples(text=query)
[6]:
print(len(my_samples_free_text))
print(my_samples_free_text[0].entity)
77
{'characteristics': {'Effective(%)': [{'text': '99.16'}], 'Error(%)': [{'text': '0.01'}], 'GC(%)': [{'text': '63.0'}], 'Library_Flowcell_Lane': [{'text': 'MKDN240001563-1A_22GYCMLT3_L5'}], 'Q20(%)': [{'text': '98.18'}], 'Q30(%)': [{'text': '94.85'}], 'Raw data': [{'text': '6378478200'}], 'Raw reads': [{'text': '42523188'}], 'SRA accession': [{'text': 'ERS20120838'}], 'analysis date': [{'text': '2023-07-01T00:00:00Z'}], 'biome': [{'text': 'soil'}], 'biome.1': [{'text': 'soil'}], 'broad-scale environmental context': [{'text': 'temperate biome'}], 'center': [{'text': 'HMGU'}], 'checklist': [{'text': 'ERC000022'}], 'collection date': [{'text': '2023-05-01T00:00:00Z'}], 'cryoprotectant': [{'text': 'none'}], 'cultivation': [{'text': 'not provided'}], 'depth': [{'text': 'not provided'}], 'elevation': [{'text': 'not provided'}], 'environmental medium': [{'text': 'Bulk soil'}], 'freezing method': [{'text': 'not provided'}], 'geographic location (country and/or sea)': [{'text': 'Germany'}], 'geographic location (latitude)': [{'text': '48.4'}], 'geographic location (longitude)': [{'text': '9.45'}], 'geographic location (region and locality)': [{'text': 'Alb'}], 'local environmental context': [{'text': 'Grassland soil'}], 'organism': [{'text': 'soil metagenome'}], 'plot_id': [{'text': 'AEG19'}], 'plot_id_BExIS': [{'text': 'A35463'}], 'preservation temperature': [{'text': 'not provided'}], 'project name': [{'text': 'MICROBE'}], 'sample_id_byYear': [{'text': 'AEG19_23'}], 'soil_type': [{'text': 'Leptosol'}], 'time point': [{'text': 'T1'}], 'year': [{'text': '2023'}]}, 'name': 'metag_AEG19_2023', 'accession': 'SAMEA115657829', 'release': '2024-05-24T14:09:02Z', 'sraAccession': 'ERS20120838', 'webinSubmissionAccountId': 'Webin-67007', 'taxId': 410658, 'status': 'PRIVATE', 'update': '2024-05-24T13:16:38.909Z', 'submitted': '2024-05-24T13:16:38.909Z', 'submittedVia': 'JSON_API', 'create': '2024-05-24T13:16:38.909Z', '_links': {'self': {'href': 
'https://www.ebi.ac.uk/biosamples/samples/SAMEA115657829'}, 'curationDomain': {'href': 'https://www.ebi.ac.uk/biosamples/samples/SAMEA115657829{?curationdomain}', 'templated': True}, 'curationLinks': {'href': 'https://www.ebi.ac.uk/biosamples/samples/SAMEA115657829/curationlinks'}, 'curationLink': {'href': 'https://www.ebi.ac.uk/biosamples/samples/SAMEA115657829/curationlinks/{hash}', 'templated': True}, 'structuredData': {'href': 'https://www.ebi.ac.uk/biosamples/structureddata/SAMEA115657829'}}}
Honestly, I really do not like the free-text search. It doesn’t always work as intended, and complex searches return nothing at all most of the time (either that, or I am really stoopid, but… yeah, it’s probably the second one).
In any case, I always recommend relying on attributes.
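If you do end up with a broad free-text result (77 samples for AEG19_23 above), one way to combine both approaches is to check the hits against an exact attribute value on your side. A minimal sketch, using trimmed copies of the two entities printed in this notebook; the helper `exact_matches` is hypothetical and simply looks for the value among a characteristic’s `text` entries.

```python
def exact_matches(entities, attribute, value):
    """Hypothetical helper: keep entities whose characteristic `attribute` contains exactly `value`."""
    return [
        e for e in entities
        if value in (v.get("text") for v in e.get("characteristics", {}).get(attribute, []))
    ]


# Trimmed copies of the two entities printed earlier
hits = [
    {"accession": "SAMEA115657829", "characteristics": {"sample_id_byYear": [{"text": "AEG19_23"}]}},
    {"accession": "SAMEA115657902", "characteristics": {"sample_id_byYear": [{"text": "HEG19_23"}]}},
]
print([e["accession"] for e in exact_matches(hits, "sample_id_byYear", "AEG19_23")])

# With the real results:
# exact_matches([s.entity for s in my_samples_free_text], "sample_id_byYear", "AEG19_23")
```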
## Using an accession
You can also retrieve samples by accession; this is usually the easiest option, and retrieve() is the method that every API entity must define.
[7]:
my_sample = api.retrieve(['SAMEA115657829'])
print(len(my_sample))
print(my_sample[0].entity)
1
{'characteristics': {'Effective(%)': [{'text': '99.16'}], 'Error(%)': [{'text': '0.01'}], 'GC(%)': [{'text': '63.0'}], 'Library_Flowcell_Lane': [{'text': 'MKDN240001563-1A_22GYCMLT3_L5'}], 'Q20(%)': [{'text': '98.18'}], 'Q30(%)': [{'text': '94.85'}], 'Raw data': [{'text': '6378478200'}], 'Raw reads': [{'text': '42523188'}], 'SRA accession': [{'text': 'ERS20120838'}], 'analysis date': [{'text': '2023-07-01T00:00:00Z'}], 'biome': [{'text': 'soil'}], 'biome.1': [{'text': 'soil'}], 'broad-scale environmental context': [{'text': 'temperate biome'}], 'center': [{'text': 'HMGU'}], 'checklist': [{'text': 'ERC000022'}], 'collection date': [{'text': '2023-05-01T00:00:00Z'}], 'cryoprotectant': [{'text': 'none'}], 'cultivation': [{'text': 'not provided'}], 'depth': [{'text': 'not provided'}], 'elevation': [{'text': 'not provided'}], 'environmental medium': [{'text': 'Bulk soil'}], 'freezing method': [{'text': 'not provided'}], 'geographic location (country and/or sea)': [{'text': 'Germany'}], 'geographic location (latitude)': [{'text': '48.4'}], 'geographic location (longitude)': [{'text': '9.45'}], 'geographic location (region and locality)': [{'text': 'Alb'}], 'local environmental context': [{'text': 'Grassland soil'}], 'organism': [{'text': 'soil metagenome'}], 'plot_id': [{'text': 'AEG19'}], 'plot_id_BExIS': [{'text': 'A35463'}], 'preservation temperature': [{'text': 'not provided'}], 'project name': [{'text': 'MICROBE'}], 'sample_id_byYear': [{'text': 'AEG19_23'}], 'soil_type': [{'text': 'Leptosol'}], 'time point': [{'text': 'T1'}], 'year': [{'text': '2023'}]}, 'name': 'metag_AEG19_2023', 'accession': 'SAMEA115657829', 'release': '2024-05-24T14:09:02Z', 'sraAccession': 'ERS20120838', 'webinSubmissionAccountId': 'Webin-67007', 'taxId': 410658, 'status': 'PRIVATE', 'update': '2024-05-24T13:16:38.909Z', 'submitted': '2024-05-24T13:16:38.909Z', 'submittedVia': 'JSON_API', 'create': '2024-05-24T13:16:38.909Z', '_links': {'self': {'href': 
'https://www.ebi.ac.uk/biosamples/samples/SAMEA115657829'}, 'curationDomain': {'href': 'https://www.ebi.ac.uk/biosamples/samples/SAMEA115657829{?curationdomain}', 'templated': True}, 'curationLinks': {'href': 'https://www.ebi.ac.uk/biosamples/samples/SAMEA115657829/curationlinks'}, 'curationLink': {'href': 'https://www.ebi.ac.uk/biosamples/samples/SAMEA115657829/curationlinks/{hash}', 'templated': True}, 'structuredData': {'href': 'https://www.ebi.ac.uk/biosamples/structureddata/SAMEA115657829'}}}
Please note that you can retrieve several samples at once by providing multiple accessions as elements of the list.
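If your accessions come from a copy-pasted spreadsheet or an old log, it can help to tidy the list before passing it to retrieve(). A minimal sketch: the hypothetical helper `prepare_accessions` strips whitespace, rejects anything that does not look like a BioSamples accession, and drops duplicates while keeping the order. The pattern is an assumption that only accepts the common SAMEA (EBI), SAMN (NCBI) and SAMD (DDBJ) prefixes followed by digits; widen it if your accessions look different.

```python
import re

# Assumed accession shapes: SAMEA / SAMN / SAMD prefix followed by digits
ACCESSION_RE = re.compile(r"^SAM(?:EA|N|D)\d+$")


def prepare_accessions(raw):
    """Hypothetical helper: clean, validate and de-duplicate a list of accessions."""
    cleaned = [acc.strip() for acc in raw]
    bad = [acc for acc in cleaned if not ACCESSION_RE.match(acc)]
    if bad:
        raise ValueError(f"These do not look like BioSamples accessions: {bad}")
    # dict.fromkeys keeps the first occurrence of each accession, in order
    return list(dict.fromkeys(cleaned))


accessions = prepare_accessions(["SAMEA115657829", " SAMEA115657902 ", "SAMEA115657829"])
print(accessions)
# my_samples = api.retrieve(accessions)
```

An ENA run or sample accession such as ERS20120838 (the SRA accession in the outputs above) is rejected, which catches a common mix-up early.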