{ "cells": [ { "cell_type": "markdown", "id": "fd7cce00-7d8a-4c92-9113-81475e469aa5", "metadata": {}, "source": [ "# Retrieving a sample\n", "\n", "Let's say that, following the previous tutorials, you have already submitted your samples. You're happy and comptent, life is good at your longevous 14 years of age.\n", "\n", "However, a couple months later, you don't remember your sample's accessions. You probably saved the output somewhere, but computers are tricky and they usually delete/hide the stuff you totally saved in a safe location.\n", "\n", "Well, you don't need to worry! Biosamples offers a search service based either on `free text` or `attribute` filtering, making it possible to retrieve your samples at any point, even if they're private to the general public (As long as you're using the same account to retrieve them).\n", "\n", "In this notebook, we will try to retrieve a couple of samples by using both logics, just to show how it's done. We will use as an example samples from the MICROBE consortia.\n", "\n", "## Setting up\n", "\n", "As alwasy, we need to set-up a couple of things. Not that many this time, though!" ] }, { "cell_type": "code", "execution_count": 2, "id": "28298687-da69-4dea-a69e-17c7c40c918b", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2024-10-14 10:42:46,219 - BsdApi - INFO - Set up BSD API successfully: using base uri 'https://www.ebi.ac.uk/biosamples/samples'\n", "Retrieving samples: \u001b[38;2;0;255;0m100%\u001b[39m (226/226) 🀱 Time: 0:00:09 \n", "Retrieving samples: \u001b[38;2;0;255;0m100%\u001b[39m (77/77) 🀱 Time: 0:00:02 \n", "2024-10-14 10:44:31,082 - BsdApi - INFO - Trying to retrieve sample with accession SAMEA115657829\n" ] } ], "source": [ "from biobroker.authenticator import WebinAuthenticator # Biosamples uses the WebinAuthenticator\n", "from biobroker.api import BsdApi # BioSamples Database (BSD) API\n", "\n", "username = \"\" # Your username goes here\n", "password = \"\" # Your password goes here\n", "authenticator = WebinAuthenticator(username=username, password=password)\n", "\n", "api = BsdApi(authenticator=authenticator)" ] }, { "cell_type": "markdown", "id": "fac5c1fa-acc9-46a5-85ac-99d8504f8be3", "metadata": {}, "source": [ "## Using attributes\n", "\n", "We will start loading the attributes. Let's say, from your samples, you remember that you set-up certain attributes; in this case, from the MICROBE samples, I remember setting up `project name`: `MICROBE`, `biome`: `soil`, and `center`: `HMGU`. Let's put that to search!" ] }, { "cell_type": "code", "execution_count": 3, "id": "36224d7b-bc18-4bbf-b2b8-507dba810706", "metadata": {}, "outputs": [], "source": [ "attributes_to_search = {\n", " 'project name': 'MICROBE',\n", " 'biome': 'soil',\n", " 'center': 'HMGU'\n", "}\n", "my_samples = api.search_samples(attributes=attributes_to_search)" ] }, { "cell_type": "markdown", "id": "40c39645-2107-4fb9-9647-64c6052fec6d", "metadata": {}, "source": [ "Attributes are always provided in a key:value pair manner. What happens behind the scenes is not that important, since the `BsdApi` object handles everything, but this dictionary is transformed into a query that is then requested to a BioSamples endpoint.\n", "\n", "Please note that, depending on the number of samples, it may take a while to search. It is not displayed in the notebook, but I added a cool progress bar for impatient people! (For me, mostly).\n", "\n", "Let's see how many samples we got, and a teaser of the content!" ] }, { "cell_type": "code", "execution_count": 4, "id": "cf6cffb6-5c8b-4200-8145-069367112ab6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "226\n", "{'characteristics': {'Effective(%)': [{'text': '98.9'}], 'Error(%)': [{'text': '0.01'}], 'GC(%)': [{'text': '64.58'}], 'Library_Flowcell_Lane': [{'text': 'MKDN240001513-1A_22GYCMLT3_L5'}], 'Q20(%)': [{'text': '97.75'}], 'Q30(%)': [{'text': '93.65'}], 'Raw data': [{'text': '5547457200'}], 'Raw reads': [{'text': '36983048'}], 'SRA accession': [{'text': 'ERS20120911'}], 'analysis date': [{'text': '2023-07-01T00:00:00Z'}], 'biome': [{'text': 'soil'}], 'biome.1': [{'text': 'soil'}], 'broad-scale environmental context': [{'text': 'temperate biome'}], 'center': [{'text': 'HMGU'}], 'checklist': [{'text': 'ERC000022'}], 'collection date': [{'text': '2023-05-01T00:00:00Z'}], 'cryoprotectant': [{'text': 'none'}], 'cultivation': [{'text': 'not provided'}], 'depth': [{'text': 'not provided'}], 'elevation': [{'text': 'not provided'}], 'environmental medium': [{'text': 'Bulk soil'}], 'freezing method': [{'text': 'not provided'}], 'geographic location (country and/or sea)': [{'text': 'Germany'}], 'geographic location (latitude)': [{'text': '51.07'}], 'geographic location (longitude)': [{'text': '10.47'}], 'geographic location (region and locality)': [{'text': 'Hainich'}], 'local environmental context': [{'text': 'Grassland soil'}], 'organism': [{'text': 'soil metagenome'}], 'plot_id': [{'text': 'HEG19'}], 'plot_id_BExIS': [{'text': 'H20606'}], 'preservation temperature': [{'text': 'not provided'}], 'project name': [{'text': 'MICROBE'}], 'sample_id_byYear': [{'text': 'HEG19_23'}], 'soil_type': [{'text': 'Stagnosol'}], 'time point': [{'text': 'T1'}], 'year': [{'text': '2023'}]}, 'name': 'metag_HEG19_2023', 'accession': 'SAMEA115657902', 'release': '2024-05-24T14:09:02Z', 'sraAccession': 'ERS20120911', 'webinSubmissionAccountId': 'Webin-67007', 'taxId': 410658, 'status': 'PRIVATE', 'update': '2024-05-24T13:16:50.340Z', 'submitted': '2024-05-24T13:16:50.340Z', 'submittedVia': 'JSON_API', 'create': '2024-05-24T13:16:50.340Z', '_links': {'self': {'href': 'https://www.ebi.ac.uk/biosamples/samples/SAMEA115657902'}, 'curationDomain': {'href': 'https://www.ebi.ac.uk/biosamples/samples/SAMEA115657902{?curationdomain}', 'templated': True}, 'curationLinks': {'href': 'https://www.ebi.ac.uk/biosamples/samples/SAMEA115657902/curationlinks'}, 'curationLink': {'href': 'https://www.ebi.ac.uk/biosamples/samples/SAMEA115657902/curationlinks/{hash}', 'templated': True}, 'structuredData': {'href': 'https://www.ebi.ac.uk/biosamples/structureddata/SAMEA115657902'}}}\n" ] } ], "source": [ "print(len(my_samples))\n", "print(my_samples[0].entity)" ] }, { "cell_type": "markdown", "id": "c713132e-4c43-44c5-bc56-2b458691eaef", "metadata": {}, "source": [ "## Using free text\n", "\n", "Sometimes, unfortunately, you won't remember what attributes you set up on your samples to identify them (BAD SCIENTIST! BAD! no treats today)\n", "\n", "For this, BioSamples also provides with a free text search. For more information, you can take a look at the [BioSamples guide](https://www.ebi.ac.uk/biosamples/docs/guides/search#_advanced_search) on what kind of advanced search tricks you can use to make it simpler.\n", "\n", "For this, let's say that I remember that I put, somewhere `AEG19_23`. Let's make the query!" ] }, { "cell_type": "code", "execution_count": 5, "id": "4fd417d5-ca7d-42fe-9a1a-54137e686eb8", "metadata": {}, "outputs": [], "source": [ "query = 'AEG19_23'\n", "my_samples_free_text = api.search_samples(text=query)" ] }, { "cell_type": "code", "execution_count": 6, "id": "335d5d16-73d6-4475-a009-6cbbdf97d02a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "77\n", "{'characteristics': {'Effective(%)': [{'text': '99.16'}], 'Error(%)': [{'text': '0.01'}], 'GC(%)': [{'text': '63.0'}], 'Library_Flowcell_Lane': [{'text': 'MKDN240001563-1A_22GYCMLT3_L5'}], 'Q20(%)': [{'text': '98.18'}], 'Q30(%)': [{'text': '94.85'}], 'Raw data': [{'text': '6378478200'}], 'Raw reads': [{'text': '42523188'}], 'SRA accession': [{'text': 'ERS20120838'}], 'analysis date': [{'text': '2023-07-01T00:00:00Z'}], 'biome': [{'text': 'soil'}], 'biome.1': [{'text': 'soil'}], 'broad-scale environmental context': [{'text': 'temperate biome'}], 'center': [{'text': 'HMGU'}], 'checklist': [{'text': 'ERC000022'}], 'collection date': [{'text': '2023-05-01T00:00:00Z'}], 'cryoprotectant': [{'text': 'none'}], 'cultivation': [{'text': 'not provided'}], 'depth': [{'text': 'not provided'}], 'elevation': [{'text': 'not provided'}], 'environmental medium': [{'text': 'Bulk soil'}], 'freezing method': [{'text': 'not provided'}], 'geographic location (country and/or sea)': [{'text': 'Germany'}], 'geographic location (latitude)': [{'text': '48.4'}], 'geographic location (longitude)': [{'text': '9.45'}], 'geographic location (region and locality)': [{'text': 'Alb'}], 'local environmental context': [{'text': 'Grassland soil'}], 'organism': [{'text': 'soil metagenome'}], 'plot_id': [{'text': 'AEG19'}], 'plot_id_BExIS': [{'text': 'A35463'}], 'preservation temperature': [{'text': 'not provided'}], 'project name': [{'text': 'MICROBE'}], 'sample_id_byYear': [{'text': 'AEG19_23'}], 'soil_type': [{'text': 'Leptosol'}], 'time point': [{'text': 'T1'}], 'year': [{'text': '2023'}]}, 'name': 'metag_AEG19_2023', 'accession': 'SAMEA115657829', 'release': '2024-05-24T14:09:02Z', 'sraAccession': 'ERS20120838', 'webinSubmissionAccountId': 'Webin-67007', 'taxId': 410658, 'status': 'PRIVATE', 'update': '2024-05-24T13:16:38.909Z', 'submitted': '2024-05-24T13:16:38.909Z', 'submittedVia': 'JSON_API', 'create': '2024-05-24T13:16:38.909Z', '_links': {'self': {'href': 'https://www.ebi.ac.uk/biosamples/samples/SAMEA115657829'}, 'curationDomain': {'href': 'https://www.ebi.ac.uk/biosamples/samples/SAMEA115657829{?curationdomain}', 'templated': True}, 'curationLinks': {'href': 'https://www.ebi.ac.uk/biosamples/samples/SAMEA115657829/curationlinks'}, 'curationLink': {'href': 'https://www.ebi.ac.uk/biosamples/samples/SAMEA115657829/curationlinks/{hash}', 'templated': True}, 'structuredData': {'href': 'https://www.ebi.ac.uk/biosamples/structureddata/SAMEA115657829'}}}\n" ] } ], "source": [ "print(len(my_samples_free_text))\n", "print(my_samples_free_text[0].entity)" ] }, { "cell_type": "markdown", "id": "00b42a52-e879-4053-bec9-b32ab1af3a1f", "metadata": {}, "source": [ "Sincerely, I really do not like the free text search. It doesn't really work as intended, with complex searches resulting most of the time in nothing at all (Either that or I am really stoopid, but... yeah probably is the second one).\n", "\n", "In any case, I **always** recommend relying on attributes\n", "\n", "## Using an accession\n", "\n", "You can also retrieve the samples by using an accession; this is usually the easiest, and it's the function that **must be defined** for all the API entities." ] }, { "cell_type": "code", "execution_count": 7, "id": "5e5ab19a-a943-4f50-a25c-1d8bbc254bb6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n", "{'characteristics': {'Effective(%)': [{'text': '99.16'}], 'Error(%)': [{'text': '0.01'}], 'GC(%)': [{'text': '63.0'}], 'Library_Flowcell_Lane': [{'text': 'MKDN240001563-1A_22GYCMLT3_L5'}], 'Q20(%)': [{'text': '98.18'}], 'Q30(%)': [{'text': '94.85'}], 'Raw data': [{'text': '6378478200'}], 'Raw reads': [{'text': '42523188'}], 'SRA accession': [{'text': 'ERS20120838'}], 'analysis date': [{'text': '2023-07-01T00:00:00Z'}], 'biome': [{'text': 'soil'}], 'biome.1': [{'text': 'soil'}], 'broad-scale environmental context': [{'text': 'temperate biome'}], 'center': [{'text': 'HMGU'}], 'checklist': [{'text': 'ERC000022'}], 'collection date': [{'text': '2023-05-01T00:00:00Z'}], 'cryoprotectant': [{'text': 'none'}], 'cultivation': [{'text': 'not provided'}], 'depth': [{'text': 'not provided'}], 'elevation': [{'text': 'not provided'}], 'environmental medium': [{'text': 'Bulk soil'}], 'freezing method': [{'text': 'not provided'}], 'geographic location (country and/or sea)': [{'text': 'Germany'}], 'geographic location (latitude)': [{'text': '48.4'}], 'geographic location (longitude)': [{'text': '9.45'}], 'geographic location (region and locality)': [{'text': 'Alb'}], 'local environmental context': [{'text': 'Grassland soil'}], 'organism': [{'text': 'soil metagenome'}], 'plot_id': [{'text': 'AEG19'}], 'plot_id_BExIS': [{'text': 'A35463'}], 'preservation temperature': [{'text': 'not provided'}], 'project name': [{'text': 'MICROBE'}], 'sample_id_byYear': [{'text': 'AEG19_23'}], 'soil_type': [{'text': 'Leptosol'}], 'time point': [{'text': 'T1'}], 'year': [{'text': '2023'}]}, 'name': 'metag_AEG19_2023', 'accession': 'SAMEA115657829', 'release': '2024-05-24T14:09:02Z', 'sraAccession': 'ERS20120838', 'webinSubmissionAccountId': 'Webin-67007', 'taxId': 410658, 'status': 'PRIVATE', 'update': '2024-05-24T13:16:38.909Z', 'submitted': '2024-05-24T13:16:38.909Z', 'submittedVia': 'JSON_API', 'create': '2024-05-24T13:16:38.909Z', '_links': {'self': {'href': 'https://www.ebi.ac.uk/biosamples/samples/SAMEA115657829'}, 'curationDomain': {'href': 'https://www.ebi.ac.uk/biosamples/samples/SAMEA115657829{?curationdomain}', 'templated': True}, 'curationLinks': {'href': 'https://www.ebi.ac.uk/biosamples/samples/SAMEA115657829/curationlinks'}, 'curationLink': {'href': 'https://www.ebi.ac.uk/biosamples/samples/SAMEA115657829/curationlinks/{hash}', 'templated': True}, 'structuredData': {'href': 'https://www.ebi.ac.uk/biosamples/structureddata/SAMEA115657829'}}}\n" ] } ], "source": [ "my_sample = api.retrieve(['SAMEA115657829'])\n", "print(len(my_sample))\n", "print(my_sample[0].entity)" ] }, { "cell_type": "markdown", "id": "6bff2e9a-2ad5-425c-8878-ce14d9457a86", "metadata": {}, "source": [ "Please note you can **provide multiple accessions** as elements of the array" ] }, { "cell_type": "code", "execution_count": null, "id": "a518c387-5284-44a6-bf6f-a3c2b9bfbca2", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.11" } }, "nbformat": 4, "nbformat_minor": 5 }