{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "e78a8bd9-90d9-49a8-bee0-c31fe14abc31",
   "metadata": {},
   "source": [
    "# Modifying the source input with a key:value map"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d2e829eb-3ea2-4554-aec5-98ff55ca3309",
   "metadata": {},
   "source": [
    "In this notebook, we will be looking at one of the properties that the input processors have: They can transform, with a simple map, the names of the fields for the loaded metadata.\n",
    "\n",
    "Why? Why not just have the input file match the expected values?\n",
    "\n",
    "Well, sometimes you **will** have to do that. But let's imagine that your pipeline is producing an almost-ready input, but your laboratory, instead of calling their samples by `name`, uses another identifier, such as `sample_id`. You want to automatise sending the metadata when is ready by the pipeline, but you don't want to write another script. Easy then! Let's see how to do that:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "0031d648-860b-4863-b733-c55f338fa165",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[{'sample_id': 'sumple', 'collected_at': 'noon'}]\n"
     ]
    }
   ],
   "source": [
    "## Import everything we need\n",
    "from biobroker.input_processor import TsvInputProcessor # An input processor\n",
    "\n",
    "sample_tsv = [\n",
    "    [\"sample_id\", \"collected_at\"],\n",
    "    [\"sumple\", \"noon\"]         \n",
    "]\n",
    "\n",
    "writable_sample = \"\\n\".join([\"\\t\".join(row) for row in sample_tsv])\n",
    "with open(\"simple_sample_sumple.tsv\", \"w\") as f:\n",
    "    f.write(writable_sample)\n",
    "\n",
    "path = \"simple_sample_sumple.tsv\" # This is the file we created previously\n",
    "\n",
    "## Set up the required entities\n",
    "\n",
    "input_processor = TsvInputProcessor(input_data=path)\n",
    "\n",
    "print(input_processor.input_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f3bf4b2-8a0a-4151-ab81-1af153cf84d7",
   "metadata": {},
   "source": [
    "Up to here, everything is the same: you have set up the input processor pointing to the data.\n",
    "\n",
    "Here comes the slightly different part: Let's transform the metadata so that \"sample_id\" becomes \"name\":"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "bd6ceb29-d574-4db5-8256-e8c71e766504",
   "metadata": {},
   "outputs": [],
   "source": [
    "map_of_fields = {\n",
    "    \"sample_id\": \"name\"\n",
    "}\n",
    "\n",
    "input_processor.transform(field_mapping=map_of_fields, delete_non_mapped_fields=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "381e6523-3126-4cb5-8eb0-d6f20e4cec4a",
   "metadata": {},
   "source": [
    "Let's take a look at the metadata now!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "9445d66f-2a25-4fd8-8496-a851d593cd5b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[{'collected_at': 'noon', 'name': 'sumple'}]\n"
     ]
    }
   ],
   "source": [
    "print(input_processor.input_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2ff07d34-b3d9-42cc-9510-500313cb8f98",
   "metadata": {},
   "source": [
    "ta-da! We now have the samples in the format that we want and we can `process` and `submit` them without any issue.\n",
    "\n",
    "While not a super complicated transformation, this can help setting up your own pipelines without the need to tailor the metadata in your pipeline's output."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}