{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "e339c625-6f80-41f8-8d32-7aff261064cc",
   "metadata": {},
   "source": [
    "# Load and save data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "afcd3095-9822-4936-a7ae-04daf54768c1",
   "metadata": {},
   "source": [
    "## Load and create a MicrobiomeData object"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "66cd8a0e-bc7c-44f7-9bea-876ea21dc357",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MicrobiomeData object summary\n",
      "----------------------------------------\n",
      "Abundance table: None\n",
      "Taxonomy table: None\n",
      "Sequence table: None\n",
      "Tree: None\n",
      "Metadata table: None\n",
      "----------------------------------------\n"
     ]
    }
   ],
   "source": [
    "import qdiv\n",
    "obj = qdiv.MicrobiomeData.load() #This creates an empty instance of the class MicrobiomeData. Here, the instance is saved in the variable name obj\n",
    "obj.info() #This prints information about the content of the object. As we can see, below, it is empty."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed3d45e4-6a9d-4eb1-b37d-a69b0c7d9e2a",
   "metadata": {},
   "source": [
    "Let's add some data to the object. We will use the *Saheb-Alam2019_DADA2* example data. We specify the filename. We can also specify the folder in which the file is located using the *path* variable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "069de085-0a56-42e7-8c9c-e091f4e5c2b1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MicrobiomeData object summary\n",
      "----------------------------------------\n",
      "Abundance table: 672 features x 16 samples\n",
      "Total reads: 504186.0\n",
      "Min reads/sample: 9783.0\n",
      "Max reads/sample: 82777.0\n",
      "Taxonomy table: None\n",
      "Sequence table: None\n",
      "Tree: None\n",
      "Metadata table: None\n",
      "----------------------------------------\n"
     ]
    }
   ],
   "source": [
    "obj.add_tab(\"Saheb-Alam2019_tab_dada2.tsv\", path=\"../../../qdiv/example_data\")\n",
    "obj.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b20db437-8965-4f27-a9ba-bb275076d7d5",
   "metadata": {},
   "source": [
    "Now, when calling obj.info(), we can see that an abundance table with 672 features and 16 samples has been added to the object.\n",
    "We continue and add sequences, taxonomic information, a phylogenetic tree, and meta data as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "7081bed6-5d89-4840-8186-e8028b6765ce",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MicrobiomeData object summary\n",
      "----------------------------------------\n",
      "Abundance table: 672 features x 16 samples\n",
      "Total reads: 504186.0\n",
      "Min reads/sample: 9783.0\n",
      "Max reads/sample: 82777.0\n",
      "Taxonomy table: 672 features, levels: ['Taxon', 'Confidence']\n",
      "Sequence table: 672 features\n",
      "Tree: 1342 nodes\n",
      "Metadata table: 16 samples, columns: ['location', 'feed', 'mfc']\n",
      "Metadata preview:\n",
      "       location     feed mfc\n",
      "sample                      \n",
      "S4        anode  acetate   B\n",
      "S5        anode  acetate   B\n",
      "S6        anode  acetate   B\n",
      "----------------------------------------\n"
     ]
    }
   ],
   "source": [
    "obj.add_tax(\"Saheb-Alam2019_tax_dada2.tsv\", sep=\"\\t\", path=\"../../../qdiv/example_data\") #Note that I use the sep variable to specify the column separator used in the input file.\n",
    "obj.add_seq_from_fasta(\"Saheb-Alam2019_seq_dada2.fasta\", path=\"../../../qdiv/example_data\")\n",
    "obj.add_tree(\"Saheb-Alam2019_tree_dada2.nwk\", path=\"../../../qdiv/example_data\")\n",
    "obj.add_meta(\"Saheb-Alam2019_meta.csv\", sep=\",\", path=\"../../../qdiv/example_data\")\n",
    "obj.info(3) #Here is specify that I want to see 3 lines of the meta data in the printed output."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "380dd255-d2a3-4bb4-895f-801e7b524ec6",
   "metadata": {},
   "source": [
    "Now we can see that we have lot more data in the object. It would also have been possible to load all the data in one go with the **MicrobiomeData.load** command."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "4c619430-65f2-495a-8945-d34af0dab961",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MicrobiomeData object summary\n",
      "----------------------------------------\n",
      "Abundance table: 672 features x 16 samples\n",
      "Total reads: 504186.0\n",
      "Min reads/sample: 9783.0\n",
      "Max reads/sample: 82777.0\n",
      "Taxonomy table: 672 features, levels: ['Taxon', 'Confidence']\n",
      "Sequence table: 672 features\n",
      "Tree: 1342 nodes\n",
      "Metadata table: 16 samples, columns: ['location', 'feed', 'mfc']\n",
      "Metadata preview:\n",
      "       location     feed mfc\n",
      "sample                      \n",
      "S4        anode  acetate   B\n",
      "----------------------------------------\n"
     ]
    }
   ],
   "source": [
    "obj = qdiv.MicrobiomeData.load(tab=\"Saheb-Alam2019_tab_dada2.tsv\", tax=\"Saheb-Alam2019_tax_dada2.tsv\", fasta=\"Saheb-Alam2019_seq_dada2.fasta\", \\\n",
    "                               tree=\"Saheb-Alam2019_tree_dada2.nwk\", meta=\"Saheb-Alam2019_meta.csv\", path=\"../../../qdiv/example_data\", \\\n",
    "                              tab_sep=\"\\t\", tax_sep=\"\\t\", meta_sep=\",\") #Note that I can specify different separators for different input table files.\n",
    "obj.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c417e0e9-3d4e-4a5a-8b53-2e5844fde55f",
   "metadata": {},
   "source": [
    "All data is loaded into the object as pandas dataframes. We can have a look at the content of a specific data table like this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "8ced22ee-70ad-458f-9f1e-a28f43d88c3b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                                    S4     S5    S6    S7   S10   S11    S12  \\\n",
      "Feature                                                                        \n",
      "0a4f6af6827b404e874581539d79f24c   0.0    0.0   0.0   0.0   0.0   0.0    7.0   \n",
      "0a8520a771e224611a2eef9f67e4aac9   0.0    0.0   0.0   0.0   0.0   0.0    0.0   \n",
      "0a9495dc2a22f1eefcd9e4e4ebd2617f   0.0    0.0   0.0   0.0   0.0  27.0    0.0   \n",
      "0b6a7fb1e3b155264debe582497d69d2  47.0  111.0   0.0   0.0  94.0  60.0  323.0   \n",
      "0b085a4fec1a492a30b46aeeb05a66bc  42.0   86.0  39.0  18.0  10.0   0.0   51.0   \n",
      "\n",
      "                                    S13  S20  S21  S22  S23   S26   S27  S28  \\\n",
      "Feature                                                                        \n",
      "0a4f6af6827b404e874581539d79f24c    0.0  0.0  0.0  0.0  0.0   0.0   0.0  0.0   \n",
      "0a8520a771e224611a2eef9f67e4aac9    0.0  0.0  0.0  0.0  0.0  16.0  26.0  0.0   \n",
      "0a9495dc2a22f1eefcd9e4e4ebd2617f    0.0  0.0  0.0  0.0  0.0   0.0   0.0  0.0   \n",
      "0b6a7fb1e3b155264debe582497d69d2  102.0  0.0  0.0  0.0  0.0   0.0   0.0  0.0   \n",
      "0b085a4fec1a492a30b46aeeb05a66bc   21.0  0.0  0.0  0.0  0.0   0.0   0.0  0.0   \n",
      "\n",
      "                                   S29  \n",
      "Feature                                 \n",
      "0a4f6af6827b404e874581539d79f24c   0.0  \n",
      "0a8520a771e224611a2eef9f67e4aac9   0.0  \n",
      "0a9495dc2a22f1eefcd9e4e4ebd2617f   0.0  \n",
      "0b6a7fb1e3b155264debe582497d69d2  29.0  \n",
      "0b085a4fec1a492a30b46aeeb05a66bc   0.0  \n"
     ]
    }
   ],
   "source": [
    "print(obj.tab.head()) #This shows the first few lines of the abundance table"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4a801b05-dbe0-4c19-9861-1981fefb5450",
   "metadata": {},
   "source": [
    "The Saheb-Alam2019 data was generated from PCR amplification of the V4 regions of 16S rRNA gene. The fastq file were processed in qiime2 using dada2.\n",
    "The resulting abundance table and amplicon sequence variants were exported using *qiime export* and converted to a .tsv file and a fasta file.\n",
    "As you can see, the features have weird names. We can rename them using the **rename_features** function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "27fd7fd2-406b-4883-9b22-1fe6cb2dbb3b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "              S4       S5       S6      S7     S10     S11      S12     S13  \\\n",
      "Feature                                                                       \n",
      "ASV1     11302.0  21046.0  11153.0  9637.0    45.0    50.0     97.0    84.0   \n",
      "ASV2         0.0      0.0      0.0     0.0     0.0     0.0     14.0     0.0   \n",
      "ASV3         0.0      0.0      0.0     0.0     0.0     0.0      0.0     0.0   \n",
      "ASV4        84.0    241.0     91.0     0.0  5375.0  8202.0  11419.0  3635.0   \n",
      "ASV5         0.0      0.0      0.0    44.0  1103.0   466.0   4092.0    97.0   \n",
      "\n",
      "             S20      S21     S22      S23     S26     S27     S28     S29  \n",
      "Feature                                                                     \n",
      "ASV1      1013.0   1693.0   522.0   2851.0     0.0     0.0     0.0   107.0  \n",
      "ASV2     21102.0  16385.0  4777.0  17839.0  1941.0  8230.0  3829.0  7037.0  \n",
      "ASV3     19489.0  13748.0  8570.0  25646.0     0.0   138.0     0.0    68.0  \n",
      "ASV4         0.0     36.0     0.0      0.0    73.0   177.0     0.0   134.0  \n",
      "ASV5        24.0      0.0     0.0     16.0   108.0  1034.0  1170.0  4598.0  \n"
     ]
    }
   ],
   "source": [
    "obj.rename_features(name_type=\"ASV\", inplace=True) #inplace specifies that the object is modified in place.\n",
    "print(obj.tab.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3a06475d-9f2d-4133-a611-c3cda275c738",
   "metadata": {},
   "source": [
    "Now the names look better, and they are sorted starting with the ASV having the highest mean relative abundance in the samples (ASV1)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "207b3aa2-ad89-4468-ae1e-c9dfa33f8be6",
   "metadata": {},
   "source": [
    "## Save files\n",
    "Let's say we have worked on the files, perhaps renamed features, subsetted samples, or rarefied the read counts. We want to save the resulting file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "01e45e73-4392-4b13-8b50-7e9e27226e85",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['.\\\\Fixed_files_tab.csv',\n",
       " '.\\\\Fixed_files_tax.csv',\n",
       " '.\\\\Fixed_files_meta.csv',\n",
       " '.\\\\Fixed_files_seq.fa',\n",
       " '.\\\\Fixed_files_tree.nwk']"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj.save(savename=\"Fixed_files\", path=\".\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a76facd0-7262-468b-895c-b9574c1e6076",
   "metadata": {},
   "source": [
    "We can see a list of files saved. Note that the *savename* variable is used a prefix for all the output files. \n",
    "The path specifies the folder where the files are saved (\".\" is the current folder)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b7b80b27-95d5-4217-913d-335379f135e5",
   "metadata": {},
   "source": [
    "## Load other types of files\n",
    "Relative abundance tables and taxonomy data can be generated in a variety of ways. \n",
    "There are some functions to load other types of data. For example, the following:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "297a99b9-34ed-4efc-b3a8-86f7ad4ccd6d",
   "metadata": {},
   "source": [
    "**CoverM**  \n",
    "Relative abundance in tab-separated files generated with CoverM can be loaded like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "4dff2eb7-1714-49bc-9745-565c6a17caaf",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MicrobiomeData object summary\n",
      "----------------------------------------\n",
      "Abundance table: 304 features x 26 samples\n",
      "Total reads: 1761.7865439559\n",
      "Min reads/sample: 61.478759796599995\n",
      "Max reads/sample: 73.8195703024\n",
      "Taxonomy table: None\n",
      "Sequence table: None\n",
      "Tree: None\n",
      "Metadata table: 26 samples, columns: ['unmapped_reads_perc']\n",
      "Metadata preview:\n",
      "                                       unmapped_reads_perc\n",
      "all_mags_contigs.fa/v2114_R1_QF.fq.gz            34.778670\n",
      "all_mags_contigs.fa/v2115_R1_QF.fq.gz            38.521270\n",
      "all_mags_contigs.fa/v2116_R1_QF.fq.gz            29.976946\n",
      "----------------------------------------\n"
     ]
    }
   ],
   "source": [
    "import qdiv\n",
    "obj = qdiv.MicrobiomeData.load()\n",
    "obj.add_tab_from_coverm(\"Modin2025_CoverM.tsv\", path=\"../../../qdiv/example_data\")\n",
    "obj.info(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bdc74762-d834-4d5e-9cbd-8a8dcb8f9795",
   "metadata": {},
   "source": [
    "As you can see in the metadata preview, information about the unmapped reads has been placed in the metadata, and the sample names are quite ugly. \n",
    "We can recreate the object and extract the correct sample names using *first_sep* and *second_sep* variables. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "22da1e4f-b9e6-446f-b96b-4c6c16327c3a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MicrobiomeData object summary\n",
      "----------------------------------------\n",
      "Abundance table: 304 features x 26 samples\n",
      "Total reads: 1761.7865439559\n",
      "Min reads/sample: 61.478759796599995\n",
      "Max reads/sample: 73.8195703024\n",
      "Taxonomy table: None\n",
      "Sequence table: None\n",
      "Tree: None\n",
      "Metadata table: 26 samples, columns: ['unmapped_reads_perc']\n",
      "Metadata preview:\n",
      "       unmapped_reads_perc\n",
      "v2114            34.778670\n",
      "v2115            38.521270\n",
      "v2116            29.976946\n",
      "----------------------------------------\n"
     ]
    }
   ],
   "source": [
    "import qdiv\n",
    "obj = qdiv.MicrobiomeData.load()\n",
    "obj.add_tab_from_coverm(\"Modin2025_CoverM.tsv\", path=\"../../../qdiv/example_data\", first_sep=\"fa/\", second_sep=\"_R1\")\n",
    "obj.info(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "28782b50-2a00-4c2b-923a-f6c98ea9d0b9",
   "metadata": {},
   "source": [
    "**GTDB-Tk**  \n",
    "We also want to add taxonomic information, which in this case was generated with GTDB-Tk. \n",
    "We have two input files one for archaea and one for bacteria, they can be added like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "1a51f0c9-c909-440c-9b96-ba74980a5c65",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MicrobiomeData object summary\n",
      "----------------------------------------\n",
      "Abundance table: 304 features x 26 samples\n",
      "Total reads: 1761.7865439559\n",
      "Min reads/sample: 61.478759796599995\n",
      "Max reads/sample: 73.8195703024\n",
      "Taxonomy table: 304 features, levels: ['Domain', 'Phylum', 'Class', 'Order', 'Family', 'Genus', 'Species']\n",
      "Sequence table: None\n",
      "Tree: None\n",
      "Metadata table: 26 samples, columns: ['unmapped_reads_perc']\n",
      "Metadata preview:\n",
      "       unmapped_reads_perc\n",
      "v2114             34.77867\n",
      "----------------------------------------\n"
     ]
    }
   ],
   "source": [
    "obj.add_tax_from_gtdbtk([\"Modin2025_gtdbtk.ar53.summary.tsv\", \"Modin2025_gtdbtk.bac120.summary.tsv\"], path=\"../../../qdiv/example_data\")\n",
    "obj.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18cb0c74-f4a2-4a52-8587-853f1319dd9b",
   "metadata": {},
   "source": [
    "We can have look at the taxonomy table:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "42bb069f-af49-4ad6-8451-601ee91f7e20",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "           Domain            Phylum           Class                Order  \\\n",
      "Feature                                                                    \n",
      "KM1      Bacteria   Planctomycetota  Planctomycetia         Pirellulales   \n",
      "Km2      Bacteria  Actinobacteriota   Actinomycetia  Propionibacteriales   \n",
      "Km3      Bacteria  Desulfobacterota      Syntrophia         Syntrophales   \n",
      "KM4      Bacteria     Chloroflexota    Anaerolineae              UBA1429   \n",
      "Km5      Bacteria  Actinobacteriota   Actinomycetia      Actinomycetales   \n",
      "\n",
      "                       Family             Genus                       Species  \n",
      "Feature                                                                        \n",
      "KM1           Thermoguttaceae       Thermogutta                                \n",
      "Km2      Propionibacteriaceae  Brevilactibacter                                \n",
      "Km3             Smithellaceae           UBA4810           UBA4810 sp002418825  \n",
      "KM4                   UBA1429           UBA6128                                \n",
      "Km5        Bifidobacteriaceae   Bifidobacterium  Bifidobacterium tibiigranuli  \n"
     ]
    }
   ],
   "source": [
    "print(obj.tax.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18b34e6e-632d-409f-9a81-d74692d38c9c",
   "metadata": {},
   "source": [
    "**tax_prefix**  \n",
    "We can add prefix to the taxonomic names, to indicate the level. That can be done with the **tax_prefix** function\n",
    "(which also can remove the prefix if the *add* variable is False)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "bc6ac620-1c48-498e-a82b-2fe00b48fc27",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "              Domain               Phylum              Class  \\\n",
      "Feature                                                        \n",
      "KM1      d__Bacteria   p__Planctomycetota  c__Planctomycetia   \n",
      "Km2      d__Bacteria  p__Actinobacteriota   c__Actinomycetia   \n",
      "Km3      d__Bacteria  p__Desulfobacterota      c__Syntrophia   \n",
      "KM4      d__Bacteria     p__Chloroflexota    c__Anaerolineae   \n",
      "Km5      d__Bacteria  p__Actinobacteriota   c__Actinomycetia   \n",
      "\n",
      "                          Order                   Family                Genus  \\\n",
      "Feature                                                                         \n",
      "KM1             o__Pirellulales       f__Thermoguttaceae       g__Thermogutta   \n",
      "Km2      o__Propionibacteriales  f__Propionibacteriaceae  g__Brevilactibacter   \n",
      "Km3             o__Syntrophales         f__Smithellaceae           g__UBA4810   \n",
      "KM4                  o__UBA1429               f__UBA1429           g__UBA6128   \n",
      "Km5          o__Actinomycetales    f__Bifidobacteriaceae   g__Bifidobacterium   \n",
      "\n",
      "                                 Species  \n",
      "Feature                                   \n",
      "KM1                                       \n",
      "Km2                                       \n",
      "Km3               s__UBA4810 sp002418825  \n",
      "KM4                                       \n",
      "Km5      s__Bifidobacterium tibiigranuli  \n"
     ]
    }
   ],
   "source": [
    "obj.tax_prefix(add=True, inplace=True)\n",
    "print(obj.tax.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "86b2cfb2-1779-464d-b202-f16118298017",
   "metadata": {},
   "source": [
    "**SingleM**  \n",
    "SingleM can be used to generate OTU tables from metagenomic reads. SingleM also a *convertToEBD.py* function to convert the OTU table to a tab-separated file with taxonomic information and read counts. That ebd file can be loaded using the **add_ebd_tab_from_singlem** function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "4e712a50-56db-4b65-b338-df88696a70da",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MicrobiomeData object summary\n",
      "----------------------------------------\n",
      "Abundance table: 1137 features x 7 samples\n",
      "Total reads: 5640.0\n",
      "Min reads/sample: 630.0\n",
      "Max reads/sample: 960.0\n",
      "Taxonomy table: 1137 features, levels: ['Domain', 'Phylum', 'Class', 'Order', 'Family', 'Genus', 'Species']\n",
      "Sequence table: None\n",
      "Tree: None\n",
      "Metadata table: None\n",
      "----------------------------------------\n"
     ]
    }
   ],
   "source": [
    "import qdiv\n",
    "obj = qdiv.MicrobiomeData.load()\n",
    "obj.add_ebd_tab_from_singlem(\"Modin2025_SingleM.ebd\", path=\"../../../qdiv/example_data\")\n",
    "obj.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b10fe377-64b1-4948-9dbf-aea299571afd",
   "metadata": {},
   "source": [
    "Let's have a closer look at the count table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "179f5206-8f14-47ac-9135-1d1a922214ee",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "         v2126_1  v2136_1  v2145_1  v2151_1  v2204_1  v2117_1  v2114_1\n",
      "Feature                                                               \n",
      "OTU1         0.0      0.0      0.0      0.0      0.0      0.0      1.0\n",
      "OTU2         0.0      0.0      0.0      0.0      0.0      1.0      1.0\n",
      "OTU3         3.0      0.0      0.0      0.0      0.0      1.0      1.0\n",
      "OTU4         0.0      0.0      0.0      0.0      0.0      2.0      2.0\n",
      "OTU5         0.0      0.0      0.0      0.0      0.0      0.0      2.0\n"
     ]
    }
   ],
   "source": [
    "print(obj.tab.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "93237de5-0b6c-4961-ac4e-e25b5eb6a15f",
   "metadata": {},
   "source": [
    "We can see that the sample names all contain a _1. To remove that part of the sample names, we can specify *second_sep*=\"_\" when we load the file. Then, everyhting after the \"_\" in the names will be dropped."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "2600dc0a-2ccd-4434-9981-5443d99615ee",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "         v2126  v2136  v2145  v2151  v2204  v2117  v2114\n",
      "Feature                                                 \n",
      "OTU1       0.0    0.0    0.0    0.0    0.0    0.0    1.0\n",
      "OTU2       0.0    0.0    0.0    0.0    0.0    1.0    1.0\n",
      "OTU3       3.0    0.0    0.0    0.0    0.0    1.0    1.0\n",
      "OTU4       0.0    0.0    0.0    0.0    0.0    2.0    2.0\n",
      "OTU5       0.0    0.0    0.0    0.0    0.0    0.0    2.0\n"
     ]
    }
   ],
   "source": [
    "obj = qdiv.MicrobiomeData.load()\n",
    "obj.add_ebd_tab_from_singlem(\"Modin2025_SingleM.ebd\", path=\"../../../qdiv/example_data\", second_sep=\"_\")\n",
    "print(obj.tab.head())"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}