{ "cells": [ { "cell_type": "markdown", "id": "c33aca62-4b83-487f-8bfe-ed103132d103", "metadata": {}, "source": [ "# Diversity \n", "Diversity in ecological studies refers to different aspects of variation within and across communities: \n", "- **Alpha diversity:** The richness and evenness of taxa (or \"features\") within a single sample or site. It measures how many different taxa are present and how evenly they are distributed.\n", "- **Beta diversity:** The degree of difference in community composition between two or more samples. It captures how similar or dissimilar communities are across environments.\n", "- **Gamma diversity:** The overall diversity across a collection of samples or an entire region, considering both the number of taxa and their distribution at a broader spatial scale.\n", " \n", "**Naive-, phylogenetic-, or functional diversity** \n", "One way to calculate diversity is to consider all taxa as separate entities and not care about their relationships. We call these diversity values *naive*. \n", "A second way to calculate diversity takes the relationships in a phylogenetic tree into account. Two taxa that share many branches in the tree contribute less to diversity than two that are very far from each other in the tree. We call the diversity values that are calculated with a tree as input *phylogenetic*. \n", "A third way to calculate diversity takes the pairwise distances between taxa into account. Two taxa that have a small pairwise distance contribute less to diversity than two that have a high pairwise distance. We call the diversity values that are calculated with a distance matrix as input *functional*.\n", "\n", "Naive alpha diversity is simply the effective number of taxa, phylogenetic alpha diversity is the effective branch diversity in the phylogenetic tree associated with the taxa in the sample, and functional alpha diversity is the effective total distance between taxa in the sample. The word \"effective\" used above refers to the relative abundance weighting by the diversity order, q.\n", "\n", "**Hill numbers (or effective numbers)** \n", "Imagine a community with 10 species, all equally abundant. The most intuitive measure of diversity here is 10, and Hill numbers give exactly that result. Now consider a community with 2 dominant species and 8 rare ones. What should the diversity be? \n", "- If we ignore relative abundances, the answer is 10.\n", "- If we ignore rare species, the answer is 2.\n", "\n", "Hill numbers solve this by introducing a single parameter, the diversity order (q), which controls how much weight is given to species’ relative abundances: \n", "- When q=0, all species count equally (richness).\n", "- When q=1, species are weighted by their proportional abundance (Shannon diversity).\n", "- When q=2, common species dominate the measure (Simpson diversity).\n", "\n", "This flexibility makes Hill numbers a powerful and intuitive framework for comparing communities. And it doesn’t stop at 0, 1, or 2 — diversity can be explored across a continuum of q values, offering a nuanced view of community structure." ] }, { "cell_type": "markdown", "id": "fbe46b08-f6ec-4a30-a2d1-9b18c24da1b0", "metadata": {}, "source": [ "## Alpha diversity\n", "Let's calculate alpha diversity using qdiv. First, we will load and rarefy some example data." ] }, { "cell_type": "code", "execution_count": 1, "id": "5b3b973a-97e3-4d1b-b010-f0ca3ea98ff5", "metadata": {}, "outputs": [], "source": [ "import qdiv\n", "obj = qdiv.MicrobiomeData.load_example(\"Saheb-Alam_DADA2\")\n", "rarefied_obj = obj.rarefy() #I rarefy the object before calculating alpha diversity" ] }, { "cell_type": "markdown", "id": "63fe3ec8-0057-4764-987f-e2a5ed76f800", "metadata": {}, "source": [ "Next, we will calculate the naive alpha diversity values for the samples using the **diversity.naive_alpha** function." ] }, { "cell_type": "code", "execution_count": 2, "id": "d818b5ee-68bf-42bc-b35b-6a6ecaf3b71a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "S4 4.917773\n", "S5 8.926633\n", "S6 5.934636\n", "S7 4.894088\n", "S10 47.964757\n", "S11 39.977780\n", "S12 91.900986\n", "S13 55.938017\n", "S20 7.297213\n", "S21 6.967200\n", "S22 4.626569\n", "S23 6.013025\n", "S26 47.436514\n", "S27 47.834682\n", "S28 39.662705\n", "S29 59.220636\n", "dtype: float64\n" ] } ], "source": [ "alpha = qdiv.diversity.naive_alpha(rarefied_obj, q=1)\n", "print(alpha)" ] }, { "cell_type": "markdown", "id": "b5953500-04cd-4a97-bfaa-4f83d5d94563", "metadata": {}, "source": [ "This was the alpha diversity for diversity order q=1. Perhaps we want to calculate it for multiple q values and save it into a csv file together with the meta data. " ] }, { "cell_type": "code", "execution_count": 3, "id": "53d2872f-b480-42ac-9f9c-27c7c826a603", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " location feed mfc Div_q0 Div_q1 Div_q2\n", "S4 anode acetate B 75.0 4.917773 1.862620\n", "S5 anode acetate B 138.0 8.926633 2.448316\n", "S6 anode acetate B 78.0 5.934636 2.080424\n", "S7 anode acetate B 74.0 4.894088 1.845467\n", "S10 cathode acetate B 165.0 47.964757 13.685347\n", "S11 cathode acetate B 194.0 39.977780 10.033863\n", "S12 cathode acetate B 315.0 91.900986 32.390060\n", "S13 cathode acetate B 186.0 55.938017 21.741220\n", "S20 anode glucose D 150.0 7.297213 3.327548\n", "S21 anode glucose D 107.0 6.967200 3.379743\n", "S22 anode glucose D 57.0 4.626569 2.650418\n", "S23 anode glucose D 116.0 6.013025 3.044453\n", "S26 cathode glucose D 165.0 47.436514 16.851130\n", "S27 cathode glucose D 222.0 47.834682 14.928833\n", "S28 cathode glucose D 145.0 39.662705 14.361918\n", "S29 cathode glucose D 241.0 59.220636 20.035359\n" ] } ], "source": [ "meta = rarefied_obj.meta.copy() #We make a copy of the metadata to avoid modifying the object\n", "for q in [0, 1, 2]: #A for loop to calculate alpha diversity for several values of q\n", " alpha = qdiv.diversity.naive_alpha(rarefied_obj, q=q)\n", " meta[\"Div_q\"+str(q)] = alpha #Add the calculated values to the meta dataframe\n", "meta.to_csv(\"Metadata_with_alpha_diversity.csv\") #This lines saves the dataframe as a CSV file.\n", "print(meta) #This prints the file content to show it below" ] }, { "cell_type": "markdown", "id": "8d4f1d5b-cb02-40ce-a411-563086267bb6", "metadata": {}, "source": [ "The results seem to indicate higher diversity on cathode than anodes, and higher diversity in glucose-fed than acetate-fed microbial fuel cells.\n", "Higher diversity orders (q values) result in lower diversity. The change in diversity as q increases is an indication of the evenness. We can calculate evenness parameters using the function below." ] }, { "cell_type": "code", "execution_count": 4, "id": "8ee0be08-6b12-474d-a610-7f41e6d0200a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " location feed mfc Div_q0 Div_q1 Div_q2 Evenness\n", "S4 anode acetate B 75.0 4.917773 1.862620 0.368931\n", "S5 anode acetate B 138.0 8.926633 2.448316 0.444272\n", "S6 anode acetate B 78.0 5.934636 2.080424 0.408750\n", "S7 anode acetate B 74.0 4.894088 1.845467 0.368960\n", "S10 cathode acetate B 165.0 47.964757 13.685347 0.758031\n", "S11 cathode acetate B 194.0 39.977780 10.033863 0.700156\n", "S12 cathode acetate B 315.0 91.900986 32.390060 0.785859\n", "S13 cathode acetate B 186.0 55.938017 21.741220 0.770080\n", "S20 anode glucose D 150.0 7.297213 3.327548 0.396655\n", "S21 anode glucose D 107.0 6.967200 3.379743 0.415426\n", "S22 anode glucose D 57.0 4.626569 2.650418 0.378876\n", "S23 anode glucose D 116.0 6.013025 3.044453 0.377384\n", "S26 cathode glucose D 165.0 47.436514 16.851130 0.755862\n", "S27 cathode glucose D 222.0 47.834682 14.928833 0.715895\n", "S28 cathode glucose D 145.0 39.662705 14.361918 0.739523\n", "S29 cathode glucose D 241.0 59.220636 20.035359 0.744106\n" ] } ], "source": [ "evenness = qdiv.diversity.evenness(rarefied_obj, index=\"pielou\")\n", "meta[\"Evenness\"] = evenness\n", "print(meta)" ] }, { "cell_type": "markdown", "id": "9f3282d9-286c-4d1f-8cbf-02e6c20afe86", "metadata": {}, "source": [ "The cathode samples have, indeed, higher evenness than the anode samples." ] }, { "cell_type": "markdown", "id": "a5ea8255-9ac6-4fee-a3d9-5ae0d52873ae", "metadata": {}, "source": [ "**Phylogenetic diversity** \n", "Since a phylogenetic tree is included in the example data object, we can also calculate phylogenetic diversity using the **diversity.phyl_alpha** function." ] }, { "cell_type": "code", "execution_count": 5, "id": "2ff9f00c-b0ec-462d-b2b9-c4a8082cfdc2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " location feed mfc Div_q0 Div_q1 Div_q2 Evenness PD_q0 \\\n", "S4 anode acetate B 75.0 4.917773 1.862620 0.368931 8.758063 \n", "S5 anode acetate B 138.0 8.926633 2.448316 0.444272 13.629279 \n", "S6 anode acetate B 78.0 5.934636 2.080424 0.408750 10.161938 \n", "S7 anode acetate B 74.0 4.894088 1.845467 0.368960 9.647654 \n", "S10 cathode acetate B 165.0 47.964757 13.685347 0.758031 16.782338 \n", "S11 cathode acetate B 194.0 39.977780 10.033863 0.700156 19.017742 \n", "S12 cathode acetate B 315.0 91.900986 32.390060 0.785859 26.608242 \n", "S13 cathode acetate B 186.0 55.938017 21.741220 0.770080 18.946945 \n", "S20 anode glucose D 150.0 7.297213 3.327548 0.396655 15.332092 \n", "S21 anode glucose D 107.0 6.967200 3.379743 0.415426 11.947341 \n", "S22 anode glucose D 57.0 4.626569 2.650418 0.378876 8.704764 \n", "S23 anode glucose D 116.0 6.013025 3.044453 0.377384 12.915055 \n", "S26 cathode glucose D 165.0 47.436514 16.851130 0.755862 19.834035 \n", "S27 cathode glucose D 222.0 47.834682 14.928833 0.715895 23.010236 \n", "S28 cathode glucose D 145.0 39.662705 14.361918 0.739523 16.086969 \n", "S29 cathode glucose D 241.0 59.220636 20.035359 0.744106 22.700699 \n", "\n", " PD_q1 PD_q2 \n", "S4 0.982211 0.830020 \n", "S5 1.186406 0.869423 \n", "S6 1.053486 0.848903 \n", "S7 1.011143 0.837457 \n", "S10 1.745365 0.899019 \n", "S11 1.611911 0.878486 \n", "S12 2.014242 0.942826 \n", "S13 1.814265 0.930605 \n", "S20 1.010650 0.964344 \n", "S21 0.972128 0.961420 \n", "S22 0.892799 0.917759 \n", "S23 0.912382 0.933111 \n", "S26 2.227134 1.020376 \n", "S27 2.013565 1.011542 \n", "S28 1.835422 0.996555 \n", "S29 1.760901 0.985247 \n" ] } ], "source": [ "for q in [0, 1, 2]: #A for loop to calculate alpha diversity for several values of q\n", " alpha = qdiv.diversity.phyl_alpha(rarefied_obj, q=q)\n", " meta[\"PD_q\"+str(q)] = alpha #Add the calculated values to the meta dataframe\n", "print(meta) #Show the results" ] }, { "cell_type": "markdown", "id": "ec56b35f-730a-489f-92bf-7281a3dc6f44", "metadata": {}, "source": [ "**Functional diversity** \n", "To calculate functional diversity, we need a distance matrix with pairwise functional distances between features in the dataset. The distance matrix should be in the form of a pandas dataframe with feature names as row- and column indices. For the purpose of demonstrating the use of the function, we will calculate such as distance matrix based on the phylogenetic distances between features. We use the **sequences.tree_distance_matrix** function to get a distance matrix saved in the variable *distmat*." ] }, { "cell_type": "code", "execution_count": 4, "id": "ada41805-d4f5-494d-93d5-f15ca9a9c78f", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Leaves: 100%|██████████████████████████████████████████████████████████████████████| 671/671 [00:00<00:00, 1503.42it/s]\n" ] } ], "source": [ "distmat = qdiv.sequences.tree_distance_matrix(rarefied_obj, save=False) #By setting save=False, no file is saved" ] }, { "cell_type": "markdown", "id": "8248a134-6b05-496b-b60c-ef0123553c1b", "metadata": {}, "source": [ "Next, we call the **diversity.func_alpha** function to calculate functional alpha diversity." ] }, { "cell_type": "code", "execution_count": 5, "id": "72b92d27-d90c-4160-9e01-94f8e75a69cd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "S4 109.939446\n", "S5 267.304695\n", "S6 130.601954\n", "S7 117.680498\n", "S10 1629.952535\n", "S11 1477.581434\n", "S12 5009.610236\n", "S13 2030.422759\n", "S20 56.720516\n", "S21 45.512687\n", "S22 19.199088\n", "S23 29.713493\n", "S26 1471.048419\n", "S27 1419.489605\n", "S28 912.282687\n", "S29 2104.082254\n", "dtype: float64\n" ] } ], "source": [ "alpha = qdiv.diversity.func_alpha(rarefied_obj, distmat, q=1)\n", "print(alpha)" ] }, { "cell_type": "markdown", "id": "35b120f1-71f7-4bf1-bd9d-af1c48b98a03", "metadata": {}, "source": [ "## Beta diversity\n", "**Relationship between alpha, beta, and gamma diversity** \n", "One common way to define beta diversity is as a ratio between gamma and alpha: $D_\\beta=\\frac{D_\\gamma}{D_\\alpha}$ \n", "Gamma is the diversity of the pooled samples and alpha is the mean diversity of the individual samples. \n", "Beta diversity takes a value between 1 and N (the number of samples being compared). For functional beta diversity it goes between 1 and NxN.\n", "\n", "Beta diversity can be converted to a dissimilarity index, which takes a value between 0 (if the compared samples are identical) and 1 (if the compared samples are completely different). Dissimilarity can be calculated with a local or regional viewpoint. The local viewpoint quantifies the effective proportion of species in a sample that is not shared across all samples. The regional viewpoint quantifies the effective proportion of species in the pooled samples that is not shared across all samples.\n", "Just like alpha diversity, beta diversity can be calculated for any diveristy order, q, which means we can decide the emphasis we want to put on the relative abundance of the species.\n", "\n", "Just like the alpha diversity indices, beta diversity and dissimilarity can be calculated using **naive-, phylogenetic-, and functional** indices. \n", "Let's calculate beta diversity for the example data. First, we'll calculate pairwise dissimilarities between all samples in our data using the **diversity.naive_beta** function." ] }, { "cell_type": "code", "execution_count": 6, "id": "f67141f7-aac4-45f5-bd8e-49bc71b52c08", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " S4 S5 S6 S7 S10\n", "S4 0.000000 0.058442 0.049793 0.075557 0.839512\n", "S5 0.058442 0.000000 0.057203 0.073835 0.783948\n", "S6 0.049793 0.057203 0.000000 0.067338 0.821311\n", "S7 0.075557 0.073835 0.067338 0.000000 0.853798\n", "S10 0.839512 0.783948 0.821311 0.853798 0.000000\n" ] } ], "source": [ "import qdiv\n", "obj = qdiv.MicrobiomeData.load_example(\"Saheb-Alam_DADA2\")\n", "rarefied_obj = obj.rarefy()\n", "\n", "dissimilarity = qdiv.diversity.naive_beta(rarefied_obj, q=1)\n", "print(dissimilarity.iloc[:5, :5]) #Here we print the results for five samples" ] }, { "cell_type": "markdown", "id": "a5016faf-56a3-42ca-ae12-de536dde3a95", "metadata": {}, "source": [ "The dissimilarity matrix shows the pairwise dissimilarity between the samples. It can, e.g., be used later for plotting an ordination. \n", "We can also check the beta diversity across sample groups with multiple samples. Let's group the samples based on feed (one recieved acetate and one glucose). We'll use the **diversity.naive_multi_beta** function." ] }, { "cell_type": "code", "execution_count": 7, "id": "3c2f8574-47ac-4ec2-93c6-2db6ea4e8734", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " N beta local_dis regional_dis\n", "acetate 8.0 1.967056 0.325346 0.325346\n", "glucose 8.0 1.666429 0.245587 0.245587\n" ] } ], "source": [ "multi_beta = qdiv.diversity.naive_multi_beta(rarefied_obj, by=\"feed\", q=1)\n", "print(multi_beta)" ] }, { "cell_type": "markdown", "id": "197ce02f-86d5-4125-83eb-a7ced330ca99", "metadata": {}, "source": [ "Each sample group contains 8 samples (N=8). The beta diversity value indicates the effective number of distinct communities represented by the data. In other words, it reflects how much differentiation exists among the samples within a group.\n", "To provide additional perspective, beta diversity is also expressed as dissimilarity indices, calculated using either: \n", "- local view, which considers differences relative to individual samples, or\n", "- regional view, which considers differences relative to the entire pooled set of samples. \n", "\n", "The analysis can also be done with **diversity.phyl_multi_beta** or **diversity.func_multi_beta**." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.14" } }, "nbformat": 4, "nbformat_minor": 5 }