Diversity

Diversity in ecological studies refers to different aspects of variation within and across communities:

Alpha diversity: The richness and evenness of taxa (or “features”) within a single sample or site. It measures how many different taxa are present and how evenly they are distributed.
Beta diversity: The degree of difference in community composition between two or more samples. It captures how similar or dissimilar communities are across environments.
Gamma diversity: The overall diversity across a collection of samples or an entire region, considering both the number of taxa and their distribution at a broader spatial scale.

Naive-, phylogenetic-, or functional diversity
One way to calculate diversity is to consider all taxa as separate entities and not care about their relationships. We call these diversity values naive.
A second way to calculate diversity takes the relationships in a phylogenetic tree into account. Two taxa that share many branches in the tree contribute less to diversity than two that are very far from each other in the tree. We call the diversity values that are calculated with a tree as input phylogenetic.
A third way to calculate diversity takes the pairwise distances between taxa into account. Two taxa that have a small pairwise distance contribute less to diversity than two that have a high pairwise distance. We call the diversity values that are calculated with a distance matrix as input functional.

Naive alpha diversity is simply the effective number of taxa, phylogenetic alpha diversity is the effective branch diversity in the phylogenetic tree associated with the taxa in the sample, and functional alpha diversity is the effective total distance between taxa in the sample. The word “effective” used above refers to the relative abundance weighting by the diversity order, q.

Hill numbers (or effective numbers)

Imagine a community with 10 species, all equally abundant. The most intuitive measure of diversity here is 10, and Hill numbers give exactly that result. Now consider a community with 2 dominant species and 8 rare ones. What should the diversity be?

If we ignore relative abundances, the answer is 10.
If we ignore rare species, the answer is 2.

Hill numbers solve this by introducing a single parameter, the diversity order (q), which controls how much weight is given to species’ relative abundances:

When q=0, all species count equally (richness).
When q=1, species are weighted by their proportional abundance (Shannon diversity).
When q=2, common species dominate the measure (Simpson diversity).

This flexibility makes Hill numbers a powerful and intuitive framework for comparing communities. And it doesn’t stop at 0, 1, or 2 — diversity can be explored across a continuum of q values, offering a nuanced view of community structure.

Alpha diversity

Let’s calculate alpha diversity using qdiv. First, we will load and rarefy some example data.

[1]:

import qdiv
obj = qdiv.MicrobiomeData.load_example("Saheb-Alam_DADA2")
rarefied_obj = obj.rarefy() #I rarefy the object before calculating alpha diversity

Next, we will calculate the naive alpha diversity values for the samples using the diversity.naive_alpha function.

[2]:

alpha = qdiv.diversity.naive_alpha(rarefied_obj, q=1)
print(alpha)

S4      4.917773
S5      8.926633
S6      5.934636
S7      4.894088
S10    47.964757
S11    39.977780
S12    91.900986
S13    55.938017
S20     7.297213
S21     6.967200
S22     4.626569
S23     6.013025
S26    47.436514
S27    47.834682
S28    39.662705
S29    59.220636
dtype: float64

This was the alpha diversity for diversity order q=1. Perhaps we want to calculate it for multiple q values and save it into a csv file together with the meta data.

[3]:

meta = rarefied_obj.meta.copy() #We make a copy of the metadata to avoid modifying the object
for q in [0, 1, 2]: #A for loop to calculate alpha diversity for several values of q
    alpha = qdiv.diversity.naive_alpha(rarefied_obj, q=q)
    meta["Div_q"+str(q)] = alpha #Add the calculated values to the meta dataframe
meta.to_csv("Metadata_with_alpha_diversity.csv") #This lines saves the dataframe as a CSV file.
print(meta) #This prints the file content to show it below

    location     feed mfc  Div_q0     Div_q1     Div_q2
S4     anode  acetate   B    75.0   4.917773   1.862620
S5     anode  acetate   B   138.0   8.926633   2.448316
S6     anode  acetate   B    78.0   5.934636   2.080424
S7     anode  acetate   B    74.0   4.894088   1.845467
S10  cathode  acetate   B   165.0  47.964757  13.685347
S11  cathode  acetate   B   194.0  39.977780  10.033863
S12  cathode  acetate   B   315.0  91.900986  32.390060
S13  cathode  acetate   B   186.0  55.938017  21.741220
S20    anode  glucose   D   150.0   7.297213   3.327548
S21    anode  glucose   D   107.0   6.967200   3.379743
S22    anode  glucose   D    57.0   4.626569   2.650418
S23    anode  glucose   D   116.0   6.013025   3.044453
S26  cathode  glucose   D   165.0  47.436514  16.851130
S27  cathode  glucose   D   222.0  47.834682  14.928833
S28  cathode  glucose   D   145.0  39.662705  14.361918
S29  cathode  glucose   D   241.0  59.220636  20.035359

The results seem to indicate higher diversity on cathode than anodes, and higher diversity in glucose-fed than acetate-fed microbial fuel cells. Higher diversity orders (q values) result in lower diversity. The change in diversity as q increases is an indication of the evenness. We can calculate evenness parameters using the function below.

[4]:

evenness = qdiv.diversity.evenness(rarefied_obj, index="pielou")
meta["Evenness"] = evenness
print(meta)

    location     feed mfc  Div_q0     Div_q1     Div_q2  Evenness
S4     anode  acetate   B    75.0   4.917773   1.862620  0.368931
S5     anode  acetate   B   138.0   8.926633   2.448316  0.444272
S6     anode  acetate   B    78.0   5.934636   2.080424  0.408750
S7     anode  acetate   B    74.0   4.894088   1.845467  0.368960
S10  cathode  acetate   B   165.0  47.964757  13.685347  0.758031
S11  cathode  acetate   B   194.0  39.977780  10.033863  0.700156
S12  cathode  acetate   B   315.0  91.900986  32.390060  0.785859
S13  cathode  acetate   B   186.0  55.938017  21.741220  0.770080
S20    anode  glucose   D   150.0   7.297213   3.327548  0.396655
S21    anode  glucose   D   107.0   6.967200   3.379743  0.415426
S22    anode  glucose   D    57.0   4.626569   2.650418  0.378876
S23    anode  glucose   D   116.0   6.013025   3.044453  0.377384
S26  cathode  glucose   D   165.0  47.436514  16.851130  0.755862
S27  cathode  glucose   D   222.0  47.834682  14.928833  0.715895
S28  cathode  glucose   D   145.0  39.662705  14.361918  0.739523
S29  cathode  glucose   D   241.0  59.220636  20.035359  0.744106

The cathode samples have, indeed, higher evenness than the anode samples.

Phylogenetic diversity

Since a phylogenetic tree is included in the example data object, we can also calculate phylogenetic diversity using the diversity.phyl_alpha function.

[5]:

for q in [0, 1, 2]: #A for loop to calculate alpha diversity for several values of q
    alpha = qdiv.diversity.phyl_alpha(rarefied_obj, q=q)
    meta["PD_q"+str(q)] = alpha #Add the calculated values to the meta dataframe
print(meta) #Show the results

    location     feed mfc  Div_q0     Div_q1     Div_q2  Evenness      PD_q0  \
S4     anode  acetate   B    75.0   4.917773   1.862620  0.368931   8.758063
S5     anode  acetate   B   138.0   8.926633   2.448316  0.444272  13.629279
S6     anode  acetate   B    78.0   5.934636   2.080424  0.408750  10.161938
S7     anode  acetate   B    74.0   4.894088   1.845467  0.368960   9.647654
S10  cathode  acetate   B   165.0  47.964757  13.685347  0.758031  16.782338
S11  cathode  acetate   B   194.0  39.977780  10.033863  0.700156  19.017742
S12  cathode  acetate   B   315.0  91.900986  32.390060  0.785859  26.608242
S13  cathode  acetate   B   186.0  55.938017  21.741220  0.770080  18.946945
S20    anode  glucose   D   150.0   7.297213   3.327548  0.396655  15.332092
S21    anode  glucose   D   107.0   6.967200   3.379743  0.415426  11.947341
S22    anode  glucose   D    57.0   4.626569   2.650418  0.378876   8.704764
S23    anode  glucose   D   116.0   6.013025   3.044453  0.377384  12.915055
S26  cathode  glucose   D   165.0  47.436514  16.851130  0.755862  19.834035
S27  cathode  glucose   D   222.0  47.834682  14.928833  0.715895  23.010236
S28  cathode  glucose   D   145.0  39.662705  14.361918  0.739523  16.086969
S29  cathode  glucose   D   241.0  59.220636  20.035359  0.744106  22.700699

        PD_q1     PD_q2
S4   0.982211  0.830020
S5   1.186406  0.869423
S6   1.053486  0.848903
S7   1.011143  0.837457
S10  1.745365  0.899019
S11  1.611911  0.878486
S12  2.014242  0.942826
S13  1.814265  0.930605
S20  1.010650  0.964344
S21  0.972128  0.961420
S22  0.892799  0.917759
S23  0.912382  0.933111
S26  2.227134  1.020376
S27  2.013565  1.011542
S28  1.835422  0.996555
S29  1.760901  0.985247

Functional diversity

To calculate functional diversity, we need a distance matrix with pairwise functional distances between features in the dataset. The distance matrix should be in the form of a pandas dataframe with feature names as row- and column indices. For the purpose of demonstrating the use of the function, we will calculate such as distance matrix based on the phylogenetic distances between features. We use the sequences.tree_distance_matrix function to get a distance matrix saved in the variable distmat.

[4]:

distmat = qdiv.sequences.tree_distance_matrix(rarefied_obj, save=False) #By setting save=False, no file is saved

Leaves: 100%|██████████████████████████████████████████████████████████████████████| 671/671 [00:00<00:00, 1503.42it/s]

Next, we call the diversity.func_alpha function to calculate functional alpha diversity.

[5]:

alpha = qdiv.diversity.func_alpha(rarefied_obj, distmat, q=1)
print(alpha)

S4      109.939446
S5      267.304695
S6      130.601954
S7      117.680498
S10    1629.952535
S11    1477.581434
S12    5009.610236
S13    2030.422759
S20      56.720516
S21      45.512687
S22      19.199088
S23      29.713493
S26    1471.048419
S27    1419.489605
S28     912.282687
S29    2104.082254
dtype: float64

Beta diversity

Relationship between alpha, beta, and gamma diversity
One common way to define beta diversity is as a ratio between gamma and alpha: \(D_\beta=\frac{D_\gamma}{D_\alpha}\)
Gamma is the diversity of the pooled samples and alpha is the mean diversity of the individual samples.
Beta diversity takes a value between 1 and N (the number of samples being compared). For functional beta diversity it goes between 1 and NxN.

Beta diversity can be converted to a dissimilarity index, which takes a value between 0 (if the compared samples are identical) and 1 (if the compared samples are completely different). Dissimilarity can be calculated with a local or regional viewpoint. The local viewpoint quantifies the effective proportion of species in a sample that is not shared across all samples. The regional viewpoint quantifies the effective proportion of species in the pooled samples that is not shared across all samples. Just like alpha diversity, beta diversity can be calculated for any diveristy order, q, which means we can decide the emphasis we want to put on the relative abundance of the species.

Just like the alpha diversity indices, beta diversity and dissimilarity can be calculated using naive-, phylogenetic-, and functional indices.

Let’s calculate beta diversity for the example data. First, we’ll calculate pairwise dissimilarities between all samples in our data using the diversity.naive_beta function.

[6]:

import qdiv
obj = qdiv.MicrobiomeData.load_example("Saheb-Alam_DADA2")
rarefied_obj = obj.rarefy()

dissimilarity = qdiv.diversity.naive_beta(rarefied_obj, q=1)
print(dissimilarity.iloc[:5, :5]) #Here we print the results for five samples

           S4        S5        S6        S7       S10
S4   0.000000  0.058442  0.049793  0.075557  0.839512
S5   0.058442  0.000000  0.057203  0.073835  0.783948
S6   0.049793  0.057203  0.000000  0.067338  0.821311
S7   0.075557  0.073835  0.067338  0.000000  0.853798
S10  0.839512  0.783948  0.821311  0.853798  0.000000

The dissimilarity matrix shows the pairwise dissimilarity between the samples. It can, e.g., be used later for plotting an ordination.

We can also check the beta diversity across sample groups with multiple samples. Let’s group the samples based on feed (one recieved acetate and one glucose). We’ll use the diversity.naive_multi_beta function.

[7]:

multi_beta = qdiv.diversity.naive_multi_beta(rarefied_obj, by="feed", q=1)
print(multi_beta)

           N      beta  local_dis  regional_dis
acetate  8.0  1.967056   0.325346      0.325346
glucose  8.0  1.666429   0.245587      0.245587

Each sample group contains 8 samples (N=8). The beta diversity value indicates the effective number of distinct communities represented by the data. In other words, it reflects how much differentiation exists among the samples within a group. To provide additional perspective, beta diversity is also expressed as dissimilarity indices, calculated using either:

local view, which considers differences relative to individual samples, or
regional view, which considers differences relative to the entire pooled set of samples.

The analysis can also be done with diversity.phyl_multi_beta or diversity.func_multi_beta.