Diversity
Diversity in ecological studies refers to different aspects of variation within and across communities:
Alpha diversity: The richness and evenness of taxa (or “features”) within a single sample or site. It measures how many different taxa are present and how evenly they are distributed.
Beta diversity: The degree of difference in community composition between two or more samples. It captures how similar or dissimilar communities are across environments.
Gamma diversity: The overall diversity across a collection of samples or an entire region, considering both the number of taxa and their distribution at a broader spatial scale.
Naive alpha diversity is simply the effective number of taxa, phylogenetic alpha diversity is the effective branch diversity in the phylogenetic tree associated with the taxa in the sample, and functional alpha diversity is the effective total distance between taxa in the sample. The word “effective” used above refers to the relative abundance weighting by the diversity order, q.
If we ignore relative abundances, the answer is 10.
If we ignore rare species, the answer is 2.
Hill numbers solve this by introducing a single parameter, the diversity order (q), which controls how much weight is given to species’ relative abundances:
When q=0, all species count equally (richness).
When q=1, species are weighted by their proportional abundance (Shannon diversity).
When q=2, common species dominate the measure (Simpson diversity).
This flexibility makes Hill numbers a powerful and intuitive framework for comparing communities. And it doesn’t stop at 0, 1, or 2 — diversity can be explored across a continuum of q values, offering a nuanced view of community structure.
Alpha diversity
Let’s calculate alpha diversity using qdiv. First, we will load and rarefy some example data.
[1]:
import qdiv
obj = qdiv.MicrobiomeData.load_example("Saheb-Alam_DADA2")
rarefied_obj = obj.rarefy() #I rarefy the object before calculating alpha diversity
Next, we will calculate the naive alpha diversity values for the samples using the diversity.naive_alpha function.
[2]:
alpha = qdiv.diversity.naive_alpha(rarefied_obj, q=1)
print(alpha)
S4 4.917773
S5 8.926633
S6 5.934636
S7 4.894088
S10 47.964757
S11 39.977780
S12 91.900986
S13 55.938017
S20 7.297213
S21 6.967200
S22 4.626569
S23 6.013025
S26 47.436514
S27 47.834682
S28 39.662705
S29 59.220636
dtype: float64
This was the alpha diversity for diversity order q=1. Perhaps we want to calculate it for multiple q values and save it into a csv file together with the meta data.
[3]:
meta = rarefied_obj.meta.copy() #We make a copy of the metadata to avoid modifying the object
for q in [0, 1, 2]: #A for loop to calculate alpha diversity for several values of q
alpha = qdiv.diversity.naive_alpha(rarefied_obj, q=q)
meta["Div_q"+str(q)] = alpha #Add the calculated values to the meta dataframe
meta.to_csv("Metadata_with_alpha_diversity.csv") #This lines saves the dataframe as a CSV file.
print(meta) #This prints the file content to show it below
location feed mfc Div_q0 Div_q1 Div_q2
S4 anode acetate B 75.0 4.917773 1.862620
S5 anode acetate B 138.0 8.926633 2.448316
S6 anode acetate B 78.0 5.934636 2.080424
S7 anode acetate B 74.0 4.894088 1.845467
S10 cathode acetate B 165.0 47.964757 13.685347
S11 cathode acetate B 194.0 39.977780 10.033863
S12 cathode acetate B 315.0 91.900986 32.390060
S13 cathode acetate B 186.0 55.938017 21.741220
S20 anode glucose D 150.0 7.297213 3.327548
S21 anode glucose D 107.0 6.967200 3.379743
S22 anode glucose D 57.0 4.626569 2.650418
S23 anode glucose D 116.0 6.013025 3.044453
S26 cathode glucose D 165.0 47.436514 16.851130
S27 cathode glucose D 222.0 47.834682 14.928833
S28 cathode glucose D 145.0 39.662705 14.361918
S29 cathode glucose D 241.0 59.220636 20.035359
The results seem to indicate higher diversity on cathode than anodes, and higher diversity in glucose-fed than acetate-fed microbial fuel cells. Higher diversity orders (q values) result in lower diversity. The change in diversity as q increases is an indication of the evenness. We can calculate evenness parameters using the function below.
[4]:
evenness = qdiv.diversity.evenness(rarefied_obj, index="pielou")
meta["Evenness"] = evenness
print(meta)
location feed mfc Div_q0 Div_q1 Div_q2 Evenness
S4 anode acetate B 75.0 4.917773 1.862620 0.368931
S5 anode acetate B 138.0 8.926633 2.448316 0.444272
S6 anode acetate B 78.0 5.934636 2.080424 0.408750
S7 anode acetate B 74.0 4.894088 1.845467 0.368960
S10 cathode acetate B 165.0 47.964757 13.685347 0.758031
S11 cathode acetate B 194.0 39.977780 10.033863 0.700156
S12 cathode acetate B 315.0 91.900986 32.390060 0.785859
S13 cathode acetate B 186.0 55.938017 21.741220 0.770080
S20 anode glucose D 150.0 7.297213 3.327548 0.396655
S21 anode glucose D 107.0 6.967200 3.379743 0.415426
S22 anode glucose D 57.0 4.626569 2.650418 0.378876
S23 anode glucose D 116.0 6.013025 3.044453 0.377384
S26 cathode glucose D 165.0 47.436514 16.851130 0.755862
S27 cathode glucose D 222.0 47.834682 14.928833 0.715895
S28 cathode glucose D 145.0 39.662705 14.361918 0.739523
S29 cathode glucose D 241.0 59.220636 20.035359 0.744106
The cathode samples have, indeed, higher evenness than the anode samples.
[5]:
for q in [0, 1, 2]: #A for loop to calculate alpha diversity for several values of q
alpha = qdiv.diversity.phyl_alpha(rarefied_obj, q=q)
meta["PD_q"+str(q)] = alpha #Add the calculated values to the meta dataframe
print(meta) #Show the results
location feed mfc Div_q0 Div_q1 Div_q2 Evenness PD_q0 \
S4 anode acetate B 75.0 4.917773 1.862620 0.368931 8.758063
S5 anode acetate B 138.0 8.926633 2.448316 0.444272 13.629279
S6 anode acetate B 78.0 5.934636 2.080424 0.408750 10.161938
S7 anode acetate B 74.0 4.894088 1.845467 0.368960 9.647654
S10 cathode acetate B 165.0 47.964757 13.685347 0.758031 16.782338
S11 cathode acetate B 194.0 39.977780 10.033863 0.700156 19.017742
S12 cathode acetate B 315.0 91.900986 32.390060 0.785859 26.608242
S13 cathode acetate B 186.0 55.938017 21.741220 0.770080 18.946945
S20 anode glucose D 150.0 7.297213 3.327548 0.396655 15.332092
S21 anode glucose D 107.0 6.967200 3.379743 0.415426 11.947341
S22 anode glucose D 57.0 4.626569 2.650418 0.378876 8.704764
S23 anode glucose D 116.0 6.013025 3.044453 0.377384 12.915055
S26 cathode glucose D 165.0 47.436514 16.851130 0.755862 19.834035
S27 cathode glucose D 222.0 47.834682 14.928833 0.715895 23.010236
S28 cathode glucose D 145.0 39.662705 14.361918 0.739523 16.086969
S29 cathode glucose D 241.0 59.220636 20.035359 0.744106 22.700699
PD_q1 PD_q2
S4 0.982211 0.830020
S5 1.186406 0.869423
S6 1.053486 0.848903
S7 1.011143 0.837457
S10 1.745365 0.899019
S11 1.611911 0.878486
S12 2.014242 0.942826
S13 1.814265 0.930605
S20 1.010650 0.964344
S21 0.972128 0.961420
S22 0.892799 0.917759
S23 0.912382 0.933111
S26 2.227134 1.020376
S27 2.013565 1.011542
S28 1.835422 0.996555
S29 1.760901 0.985247
[4]:
distmat = qdiv.sequences.tree_distance_matrix(rarefied_obj, save=False) #By setting save=False, no file is saved
Leaves: 100%|██████████████████████████████████████████████████████████████████████| 671/671 [00:00<00:00, 1503.42it/s]
Next, we call the diversity.func_alpha function to calculate functional alpha diversity.
[5]:
alpha = qdiv.diversity.func_alpha(rarefied_obj, distmat, q=1)
print(alpha)
S4 109.939446
S5 267.304695
S6 130.601954
S7 117.680498
S10 1629.952535
S11 1477.581434
S12 5009.610236
S13 2030.422759
S20 56.720516
S21 45.512687
S22 19.199088
S23 29.713493
S26 1471.048419
S27 1419.489605
S28 912.282687
S29 2104.082254
dtype: float64
Beta diversity
Beta diversity can be converted to a dissimilarity index, which takes a value between 0 (if the compared samples are identical) and 1 (if the compared samples are completely different). Dissimilarity can be calculated with a local or regional viewpoint. The local viewpoint quantifies the effective proportion of species in a sample that is not shared across all samples. The regional viewpoint quantifies the effective proportion of species in the pooled samples that is not shared across all samples. Just like alpha diversity, beta diversity can be calculated for any diveristy order, q, which means we can decide the emphasis we want to put on the relative abundance of the species.
[6]:
import qdiv
obj = qdiv.MicrobiomeData.load_example("Saheb-Alam_DADA2")
rarefied_obj = obj.rarefy()
dissimilarity = qdiv.diversity.naive_beta(rarefied_obj, q=1)
print(dissimilarity.iloc[:5, :5]) #Here we print the results for five samples
S4 S5 S6 S7 S10
S4 0.000000 0.058442 0.049793 0.075557 0.839512
S5 0.058442 0.000000 0.057203 0.073835 0.783948
S6 0.049793 0.057203 0.000000 0.067338 0.821311
S7 0.075557 0.073835 0.067338 0.000000 0.853798
S10 0.839512 0.783948 0.821311 0.853798 0.000000
[7]:
multi_beta = qdiv.diversity.naive_multi_beta(rarefied_obj, by="feed", q=1)
print(multi_beta)
N beta local_dis regional_dis
acetate 8.0 1.967056 0.325346 0.325346
glucose 8.0 1.666429 0.245587 0.245587
Each sample group contains 8 samples (N=8). The beta diversity value indicates the effective number of distinct communities represented by the data. In other words, it reflects how much differentiation exists among the samples within a group. To provide additional perspective, beta diversity is also expressed as dissimilarity indices, calculated using either:
local view, which considers differences relative to individual samples, or
regional view, which considers differences relative to the entire pooled set of samples.
The analysis can also be done with diversity.phyl_multi_beta or diversity.func_multi_beta.