qdiv.diversity

diversity

Provides functions for alpha and beta diversity calculations.

qdiv.diversity.naive_alpha(tab, *, q=1, use_values_in_tab=False)

Compute naive alpha diversity of order q for all samples.

Accepts:

DataFrame: features x samples
MicrobiomeData-like object: must expose a DataFrame in .tab / .table / .counts / .abundance
dict-of-dicts: either {feature: {sample: count}} or {sample: {feature: count}}

Parameters:

tab (DataFrame | MicrobiomeData-like | dict) – Abundance table (features x samples) or convertible structure.
q (float, default=1) – Diversity order:
- q = 0 : species richness
- q = 1 : exponential of Shannon entropy
- q = 2 : inverse Simpson
- general q : Hill number of order q
use_values_in_tab (bool, default=False) – If False (default), values are converted to relative abundances. If True, values in tab are assumed to already be relative abundances.

Returns:

Hill numbers for each sample. If input has one sample/column, returns a float.

Return type:

pandas.Series or float

Notes

For q = 1, the limit definition is used:
H₁ = exp( - Σ pᵢ ln pᵢ )
For q ≠ 1:
H_q = ( Σ pᵢ^q )^( 1 / (1 - q) )
Zero abundances are ignored safely.
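
The two formulas above can be sketched in plain Python for a single sample. This is an illustration of the definitions only, not the qdiv implementation; the helper name hill_number is hypothetical and the input is assumed to be a list of non-negative counts.

```python
import math

def hill_number(abundances, q=1.0):
    """Hill number of order q for one sample (hypothetical helper, not part of qdiv)."""
    total = sum(abundances)
    # Convert to relative abundances; zero entries are dropped because
    # they contribute nothing to either formula.
    p = [a / total for a in abundances if a > 0]
    if math.isclose(q, 1.0):
        # Limit q -> 1: exponential of Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    # General case: (sum of p_i^q)^(1 / (1 - q)).
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

counts = [10, 10, 10, 10, 0]
for order in (0, 1, 2):
    print(order, hill_number(counts, q=order))
```

For a perfectly even sample such as this one, every order q gives the number of features present (here 4); uneven samples give smaller values as q increases.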

qdiv.diversity.phyl_alpha(obj, *, q=1, index='D', use_values_in_tab=False)

Compute phylogenetic alpha diversity based on Hill numbers.

This function implements the abundance-weighted phylogenetic diversity framework of Chao et al. (2010, Phil. Trans. R. Soc. B). Diversity is computed at the level of tree branches, where each branch is weighted by its length and by the total relative abundance of all descendant features.

The primary quantity returned is the mean phylogenetic diversity D̄_q(T), which is a true Hill number (dimensionless, continuous, and monotone in q). A branch-length–scaled quantity (phylogenetic diversity, PD_q) can optionally be returned as a derived measure.

Parameters:

obj (MicrobiomeData-like | dict) –
Object containing an abundance table (tab) and a tree dataframe (tree). The tree dataframe must include:
- ’leaves’ : list of descendant leaves for each branch
- ’branchL’ : branch length
q (float, default=1) – Diversity order:
- q = 0 : presence/absence weighting (Faith’s PD when index=’PD’)
- q = 1 : exponential phylogenetic Shannon diversity
- q = 2 : phylogenetic inverse Simpson diversity
- general q : phylogenetic Hill number
index ({'D', 'PD', 'H'}, default='D') –
Quantity to return:
- ‘D’ : mean phylogenetic diversity D̄_q(T) (dimensionless; Hill number)
- ‘PD’ : branch diversity PD_q(T) = T · D̄_q(T)
- ‘H’ : entropy-like intermediate quantity:
  - q = 1 : phylogenetic entropy divided by T
  - q ≠ 1 : power-sum moment Σ_b (L_b/T) a_b^q
use_values_in_tab (bool, default=False) – If False, abundances are converted to relative abundances per sample. If True, the abundance table is assumed to already contain relative abundances.

Returns:

A vector of diversity values, one per sample.

Return type:

pandas.Series

Notes

For each sample j, the mean tree height is computed as:

T_j = Σ_b L_b · a_{b,j}

Mean phylogenetic diversity is defined as:

D̄_q(T) = ( Σ_b (L_b / T_j) · a_{b,j}^q )^(1 / (1 − q)), q ≠ 1
D̄_1(T) = exp( − Σ_b (L_b / T_j) · a_{b,j} · log a_{b,j} )

where a_{b,j} is the total relative abundance descending from branch b.

The branch diversity PD_q(T) = T_j · D̄_q(T) has units of branch length (or evolutionary time) and represents effective evolutionary work. Unlike D̄_q(T), PD_q(T) is not a Hill number for q ≠ 0, 1 and is not guaranteed to be monotone in q.
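
As a rough illustration of these definitions, the sketch below computes T_j, D̄_q(T) and PD_q(T) for one sample on a three-branch toy tree. It is not the qdiv implementation: the function name phyl_hill and the list-of-tuples tree layout are assumptions made for the example, chosen to mirror the ’branchL’ and ’leaves’ columns.

```python
import math

def phyl_hill(branches, rel_abund, q=1.0):
    """Return (T, D_bar) for one sample (hypothetical helper, not part of qdiv).

    branches  : list of (branch_length, leaves) tuples
    rel_abund : dict mapping leaf id -> relative abundance (sums to 1)
    """
    # a_b: total relative abundance descending from each branch.
    weighted = [(length, sum(rel_abund.get(leaf, 0.0) for leaf in leaves))
                for length, leaves in branches]
    # Branches with no abundance below them do not contribute.
    weighted = [(length, a) for length, a in weighted if a > 0]
    # Mean tree height T = sum of L_b * a_b.
    T = sum(length * a for length, a in weighted)
    if math.isclose(q, 1.0):
        d_bar = math.exp(-sum((length / T) * a * math.log(a) for length, a in weighted))
    else:
        d_bar = sum((length / T) * a ** q for length, a in weighted) ** (1.0 / (1.0 - q))
    return T, d_bar

# Toy tree: two tip branches plus one shared internal branch, all of length 1.
branches = [(1.0, ["A"]), (1.0, ["B"]), (1.0, ["A", "B"])]
rel_abund = {"A": 0.5, "B": 0.5}
for order in (0, 1, 2):
    T, d_bar = phyl_hill(branches, rel_abund, q=order)
    print(order, d_bar, T * d_bar)  # D_bar and PD = T * D_bar
```

With q = 0 the product T · D̄_0(T) equals the total branch length of the toy tree (3), which matches the Faith's PD interpretation given for index=’PD’.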

qdiv.diversity.func_alpha(tab, distmat, *, q=1, index='FD', use_values_in_tab=False)

Compute functional alpha diversity (Hill numbers) of order q.

Implements the framework of Chiu et al. (2014, PLoS ONE), where functional diversity is derived from pairwise trait distances and species abundances.

For each sample, functional diversity is computed from:

Q = Σᵢ Σⱼ pᵢ pⱼ dᵢⱼ (Rao’s quadratic entropy)

and the functional Hill number of order q:

q = 1:
D₁ = exp( -½ Σᵢ Σⱼ (pᵢ pⱼ ln(pᵢ pⱼ)) dᵢⱼ / Q )

q ≠ 1:
D_q = ( Σᵢ Σⱼ (pᵢ pⱼ)^q dᵢⱼ / Q )^( 1 / (2(1−q)) )

Parameters:

tab (DataFrame | MicrobiomeData-like | dict) – Abundance table (features x samples) or convertible structure.
distmat (pandas.DataFrame) – Functional distance matrix (features × features).
q (float, default=1) – Diversity order.
index ({'FD', 'D', 'MD'}, default='FD') – Output type:
- ‘D’ : functional Hill number
- ‘MD’ : mean functional diversity (D × Q)
- ‘FD’ : functional diversity (D × MD)
use_values_in_tab (bool, default=False) – If False, convert abundances to relative abundances. If True, assume tab already contains relative abundances.

Returns:

Functional diversity values for each sample.

Return type:

pandas.Series

Notes

Uses Rao’s Q as implemented in the rao() function.
Zero abundances are handled safely.
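
The relationship between Rao's Q, the functional Hill number and the three index options can be sketched for a single sample as follows. This is a plain-Python illustration of the formulas in this entry, not the qdiv code; the name functional_hill, the list-of-lists distance matrix and the NaN returned when Q = 0 are all assumptions of the sketch.

```python
import math

def functional_hill(p, d, q=1.0):
    """Return Q, D, MD and FD for one sample (hypothetical helper, not part of qdiv).

    p : list of relative abundances (sums to 1)
    d : square list-of-lists distance matrix in the same feature order
    """
    # Only features that are present contribute to the double sums.
    idx = [i for i, pi in enumerate(p) if pi > 0]
    pairs = [(p[i] * p[j], d[i][j]) for i in idx for j in idx]
    # Rao's quadratic entropy.
    Q = sum(w * dij for w, dij in pairs)
    if Q == 0:
        # Choice made for this sketch: undefined when all distances are zero.
        nan = float("nan")
        return {"Q": 0.0, "D": nan, "MD": nan, "FD": nan}
    if math.isclose(q, 1.0):
        D = math.exp(-0.5 * sum(w * math.log(w) * dij for w, dij in pairs) / Q)
    else:
        D = (sum((w ** q) * dij for w, dij in pairs) / Q) ** (1.0 / (2.0 * (1.0 - q)))
    MD = D * Q    # mean functional diversity
    FD = D * MD   # functional diversity
    return {"Q": Q, "D": D, "MD": MD, "FD": FD}

# Three equally abundant features, all 0.5 apart.
p = [1 / 3, 1 / 3, 1 / 3]
d = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
print(functional_hill(p, d, q=1))
```

When all features are equally abundant and equally distant, as here, the functional Hill number reduces to the number of features for every q.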

qdiv.diversity.mpdq(obj, distmat, *, q=1.0)

Mean phylogenetic distance (MPD) with q-weighting of relative abundances. Accepts either a MicrobiomeData object or a dict with at least a ‘tab’ DataFrame.

Parameters:

obj (MicrobiomeData, dict, or compatible object) – Input data. Must provide at least an abundance table (‘tab’).
distmat (pd.DataFrame) – Square distance matrix indexed/columned by feature ids.
q (float, default=1.0) – Order of diversity weighting applied to relative abundances.

Return type:

pandas.DataFrame

References

Webb et al. (2002) American Naturalist.

qdiv.diversity.mntdq(obj, distmat, *, q=1.0)

Mean nearest taxon distance (MNTD) with q-weighting of relative abundances.

Parameters:

obj (MicrobiomeData, dict, or compatible object) – Input data. Must provide at least an abundance table (‘tab’).
distmat (pd.DataFrame) – Square distance matrix indexed/columned by feature ids.
q (float, default=1.0) – Order of diversity weighting applied to relative abundances.

Return type:

pandas.DataFrame

qdiv.diversity.naive_beta(tab, *, q=1, dis=True, viewpoint='regional', use_values_in_tab=False)

Compute naive (taxonomic) pairwise beta diversity of order q.

Implements the two‑community Hill‑number beta diversity framework described in Chao et al. (2014), using only species abundances (no phylogenetic or functional information).

For two samples A and B:

α_q = Hill number of the average of A and B
γ_q = Hill number of the pooled community
β_q = γ_q / α_q

Special case q = 1 uses the Shannon limit:

α₁ = exp( -½ Σ pᵢ ln pᵢ - ½ Σ qᵢ ln qᵢ )
γ₁ = exp( -Σ mᵢ ln mᵢ )

Parameters:

tab (DataFrame | MicrobiomeData-like | dict) – Abundance table (features x samples) or convertible structure.
q (float, default=1) – Diversity order.
dis (bool, default=True) – If True, convert β to a dissimilarity using beta2dist. If False, return raw β values.
viewpoint ({'local', 'regional'}, default='regional') – Viewpoint for converting β to dissimilarity.
use_values_in_tab (bool, default=False) – If False, convert abundances to relative abundances. If True, assume tab already contains relative abundances.

Returns:

Pairwise β-diversity (or dissimilarity) matrix.

Return type:

pandas.DataFrame

Notes

Requires beta2dist() to be defined elsewhere.
Only works for ≥ 2 samples.
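
The q = 1 case given above can be sketched in plain Python for one pair of samples. This only illustrates the α₁, γ₁ and β₁ formulas and returns the raw β (as with dis=False); it is not the qdiv implementation, the helper names are hypothetical, and the pooled abundances mᵢ are assumed to be the equal-weight mean of the two relative-abundance vectors. The second sample is called r here to avoid confusion with the order q.

```python
import math

def shannon(p):
    """Shannon entropy of a relative-abundance vector, skipping zeros."""
    return -sum(x * math.log(x) for x in p if x > 0)

def pairwise_beta_q1(p, r):
    """Raw beta of order 1 for two samples (hypothetical helper, not part of qdiv).

    p, r : relative abundances over the same features, each summing to 1.
    """
    # Alpha: exponential of the mean of the two Shannon entropies.
    alpha = math.exp(0.5 * shannon(p) + 0.5 * shannon(r))
    # Gamma: Hill number of the pooled community (assumed equal-weight mean).
    pooled = [(a + b) / 2.0 for a, b in zip(p, r)]
    gamma = math.exp(shannon(pooled))
    return gamma / alpha

same = pairwise_beta_q1([0.5, 0.5, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0])
disjoint = pairwise_beta_q1([0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5])
print(same, disjoint)
```

Identical samples give β = 1 and samples with no shared features give β = 2, the two ends of the range for a pair of samples.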

qdiv.diversity.phyl_beta(obj, *, q=1, dis=True, viewpoint='regional', use_values_in_tab=False)

Compute phylogenetic pairwise beta diversity of order q.

Implements the two‑community phylogenetic Hill‑number beta framework described in Chao et al. (2014), where branch lengths are weighted by the relative abundances of all features descending from each branch.

For two samples A and B:

α_q = phylogenetic Hill number of the average of A and B
γ_q = phylogenetic Hill number of the pooled community
β_q = γ_q / α_q

Special case q = 1 uses the Shannon limit.

Parameters:

obj (MicrobiomeData-like | dict) –
Must provide:
- ’tab’: feature × sample abundance DataFrame
- ’tree’: branch × columns DataFrame with:
  - ’leaves’ : iterable/list of leaf IDs under each branch
  - ’branchL’: branch length (float)
q (float, default=1) – Diversity order.
dis (bool, default=True) – If True, convert β to a dissimilarity using beta2dist.
viewpoint ({'local', 'regional'}, default='regional') – Viewpoint for converting β to dissimilarity.
use_values_in_tab (bool, default=False) – If False, convert abundances to relative abundances. If True, assume tab already contains relative abundances.

Returns:

Pairwise phylogenetic β-diversity (or dissimilarity) matrix.

Return type:

pandas.DataFrame

Notes

Requires beta2dist() to be defined elsewhere.
Only works for ≥ 2 samples.

qdiv.diversity.func_beta(tab, distmat, *, q=1, dis=True, viewpoint='regional', use_values_in_tab=False, use_tqdm=True)

Compute functional pairwise beta diversity of order q.

Implements the two‑community functional Hill‑number beta framework based on local functional overlaps as in Chao et al. (2014). Functional diversity is derived from pairwise trait distances between ASVs and their abundances.

For each pair of samples (A, B), the method computes:

Dg : functional Hill number for the pooled community (gamma)

Da : functional Hill number for the “average” community (alpha)

beta = Dg / Da

For q = 1, the Shannon-type limit is used; for q ≠ 1, the general Hill-number form is used.

Parameters:

tab (DataFrame | MicrobiomeData-like | dict) – Abundance table (features x samples) or convertible structure.
distmat (pandas.DataFrame) – Functional distance matrix (ASVs × ASVs), symmetric and indexed by the same ASVs as tab.
q (float, default=1) – Diversity order.
dis (bool, default=True) – If True, convert β to a dissimilarity using beta2dist.
viewpoint ({'local', 'regional'}, default='regional') – Viewpoint for converting β to dissimilarity.
use_values_in_tab (bool, default=False) – If False, convert abundances to relative abundances. If True, assume tab already contains relative abundances.
use_tqdm (bool, default=True) – Use tqdm for progress bars.

Returns:

Pairwise functional dissimilarity matrix (if dis=True) or squared functional beta (β²) matrix (if dis=False).

Return type:

pandas.DataFrame

Notes

Only works for ≥ 2 samples.

qdiv.diversity.bray(tab, *, use_values_in_tab=False)

Compute the Bray–Curtis dissimilarity matrix between all samples.

Bray–Curtis dissimilarity between two samples A and B is:

BC(A, B) = 1 − Σ_i min(p_iA, p_iB)

where p_iA and p_iB are relative abundances of feature i in samples A and B.

Parameters:

tab (DataFrame | MicrobiomeData-like | dict) – Abundance table (features x samples) or convertible structure.
use_values_in_tab (bool, default=False) – If False, convert abundances to relative abundances. If True, assume tab already contains relative abundances.

Returns:

Symmetric Bray–Curtis dissimilarity matrix.

Return type:

pandas.DataFrame

Notes

Requires at least two samples.
Zero-sum samples are not allowed unless use_values_in_tab=True.
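
For a single pair of samples, the formula above amounts to the following plain-Python sketch. It is an illustration only (the helper name bray_curtis is hypothetical) and assumes raw counts that are first converted to relative abundances, as with use_values_in_tab=False.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity on relative abundances (hypothetical helper)."""
    total_a, total_b = sum(a), sum(b)
    if total_a == 0 or total_b == 0:
        # Relative abundances are undefined for a zero-sum sample.
        raise ValueError("zero-sum sample")
    pa = [x / total_a for x in a]
    pb = [x / total_b for x in b]
    # One minus the shared relative abundance, summed over features.
    return 1.0 - sum(min(x, y) for x, y in zip(pa, pb))

print(bray_curtis([5, 5], [10, 0]))   # half of the relative abundance is shared
print(bray_curtis([10, 0], [0, 5]))   # nothing shared
```

Because the comparison uses relative abundances, two samples with the same composition but different sequencing depth have a dissimilarity of 0.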

qdiv.diversity.jaccard(tab, *, use_values_in_tab=False)

Compute the Jaccard dissimilarity matrix between all samples.

Jaccard dissimilarity between two samples A and B is:

J(A, B) = 1 − ( |A ∩ B| / |A ∪ B| )

where presence/absence is determined by whether abundance > 0.

Parameters:

tab (DataFrame | MicrobiomeData-like | dict) – Abundance table (features x samples) or convertible structure.
use_values_in_tab (bool, default=False) – Ignored for Jaccard (presence/absence only), included for API symmetry.

Returns:

Symmetric Jaccard dissimilarity matrix.

Return type:

pandas.DataFrame

Notes

Requires at least two samples.
Abundances are converted to binary presence/absence.
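
The presence/absence definition can be sketched with Python sets for one pair of samples. This is an illustration of the formula rather than the qdiv code; the helper name is hypothetical, and returning NaN when both samples are empty is a choice made for the sketch.

```python
def jaccard_dissimilarity(a, b):
    """Jaccard dissimilarity from two abundance vectors (hypothetical helper)."""
    # A feature is present when its abundance is greater than zero.
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, x in enumerate(b) if x > 0}
    union = present_a | present_b
    if not union:
        return float("nan")  # both samples empty: undefined in this sketch
    return 1.0 - len(present_a & present_b) / len(union)

# One shared feature out of three observed in total.
print(jaccard_dissimilarity([1, 2, 0], [0, 3, 4]))
```

Only the zero/non-zero pattern matters, so scaling the abundances of either sample leaves the result unchanged.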

qdiv.diversity.naive_multi_beta(obj, *, by=None, q=1)

Compute naive (taxonomic) multi‑sample beta diversity for groups of samples.

This implements the multi‑sample Hill‑number beta framework:

β_q = γ_q / ( α_q / N )

where:

γ_q is the Hill number of the pooled community
α_q is the mean within‑sample Hill number
N is the number of samples in the group

Parameters:

obj (MicrobiomeData-like | dict) –
Must contain:
- ’meta’ : pandas.DataFrame with sample metadata
- ’tab’ : pandas.DataFrame with feature counts (features × samples)
by (str or None, default=None) – Column in metadata defining sample groups. If None, all samples are treated as one group.
q (float, default=1) – Diversity order.

Returns:

Index = categories in by (or ‘all’ if by=None)
Columns:

N : number of samples in group

beta : multi‑sample beta diversity

local_dis : local‑viewpoint dissimilarity

regional_dis : regional‑viewpoint dissimilarity

Return type:

pandas.DataFrame

Notes

Groups with <2 samples return NaN.

qdiv.diversity.phyl_multi_beta(obj, *, by=None, q=1)

Compute phylogenetic multi‑sample beta diversity for groups of samples.

Implements the multi‑sample phylogenetic Hill‑number beta framework described in Chao et al. (2014), where branch lengths are weighted by the relative abundances of all ASVs descending from each branch.

For each group of samples:

β_q = γ_q / ( α_q / N )

where:

γ_q is the phylogenetic Hill number of the pooled community
α_q is the mean within‑sample phylogenetic Hill number
N is the number of samples in the group

Parameters:

obj (MicrobiomeData-like | dict) –
Must contain:
- ’meta’ : pandas.DataFrame with sample metadata
- ’tab’ : pandas.DataFrame with ASV counts (ASVs × samples)
- ’tree’ : pandas.DataFrame with:
  - ’leaves’ : list of features under each branch
  - ’branchL’ : branch length
by (str or None, default=None) – Metadata column defining sample groups. If None, all samples are treated as one group.
q (float, default=1) – Diversity order.

Returns:

Index = categories in by (or ‘all’ if by=None)
Columns:

N : number of samples in group

beta : multi‑sample phylogenetic beta diversity

local_dis : local‑viewpoint dissimilarity

regional_dis : regional‑viewpoint dissimilarity

Return type:

pandas.DataFrame

Notes

Only works for ≥ 2 samples per group.

qdiv.diversity.func_multi_beta(obj, distmat, *, by=None, q=1)

Compute functional multi‑sample beta diversity for groups of samples.

Implements the multi‑sample functional Hill‑number beta framework described in Chiu et al. (2014), where functional diversity is derived from pairwise trait distances and species abundances.

For each group of samples:

β_q = D_gamma / D_alpha

where:

D_gamma is the functional Hill number of the pooled community
D_alpha is the mean functional Hill number across all sample pairs
N is the number of samples in the group
NxN = N² (number of ordered sample pairs)

Parameters:

obj (MicrobiomeData-like | dict) –
Must contain:
- ’meta’ : pandas.DataFrame with sample metadata
- ’tab’ : pandas.DataFrame (features × samples)
distmat (pandas.DataFrame) – Functional distance matrix (features × features).
by (str or None, default=None) – Metadata column defining sample groups. If None, all samples are treated as one group.
q (float, default=1) – Diversity order.

Returns:

Index = categories in by (or ‘all’ if by=None)
Columns:

NxN : N² (number of ordered sample pairs)

beta : functional multi‑sample beta diversity

local_dis : local‑viewpoint dissimilarity

regional_dis : regional‑viewpoint dissimilarity

Return type:

pandas.DataFrame

Notes

Only works for ≥ 2 samples per group.

qdiv.diversity.evenness(obj, distmat=None, *, q=1, div_type='naive', index='pielou', perspective='samples', use_values_in_tab=False)

Compute evenness measures from Chao & Ricotta (2019, Ecology 100:e02852), with optional support for Pielou’s classical evenness index.

Supports:

naive (taxonomic) evenness
phylogenetic evenness
functional evenness

Supported evenness indices:

CR1 (regional evenness)
CR2 (local evenness)
CR3
CR4
CR5
pielou (Pielou’s J; defined only for q = 1)

Parameters:

obj (DataFrame | MicrobiomeData-like | dict) – Input containing the abundance table (features × samples) and, if div_type=’phyl’, a tree (pandas.DataFrame).
distmat (pandas.DataFrame, optional) – Functional distance matrix. Required if div_type=’func’.
q (float, default=1) – Diversity order.
div_type ({'naive', 'phyl', 'func'}, default='naive') – Type of diversity measure used to compute D.
index ({'CR1','CR2','CR3','CR4','CR5','local','regional','pielou'}, default='pielou') – Evenness index to compute.
perspective ({'samples','taxa'}, default='samples') – Whether to compute evenness across samples (columns) or across taxa/branches (rows).
use_values_in_tab (bool, default=False) – If False, convert abundances to relative abundances.

Returns:

Evenness values indexed by sample or taxon.

Return type:

pandas.Series

Notes

CR1 = regional evenness
CR2 = local evenness
CR3–CR5 are alternative evenness formulations from Chao & Ricotta (2019)
Pielou’s index is included for convenience and corresponds to:
J = H’ / ln(S) = ln(D₁) / ln(S)

where D₁ is the Hill number of order q = 1.
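
The Pielou relation above can be checked with a short plain-Python sketch for one sample. It is not the qdiv implementation: the helper name is hypothetical, S is assumed to count only features with non-zero abundance, and NaN is returned when S < 2 because ln(S) is then zero.

```python
import math

def pielou_evenness(abundances):
    """Pielou's J = ln(D1) / ln(S) for one sample (hypothetical helper)."""
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    S = len(p)  # assumed: number of features actually present
    if S < 2:
        return float("nan")  # ln(S) = 0, so J is undefined
    shannon = -sum(pi * math.log(pi) for pi in p)
    d1 = math.exp(shannon)  # Hill number of order q = 1
    return math.log(d1) / math.log(S)

print(pielou_evenness([5, 5, 5, 5]))  # perfectly even sample
print(pielou_evenness([9, 1]))        # dominated sample, lower evenness
```

A perfectly even sample gives J = 1, and J decreases toward 0 as a single feature comes to dominate.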

qdiv.diversity.dissimilarity_by_feature(obj, *, by=None, q=1, div_type='naive', index='regional', use_values_in_tab=False)

Compute the contribution of individual taxa (or phylogenetic nodes) to the overall dissimilarity between multiple samples, following Chao & Ricotta (2019, Ecology 100:e02852).

Supports:

naive (taxonomic) dissimilarity
phylogenetic dissimilarity

Parameters:

obj (DataFrame | MicrobiomeData-like | dict) –
Must contain:
- ’tab’ : abundance table (features × samples)
- ’meta’ : metadata table (optional if by=None)
- ’tree’ : phylogenetic tree (required if div_type=’phyl’)
by (str or None, default=None) – Metadata column defining sample groups. If None, all samples are treated as one group.
q (float, default=1) – Diversity order.
div_type ({'naive','phyl'}, default='naive') – Type of dissimilarity measure.
index ({'local','regional','CR1','CR2'}, default='regional') – Evenness/dissimilarity index.
use_values_in_tab (bool, default=False) – If False, convert abundances to relative abundances.

Returns:

Rows:

’dis’ : total dissimilarity
’N’ : number of samples in group
one row per taxon (naive) or per node (phylogenetic)

Columns:

one column per category in by

Return type:

pandas.DataFrame

qdiv.diversity.beta_mpdq(obj, distmat, *, q=1.0)

Computes beta-MPD_q for all sample pairs.

Parameters:

obj (MicrobiomeData, dict, or compatible object) – Input data. Must provide at least an abundance table (‘tab’).
distmat (pd.DataFrame) – Square distance matrix indexed/columned by feature ids.
q (float, default=1.0) – Order of diversity weighting applied to relative abundances.

Return type:

pandas.DataFrame (S x S)

qdiv.diversity.beta_mntdq(obj, distmat, *, q=1.0, include_conspecifics=False)

Computes beta-MNTD_q for all sample pairs.

Parameters:

obj (MicrobiomeData, dict, or compatible object) – Input data. Must provide at least an abundance table (‘tab’).
distmat (pd.DataFrame) – Square distance matrix indexed/columned by feature ids.
q (float, default=1.0) – Order of diversity weighting applied to relative abundances.
include_conspecifics (bool, default=False) – Determines whether conspecifics (identical features shared between samples) are allowed to contribute zero-distance matches in the nearest-taxon calculation.

Return type:

pandas.DataFrame (S x S)