qdiv.model
model
Provides functions for null models and simulations.
- qdiv.model.rcq(obj, *, constrain_by=None, randomization='frequency', iterations=999, div_type='naive', distmat=None, q=1.0, use_tqdm=True, random_state=None, **kwargs)
Raup–Crick-style null comparisons for beta-diversity.
Randomizes the abundance table while preserving each sample’s richness and total reads, then contrasts the observed beta-diversity matrix against a null distribution built via randomization.
- Parameters:
obj (MicrobiomeData | dict | Any) – Input with at least an abundance table under key ‘tab’. Optionally may include ‘meta’ (sample metadata) and ‘tree’ (for phylogenetic measures).
constrain_by (str, optional) – Column in metadata to constrain randomization within categories; if None, randomize across all samples.
randomization ({"frequency", "abundance"}, default="frequency") – Randomization strategy for selecting the set of taxa per randomized sample:
- "abundance": probabilities proportional to group-level summed abundances.
- "frequency": probabilities proportional to group-level presence frequency.
Within the selected set, additional reads are allocated in proportion to the selected taxa's group-level abundances so that each sample's total reads are matched.
iterations (int, default=999) – Number of randomization iterations used to build the null distribution.
div_type ({"Jaccard", "Bray", "naive", "phyl", "func"}, default="naive") – Dissimilarity index to compute for observed and null tables:
- "Jaccard", "Bray": classic indices on the (randomized) count table
- "naive": Hill-number-based (requires q)
- "phyl": phylogenetic beta diversity (requires 'tree' in obj)
- "func": functional beta diversity (requires distmat)
distmat (pandas.DataFrame, optional) – Square functional distance matrix (features × features); required if div_type=”func”.
q (float, default=1.0) – Diversity order for Hill-number-based indices (used by “naive”, “phyl”, “func”).
use_tqdm (bool, default=True) – Use tqdm for progress bars.
random_state (int | numpy.random.Generator, optional) – Random seed or Generator for reproducibility.
- Returns:
- {
"div_type": str,
"obs_d": DataFrame (S × S), observed beta-diversity,
"p": DataFrame (S × S), Raup–Crick probability P(null < obs) + 0.5·P(null == obs),
"null_mean": DataFrame (S × S), mean of null,
"null_std": DataFrame (S × S), std of null,
"ses": DataFrame (S × S), (null_mean - obs) / null_std
}
- Return type:
dict
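The p and ses entries listed in the return value can be reproduced for a single sample pair with a few lines of NumPy. This is a minimal sketch of the stated formulas, not the library's implementation; the function name `null_summary` is hypothetical, and the use of the population standard deviation (ddof=0) is an assumption, since the documentation does not say which estimator is used.

```python
import numpy as np

def null_summary(obs, null):
    """Summarize a null distribution for one observed dissimilarity.

    obs  : observed value (float)
    null : 1-D array with the value obtained in each randomization
    """
    null = np.asarray(null, dtype=float)
    n = null.size
    lower = np.count_nonzero(null < obs)   # null strictly below observed
    ties = np.count_nonzero(null == obs)   # ties contribute one half each
    p = (lower + 0.5 * ties) / n
    null_mean = null.mean()
    null_std = null.std()                  # assumption: population std (ddof=0)
    ses = (null_mean - obs) / null_std if null_std > 0 else np.nan
    return p, null_mean, null_std, ses

# Toy null distribution for one pair of samples
null = np.array([0.2, 0.4, 0.4, 0.6, 0.8])
p, null_mean, null_std, ses = null_summary(0.4, null)
```

In this toy case one null value is below the observed value and two are tied, so p = (1 + 0.5 * 2) / 5; the null mean exceeds the observed value, so ses is positive.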
Notes
Per-sample constraints: if constrain_by is given, randomization is performed within each metadata category independently to preserve structure. Otherwise, all samples are randomized together.
Richness & read preservation: for each sample, we draw a set of taxa matching the original richness, then allocate extra reads to match the original total reads.
Raup–Crick p-index: the number of iterations in which the null dissimilarity is strictly lower than the observed value, plus 0.5 for each tie, divided by the number of iterations.
A p value close to zero means observed dissimilarity is lower than the null expectation.
A p value close to one means observed dissimilarity is higher than the null expectation.
A positive ses means observed dissimilarity is lower than the null expectation.
A negative ses means observed dissimilarity is higher than the null expectation.
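The richness and read preservation described in the notes can be illustrated for a single sample. This is a sketch under stated assumptions, not the library's code: the function name `randomize_sample` and its arguments are hypothetical, and it assumes that every selected taxon first receives one read and that the remaining reads are then distributed multinomially, which is one reading of "additional reads".

```python
import numpy as np

def randomize_sample(richness, reads, freq_weights, abund_weights, rng):
    """Draw one randomized sample that keeps richness and total reads.

    freq_weights  : group-level selection weights per taxon (here: presence frequency)
    abund_weights : group-level summed abundances per taxon
    """
    freq_weights = np.asarray(freq_weights, dtype=float)
    abund_weights = np.asarray(abund_weights, dtype=float)

    # Step 1: choose `richness` distinct taxa, weighted by group-level frequency
    probs = freq_weights / freq_weights.sum()
    chosen = rng.choice(probs.size, size=richness, replace=False, p=probs)

    # Step 2: give each chosen taxon one read (assumption), then spread the
    # remaining reads in proportion to group-level abundance
    counts = np.zeros(probs.size, dtype=np.int64)
    counts[chosen] = 1
    w = abund_weights[chosen] / abund_weights[chosen].sum()
    counts[chosen] += rng.multinomial(reads - richness, w)
    return counts

rng = np.random.default_rng(1)
group_freq = np.array([3, 3, 2, 1, 1])       # samples in which each taxon occurs
group_abund = np.array([50, 30, 10, 5, 5])   # summed reads per taxon
null_counts = randomize_sample(3, 40, group_freq, group_abund, rng)
```

Passing the summed abundances as `freq_weights` would correspond to the "abundance" strategy instead of "frequency".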
- qdiv.model.nriq(obj, distmat, *, q=1.0, iterations=999, randomization='features', use_tqdm=True, random_state=None, **kwargs)
Net Relatedness Index (NRI) with q-weighting of relative abundances. Accepts either a MicrobiomeData object or a dict with at least a ‘tab’ DataFrame.
- Parameters:
obj (MicrobiomeData, dict, or compatible object) – Input data. Must provide at least an abundance table (‘tab’).
distmat (pd.DataFrame) – Square distance matrix indexed/columned by feature ids.
q (float, default=1.0) – Order of diversity weighting applied to relative abundances.
iterations (int, default=999) – Number of randomizations used to build the null distribution.
randomization ({'features', 'abundances'}, default='features') – Randomization strategy: 'features' permutes the feature labels of distmat; 'abundances' shuffles the relative abundance values within each sample.
use_tqdm (bool, default=True) – Use tqdm for progress bars.
random_state (int or np.random.Generator, optional) – Random seed or generator for reproducibility.
- Returns:
Indexed by sample names with columns:
- 'MPDq': observed MPD_q
- 'null_mean': mean of the null distribution
- 'null_std': standard deviation of the null distribution
- 'p': (count(null < observed) + 0.5 * ties) / iterations
- 'ses': (null_mean - observed) / null_std
- Return type:
pandas.DataFrame
Notes
A p value close to zero means that the observed MPD is lower than the null expectation
A p value close to one means that the observed MPD is higher than the null expectation
A positive ses means that the observed MPD is lower than the null expectation
A negative ses means that the observed MPD is higher than the null expectation
References
Webb et al. (2002) American Naturalist.
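The two randomization strategies named in the parameters can be sketched with pandas and NumPy. The helper names `permute_feature_labels` and `shuffle_abundances` are hypothetical and the code is an illustration of the idea, not the library's internals.

```python
import numpy as np
import pandas as pd

def permute_feature_labels(distmat, rng):
    """'features': reassign feature ids to the rows/columns of the distance matrix."""
    ids = distmat.index.to_numpy()
    shuffled = rng.permutation(ids)
    out = distmat.copy()
    out.index = shuffled
    out.columns = shuffled
    # Reorder so rows/columns appear in the original id order again
    return out.loc[ids, ids]

def shuffle_abundances(tab, rng):
    """'abundances': permute the values within each sample (column) independently."""
    out = tab.copy()
    for col in out.columns:
        out[col] = rng.permutation(out[col].to_numpy())
    return out

ids = ["OTU1", "OTU2", "OTU3", "OTU4"]
distmat = pd.DataFrame(
    [[0, 1, 4, 5], [1, 0, 3, 6], [4, 3, 0, 2], [5, 6, 2, 0]],
    index=ids, columns=ids, dtype=float,
)
tab = pd.DataFrame({"S1": [10, 0, 5, 1], "S2": [0, 7, 7, 2]}, index=ids)

rng = np.random.default_rng(0)
dm_null = permute_feature_labels(distmat, rng)
tab_null = shuffle_abundances(tab, rng)
```

Both operations keep the set of values intact: the permuted matrix is still symmetric with a zero diagonal, and each sample keeps its richness and total abundance.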
- qdiv.model.ntiq(obj, distmat, *, q=1.0, iterations=999, randomization='features', use_tqdm=True, random_state=None, **kwargs)
Nearest Taxon Index (NTI) with q-weighting of relative abundances. Computes MNTD_q (mean nearest-taxon distance with q-weighted abundances), then compares to a null obtained by either permuting feature labels (“features”) or shuffling abundances within each sample (“abundances”).
- Parameters:
obj (MicrobiomeData, dict, or compatible object) – Input data. Must provide at least an abundance table (‘tab’).
distmat (pd.DataFrame) – Square distance matrix indexed/columned by feature ids.
q (float, default=1.0) – Order of diversity weighting applied to relative abundances.
iterations (int, default=999) – Number of randomizations used to build the null distribution.
randomization ({'features', 'abundances'}, default='features') – Randomization strategy: 'features' permutes the feature labels of distmat; 'abundances' shuffles the relative abundance values within each sample.
use_tqdm (bool, default=True) – Use tqdm for progress bars.
random_state (int or np.random.Generator, optional) – Random seed or generator for reproducibility.
- Returns:
Indexed by sample names with columns:
- 'MNTDq': observed MNTD_q
- 'null_mean': mean of the null distribution
- 'null_std': standard deviation of the null distribution
- 'p': (count(null < observed) + 0.5 * ties) / iterations
- 'ses': (null_mean - observed) / null_std
- Return type:
pandas.DataFrame
Notes
A p value close to zero means that the observed MNTD is lower than the null expectation
A p value close to one means that the observed MNTD is higher than the null expectation
A positive ses means that the observed MNTD is lower than the null expectation
A negative ses means that the observed MNTD is higher than the null expectation
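For orientation, one plausible way to compute an abundance-weighted mean nearest-taxon distance for a single sample is sketched here. The weighting scheme (relative abundances raised to q and renormalized over present taxa) is an assumption made for illustration; the documentation does not give the exact formula, so this should not be read as the definition used by ntiq. The function name `mntd_q` is hypothetical.

```python
import numpy as np
import pandas as pd

def mntd_q(abund, distmat, q=1.0):
    """Weighted mean nearest-taxon distance for one sample (illustrative).

    abund   : pd.Series of abundances indexed by feature id
    distmat : square pd.DataFrame of pairwise distances
    """
    present = abund[abund > 0]
    if len(present) < 2:
        return np.nan  # no nearest neighbour exists
    # Assumed weighting: p_i^q, renormalized over the present taxa
    w = (present / present.sum()) ** q
    w = w / w.sum()
    d = distmat.loc[present.index, present.index].to_numpy(dtype=float).copy()
    np.fill_diagonal(d, np.inf)       # a taxon is not its own neighbour
    nearest = d.min(axis=1)           # distance to the closest other taxon
    return float((w.to_numpy() * nearest).sum())

ids = ["A", "B", "C"]
dist = pd.DataFrame([[0, 1, 4], [1, 0, 3], [4, 3, 0]], index=ids, columns=ids)
abund = pd.Series([1, 1, 2], index=ids)

unweighted = mntd_q(abund, dist, q=0.0)  # every present taxon counts equally
weighted = mntd_q(abund, dist, q=1.0)    # weights follow relative abundance
```

With q=0 the three nearest-taxon distances (1, 1, 3) are averaged equally; with q=1 the abundant taxon C, which is far from the others, pulls the value upward.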
- qdiv.model.beta_nriq(obj, distmat, *, q=1.0, iterations=999, randomization='features', use_tqdm=True, random_state=None, **kwargs)
Computes beta-MPD_q for all sample pairs, then contrasts against a null generated by (a) feature label permutations (“features”) or (b) within-sample abundance shuffles (“abundances”).
- Parameters:
obj (MicrobiomeData, dict, or compatible object) – Input data. Must provide at least an abundance table (‘tab’).
distmat (pd.DataFrame) – Square distance matrix indexed/columned by feature ids.
q (float, default=1.0) – Order of diversity weighting applied to relative abundances.
iterations (int, default=999) – Number of randomizations used to build the null distribution.
randomization ({'features', 'abundances'}, default='features') – Randomization strategy: 'features' permutes the feature labels of distmat; 'abundances' shuffles the relative abundance values within each sample.
use_tqdm (bool, default=True) – Use tqdm for progress bars.
random_state (int or np.random.Generator, optional) – Random seed or generator for reproducibility.
- Returns:
'beta_MPDq' : observed beta-MPD_q
'null_mean' : mean of null beta-MPD_q
'null_std' : std of null beta-MPD_q
'p' : (count(null < obs) + 0.5 * ties) / iterations
'ses' : (null_mean - obs) / null_std
- Return type:
dict of pandas.DataFrame (S x S)
Notes
If iterations=0, only the observed beta_MPDq DataFrame is returned; otherwise a dictionary is returned.
A p value close to zero means that the observed MPD between samples is lower than the null expectation
A p value close to one means that the observed MPD between samples is higher than the null expectation
A positive ses means that the observed MPD between samples is lower than the null expectation
A negative ses means that the observed MPD between samples is higher than the null expectation
- qdiv.model.beta_ntiq(obj, distmat, *, q=1.0, iterations=999, include_conspecifics=False, randomization='features', use_tqdm=True, random_state=None, **kwargs)
Computes beta-MNTD_q (mean nearest-taxon distance with q-weighted abundances) for all sample pairs, then contrasts the observed matrix against a null distribution generated by randomization:
- randomization="features": permute feature identities (rows) identically across samples
- randomization="abundances": shuffle abundances within each sample (column-wise)
The null distribution is aggregated online using Welford updates, yielding per-pair null mean, null std, tie-aware p-index, and standardized effect size.
- Parameters:
obj (MicrobiomeData | dict | Any) – Input with at least an abundance table under key ‘tab’.
distmat (pandas.DataFrame) – Square distance matrix (features × features) whose index/columns include tab.index.
q (float, default=1.0) – Diversity order used to weight relative abundances (applied only to strictly positive entries).
iterations (int, default=999) – Number of randomization iterations used to build the null distribution.
include_conspecifics (bool, default=False) – Determines whether conspecifics (identical features shared between samples) are allowed to contribute zero-distance matches in the nearest-taxon calculation.
randomization ({"features", "abundances"}, default="features") – Randomization strategy for the null model:
- "features": permute feature identities identically for all samples (tip-label permutation).
- "abundances": shuffle abundances within each sample (column-wise permutation).
use_tqdm (bool, default=True) – Use tqdm for progress bars (a lightweight stub is used if tqdm is unavailable).
random_state (int | numpy.random.Generator, optional) – Random seed or Generator for reproducibility.
- Returns:
- Full (samples × samples) matrices:
'beta_MNTDq' : observed beta-MNTD_q
'null_mean' : mean of null beta-MNTD_q
'null_std' : std of null beta-MNTD_q
'p' : (count(null < observed) + 0.5 * ties) / iterations
'ses' : (null_mean - observed) / null_std
Diagonal entries are set to NaN.
- Return type:
dict of pandas.DataFrame
Notes
If iterations=0, only the observed beta_MNTDq DataFrame is returned; otherwise a dictionary is returned.
A p value close to zero means that the observed MNTD between samples is lower than the null expectation
A p value close to one means that the observed MNTD between samples is higher than the null expectation
A positive ses means that the observed MNTD between samples is lower than the null expectation
A negative ses means that the observed MNTD between samples is higher than the null expectation
References
Webb et al. (2002) American Naturalist.
Stegen et al. (2013) ISME Journal.
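The description of beta_ntiq mentions that the null distribution is aggregated online with Welford updates. A minimal sketch of that technique follows; it keeps a running mean, a running sum of squared deviations, and tie-aware counts per sample pair, so no null matrix needs to be stored. The class name `OnlineNull` is hypothetical, and the use of the population standard deviation is an assumption.

```python
import numpy as np

class OnlineNull:
    """Accumulate null matrices one at a time (Welford's algorithm)."""

    def __init__(self, obs):
        self.obs = np.asarray(obs, dtype=float)
        self.n = 0
        self.mean = np.zeros_like(self.obs)
        self.m2 = np.zeros_like(self.obs)     # running sum of squared deviations
        self.lower = np.zeros_like(self.obs)  # count of null < observed
        self.ties = np.zeros_like(self.obs)   # count of null == observed

    def update(self, null):
        null = np.asarray(null, dtype=float)
        self.n += 1
        delta = null - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (null - self.mean)  # uses the updated mean
        self.lower += null < self.obs
        self.ties += null == self.obs

    def result(self):
        std = np.sqrt(self.m2 / self.n)        # assumption: ddof=0
        p = (self.lower + 0.5 * self.ties) / self.n
        with np.errstate(divide="ignore", invalid="ignore"):
            ses = (self.mean - self.obs) / std
        return self.mean, std, p, ses

rng = np.random.default_rng(0)
obs = np.array([[0.0, 0.5], [0.5, 0.0]])
draws = rng.random((50, 2, 2))                 # stand-in for 50 null matrices

acc = OnlineNull(obs)
for d in draws:
    acc.update(d)
null_mean, null_std, p, ses = acc.result()
```

The online result matches what one would get by stacking all null matrices and calling `mean` and `std` along the first axis, while memory use stays constant in the number of iterations.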
- qdiv.model.simulate_community(size=100, communities=1, *, sigma=1.0, mean=0.0, c_prefix='Comm', species_prefix='OTU', sort=True, random_state=None)
Simulate communities with log-normal species abundance distributions.
- Parameters:
size (int, default=100) – Number of species in each community.
communities (int, default=1) – Number of communities to simulate.
sigma (float, default=1.0) – Standard deviation of the log-normal distribution. Low values yield more even communities; high values yield more dominance.
mean (float, default=0.0) – Mean of the log-normal distribution.
c_prefix (str, default='Comm') – Prefix for community (column) names.
species_prefix (str, default='OTU') – Prefix for species (row) names.
sort (bool, default=True) – If True, sort species from most to least abundant.
random_state (int, np.random.Generator, or None) – Random seed or generator for reproducibility.
- Returns:
Simulated abundance table (species x communities). Rows = species, columns = communities.
- Return type:
pandas.DataFrame
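To make the roles of sigma, mean, and sort concrete, log-normal communities can be drawn directly with NumPy. This is a sketch, not the function's source: the name `lognormal_communities`, the 1-based numbering of row and column labels, and the normalization to relative abundances are assumptions.

```python
import numpy as np
import pandas as pd

def lognormal_communities(size=100, communities=1, sigma=1.0, mean=0.0,
                          sort=True, seed=None):
    """Draw log-normal abundances for `communities` columns of `size` species."""
    rng = np.random.default_rng(seed)
    values = rng.lognormal(mean=mean, sigma=sigma, size=(size, communities))
    if sort:
        # Sort each community from most to least abundant
        values = -np.sort(-values, axis=0)
    # Assumption: express abundances as proportions of each community
    values = values / values.sum(axis=0, keepdims=True)
    index = [f"OTU{i + 1}" for i in range(size)]          # assumed naming
    columns = [f"Comm{j + 1}" for j in range(communities)]
    return pd.DataFrame(values, index=index, columns=columns)

tab = lognormal_communities(size=20, communities=3, sigma=1.5, seed=0)
```

Raising sigma widens the spread of log-abundances, so a few species take a larger share of each community.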
- qdiv.model.community_sample(community, n=10000, random_state=None)
Draw a multinomial sample from a community abundance distribution.
- Parameters:
community (pd.DataFrame) – DataFrame of species abundances (species x communities). Each column is a community; values are relative or absolute abundances.
n (int, default=10000) – Number of individuals to sample per community.
random_state (int, np.random.Generator, or None) – Random seed or generator for reproducibility.
- Returns:
DataFrame of sampled counts (species x communities). Returns None if input is not a DataFrame.
- Return type:
pandas.DataFrame or None
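Multinomial sampling of a community table can be sketched as follows. The helper name `multinomial_sample` is hypothetical; the code only illustrates the sampling step, assuming each column is normalized to probabilities before drawing.

```python
import numpy as np
import pandas as pd

def multinomial_sample(community, n=10000, seed=None):
    """Draw n individuals per community in proportion to species abundance."""
    rng = np.random.default_rng(seed)
    sampled = {}
    for col in community.columns:
        probs = community[col].to_numpy(dtype=float)
        probs = probs / probs.sum()        # works for relative or absolute input
        sampled[col] = rng.multinomial(n, probs)
    return pd.DataFrame(sampled, index=community.index)

community = pd.DataFrame(
    {"Comm1": [0.5, 0.3, 0.2, 0.0], "Comm2": [10, 0, 5, 5]},
    index=["OTU1", "OTU2", "OTU3", "OTU4"],
)
sample = multinomial_sample(community, n=1000, seed=0)
```

Each column of the result sums to n, and species with zero abundance in a community can never be drawn there.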
- qdiv.model.simulate_assembly(community, immigrants, fitness, selection=1.0, dispersal=1.0, max_iter=1000, tol=0.001, noise_level=0.0, interdependence=None, random_state=None, verbose=False)
Simulate community assembly from local and immigrant pools, with optional stochastic noise and species interdependence (competition/facilitation).
- Parameters:
community (pd.DataFrame) – Initial community abundance table (species x communities).
immigrants (pd.DataFrame) – Immigrant pool abundance table (same shape as community).
fitness (pd.DataFrame) – Fitness values for each species in each community (same shape).
selection (float, default=1.0) – Relative weight of selection (fitness) in assembly.
dispersal (float, default=1.0) – Relative weight of dispersal (immigration) in assembly.
max_iter (int, default=1000) – Maximum number of assembly iterations.
tol (float, default=1e-3) – Convergence tolerance (sum of squared changes).
noise_level (float, default=0.0) – Standard deviation of Gaussian noise added to abundance updates (as a fraction of abundance).
interdependence (pd.DataFrame or None) – Square matrix (species x species) of interaction coefficients. Positive values = facilitation, negative = competition. Diagonal is typically zero or negative (self-limitation).
random_state (int, np.random.Generator, or None) – Random seed or generator for reproducibility.
verbose (bool, default=False) – If True, print progress at each iteration.
- Returns:
final_community (pandas.DataFrame) – Assembled community abundance table (species x communities).
n_iter (int) – Number of iterations performed.
converged (bool) – True if convergence was reached within max_iter, False otherwise.
- Return type:
Tuple[DataFrame, int, bool]
- qdiv.model.generate_interdependence_matrix(n_species, interaction_strength=1.0, positive_fraction=0.5, symmetric=False, diagonal=0.0, species_prefix='OTU', random_state=None)
Generate a random species interdependence (interaction) matrix.
- Parameters:
n_species (int) – Number of species (matrix will be n_species x n_species).
interaction_strength (float, default=1.0) – Maximum absolute value for interaction coefficients.
positive_fraction (float, default=0.5) – Fraction of off-diagonal interactions that are positive (facilitative). The rest will be negative (competitive).
symmetric (bool, default=False) – If True, matrix will be symmetric (A_ij = A_ji).
diagonal (float, default=0.0) – Value to set on the diagonal (e.g., 0 for no self-interaction, -1 for self-limitation).
species_prefix (str, default='OTU') – Prefix for species (row) names.
random_state (int, np.random.Generator, or None) – Random seed or generator for reproducibility.
- Returns:
Interaction matrix (species x species).
- Return type:
pandas.DataFrame
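A random interaction matrix with the properties listed in the parameters could be built as sketched here. The function name `random_interaction_matrix` is hypothetical, and two details are assumptions: magnitudes are drawn uniformly between 0 and the maximum strength, and each off-diagonal sign is drawn independently, so the positive fraction holds in expectation rather than exactly.

```python
import numpy as np
import pandas as pd

def random_interaction_matrix(n_species, strength=1.0, positive_fraction=0.5,
                              symmetric=False, diagonal=0.0, seed=None):
    """Random species x species interaction coefficients (illustrative)."""
    rng = np.random.default_rng(seed)
    shape = (n_species, n_species)
    magnitude = rng.uniform(0.0, strength, size=shape)   # assumed distribution
    sign = np.where(rng.random(shape) < positive_fraction, 1.0, -1.0)
    a = magnitude * sign
    if symmetric:
        # Mirror the upper triangle so that A_ij = A_ji
        upper = np.triu(a, k=1)
        a = upper + upper.T
    np.fill_diagonal(a, diagonal)
    names = [f"OTU{i + 1}" for i in range(n_species)]    # assumed naming
    return pd.DataFrame(a, index=names, columns=names)

m = random_interaction_matrix(5, strength=2.0, symmetric=True,
                              diagonal=-1.0, seed=0)
```

Here the diagonal of -1 represents self-limitation, and positive off-diagonal entries represent facilitation between the two species involved.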
- qdiv.model.make_block_tree_df(k_per_level, *, branch_length=1.0, root_name='Root', leaf_prefix='OTU', internal_prefix='in')
Generate a block tree where branching factor varies with depth. Compatible with phylo_utils (nodes, parent, branchL, leaves, dist_to_root).
- Parameters:
k_per_level (sequence of int) – k_per_level[level] = number of children created at this depth. Length of k_per_level = total depth.
branch_length (float | Sequence[float] | Callable[[int, str, int], float]) – Branch length specification:
- float: same length everywhere
- sequence[len=depth]: branch_length[level]
- callable(level, parent_name, child_index): full control
root_name (str)
leaf_prefix (str)
internal_prefix (str)
- Return type:
DataFrame
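The structure of a block tree can be illustrated with a small builder that produces the columns named in the description (nodes, parent, branchL, leaves, dist_to_root). This is a sketch with a constant branch length only; the function name `block_tree` and the exact node naming and numbering are assumptions, not the library's conventions.

```python
import pandas as pd

def block_tree(k_per_level, branch_length=1.0):
    """Build a tree where every node at depth `level` gets k_per_level[level] children."""
    rows = {"Root": {"nodes": "Root", "parent": None, "branchL": 0.0,
                     "leaves": set(), "dist_to_root": 0.0}}
    frontier = ["Root"]
    depth = len(k_per_level)
    n_internal = 0
    n_leaf = 0
    for level, k in enumerate(k_per_level):
        next_frontier = []
        for parent in frontier:
            for _ in range(k):
                if level == depth - 1:          # last level creates the tips
                    n_leaf += 1
                    name = f"OTU{n_leaf}"
                else:
                    n_internal += 1
                    name = f"in{n_internal}"
                rows[name] = {
                    "nodes": name,
                    "parent": parent,
                    "branchL": branch_length,
                    "leaves": set(),
                    "dist_to_root": rows[parent]["dist_to_root"] + branch_length,
                }
                next_frontier.append(name)
        frontier = next_frontier
    # Register every tip in the leaf set of itself and all its ancestors
    for leaf in frontier:
        node = leaf
        while node is not None:
            rows[node]["leaves"].add(leaf)
            node = rows[node]["parent"]
    return pd.DataFrame(list(rows.values()))

tree = block_tree([2, 3])   # 2 clades below the root, 3 tips in each clade
```

With k_per_level = [2, 3] the tree has one root, two internal nodes, and six tips, all at the same distance from the root.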
- qdiv.model.make_beta_splitting_tree_df(n_leaves, beta, *, branch_length='ultrametric', root_name='Root', leaf_prefix='OTU', internal_prefix='in', random_state=None)
Aldous β-splitting binary tree, returned as a DataFrame compatible with phylo_utils.
- Columns:
nodes (str), leaves (set[str]), branchL (float), parent (str|None), dist_to_root (float)
- Parameters:
n_leaves (int) – Number of tips (>= 1).
beta (float) – β parameter; requires beta > -1 so Beta(β+1, β+1) is defined. Larger β gives more balanced trees; as β approaches -1 from above, trees become more comb-like.
branch_length (float | Sequence[float] | Callable[[int, str, int], float] | str) – "ultrametric" (default) or:
- float: fixed length for all edges.
- sequence[len = max_level+1]: indexed by parent level (0=root).
- callable(level, parent_name, child_index) -> float: full control.
root_name (str)
leaf_prefix (str)
internal_prefix (str)
random_state (int | None)
- Return type:
DataFrame
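The effect of β on tree shape can be explored with a short sketch of the splitting step alone. For β > -1, the Aldous β-splitting rule can be sampled by drawing x from Beta(β+1, β+1), drawing the left clade size from Binomial(n, x), and rejecting splits that leave one side empty. The function name `leaf_depths` is hypothetical, and the code returns only the depth of each tip rather than the DataFrame the library function produces.

```python
import numpy as np

def leaf_depths(n_leaves, beta, rng, depth=0):
    """Return the depth of every tip in a β-splitting binary tree."""
    if n_leaves == 1:
        return [depth]
    # Rejection step: redraw until both child clades are non-empty
    while True:
        x = rng.beta(beta + 1.0, beta + 1.0)
        left = rng.binomial(n_leaves, x)
        if 0 < left < n_leaves:
            break
    return (leaf_depths(left, beta, rng, depth + 1)
            + leaf_depths(n_leaves - left, beta, rng, depth + 1))

rng = np.random.default_rng(0)
balanced = leaf_depths(32, beta=10.0, rng=rng)   # splits concentrate near n/2
comblike = leaf_depths(32, beta=-0.9, rng=rng)   # splits are often very uneven
```

Comparing `max(balanced)` with `max(comblike)` over several seeds gives a feel for how β controls imbalance; values of β very close to -1 make the rejection loop slower, because most proposed splits put all tips on one side.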