qdiv.stats.ordination_calculations module
Ordination utilities: PCoA, db-RDA, and marginal (partial) permutation tests.
- Public API:
pcoa_lingoes
dbrda
summarize_dbrda
- qdiv.stats.ordination_calculations.pcoa_lingoes(dis)[source]
Perform Principal Coordinates Analysis (PCoA) using the Lingoes correction.
The Lingoes correction transforms a non‑Euclidean distance matrix into a Euclidean one by adding a constant to all off‑diagonal squared distances, which ensures that no eigenvalue is negative. PCoA is then performed on the corrected matrix to obtain the principal coordinate axes.
- Parameters:
dis (pandas.DataFrame) – Square distance matrix (rows and columns represent samples). Values must be non‑negative and the matrix must be symmetric.
- Returns:
coords_df (pandas.DataFrame) – Principal coordinate scores (samples × axes), ordered by decreasing eigenvalue magnitude.
eigvals (pandas.Series) – Eigenvalues associated with each axis (only the positive eigenvalues after Lingoes correction).
pct_explained (pandas.Series) – Percentage of total variance explained by each axis (positive eigenvalues only).
total_variance (float) – Sum of all positive eigenvalues after correction.
- Return type:
DataFrame
Notes
The Lingoes correction is applied only if negative eigenvalues are detected.
The output coordinates are centered on the origin, and each axis is scaled by the square root of its eigenvalue (the standard PCoA convention).
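The correction described above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not qdiv's implementation: the name `pcoa_lingoes_sketch`, the `tol` parameter, and the tuple it returns are hypothetical, and the constant is taken to be the absolute value of the most negative eigenvalue (the usual Lingoes choice).

```python
import numpy as np

def pcoa_lingoes_sketch(dis, tol=1e-10):
    """Illustrative Lingoes-corrected PCoA (hypothetical helper, not qdiv code)."""
    d2 = np.asarray(dis, dtype=float) ** 2
    n = d2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix

    def gower(sq):
        # Gower double-centering of a squared-distance matrix
        return -0.5 * J @ sq @ J

    eigvals = np.linalg.eigvalsh(gower(d2))
    c1 = 0.0
    if eigvals.min() < -tol:
        # Assumed constant: |most negative eigenvalue|; 2*c1 is added to
        # every off-diagonal squared distance.
        c1 = -eigvals.min()
        d2 = d2 + 2.0 * c1 * (1.0 - np.eye(n))

    eigvals, eigvecs = np.linalg.eigh(gower(d2))
    order = np.argsort(eigvals)[::-1]            # decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > tol * max(eigvals.max(), 1.0)
    eigvals = eigvals[keep]
    coords = eigvecs[:, keep] * np.sqrt(eigvals)  # scale axes by sqrt(eigenvalue)
    pct = 100.0 * eigvals / eigvals.sum()
    return coords, eigvals, pct, c1

# A "star" metric (one hub, three leaves) that cannot be embedded as-is.
star = np.array([[0, 1, 1, 1],
                 [1, 0, 2, 2],
                 [1, 2, 0, 2],
                 [1, 2, 2, 0]], dtype=float)
coords, eigvals, pct, c1 = pcoa_lingoes_sketch(star)
```

For this toy matrix the uncorrected Gower matrix has one negative eigenvalue (-0.25), so `c1` is 0.25 and two positive axes remain after correction. The Euclidean distances between rows of `coords` then reproduce the corrected distances rather than the original ones.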
- qdiv.stats.ordination_calculations.dbrda(dis=None, meta=None, *, by=None, condition=None, n_axes=2, scale='site', perm_n=999, perm_seed=42, pcoa_fn=<function pcoa_lingoes>, per_var_perm=False, interactions=None, drop_first=True)[source]
Distance‑based Redundancy Analysis (db‑RDA).
This function performs constrained ordination on a distance matrix by:
Converting the distance matrix into principal coordinates (PCoA) using the specified PCoA function (default: Lingoes correction).
Regressing the PCoA coordinates onto explanatory variables.
Extracting constrained axes, biplot scores, and variance components.
Performing a global permutation test (Freedman–Lane).
Optionally computing per‑variable permutation p‑values.
Optionally including categorical interaction terms.
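The regression and axis-extraction steps listed above can be sketched as follows. This is a rough illustration assuming a purely numeric design matrix and inertia measured as raw sums of squares; `dbrda_core_sketch` is a hypothetical name, the dictionary keys merely echo the documented ones, and qdiv may scale inertia or scores differently.

```python
import numpy as np

def dbrda_core_sketch(Y, X, n_axes=2):
    """Regress PCoA coordinates Y (n x p) on predictors X (n x q).

    Hypothetical helper for illustration; not qdiv's implementation.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    Yc = Y - Y.mean(axis=0)                      # center responses
    Xc = X - X.mean(axis=0)                      # center predictors
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)  # multivariate least squares
    Y_fit = Xc @ B                               # constrained part
    Y_res = Yc - Y_fit                           # unconstrained part
    # Constrained axes: principal components of the fitted values
    U, s, _ = np.linalg.svd(Y_fit, full_matrices=False)
    eigvals = s ** 2
    total = float((Yc ** 2).sum())
    return {
        "site_scores": U[:, :n_axes] * s[:n_axes],   # fitted (LC) scores
        "eigenvalues": eigvals[:n_axes],
        "explained_ratio": eigvals[:n_axes] / total,  # assumed: share of total
        "total_inertia": total,
        "constrained_inertia": float((Y_fit ** 2).sum()),
        "unconstrained_inertia": float((Y_res ** 2).sum()),
    }

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
Y = X @ rng.normal(size=(2, 5)) + 0.5 * rng.normal(size=(20, 5))
res = dbrda_core_sketch(Y, X)
```

Because the fitted values and residuals are orthogonal, the constrained and unconstrained inertia add up to the total inertia in this sketch.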
- Parameters:
dis (pandas.DataFrame) – Square distance matrix (samples × samples). Must have matching row/column labels.
meta (pandas.DataFrame) – Metadata table containing explanatory variables (rows = samples).
by (str or list of str, optional) – Subset of metadata columns to use as explanatory variables. If None, all columns in meta are used.
condition (pandas.DataFrame, optional) – Conditioning variables for partial db‑RDA. Must align with meta.
n_axes (int, default=2) – Number of constrained axes to return.
scale ({'site', 'species'}, default='site') – Scaling for biplot scores.
perm_n (int, default=999) – Number of permutations for the global test.
perm_seed (int, default=42) – Random seed for reproducibility.
pcoa_fn (callable, default=pcoa_lingoes) – Function used to compute PCoA. Must return a dict with ‘site_scores’ and ‘eigenvalues’.
per_var_perm (bool, default=False) – If True, compute permutation p‑values for each predictor.
interactions (list of str, optional) – Variables for which interaction terms should be generated.
drop_first (bool, default=True) – Whether to drop the first dummy level when encoding categorical variables.
- Returns:
- Keys of the returned dict:
‘site_scores’ : pandas.DataFrame
‘biplot_scores’ : pandas.DataFrame
‘variable_contributions’ : pandas.DataFrame
‘eigenvalues’ : numpy.ndarray
‘explained_ratio’ : numpy.ndarray
‘total_inertia’ : float
‘constrained_inertia’ : float
‘unconstrained_inertia’ : float
‘F_global’ : float
‘p_global’ : float
- Return type:
dict
Notes
The global permutation test uses the Freedman–Lane procedure.
Partial db‑RDA is performed by residualizing both the response coordinates and the design matrix against the conditioning variables.
Interaction terms are constructed before dummy encoding.
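A Freedman–Lane test with conditioning variables can be sketched as below. The sketch assumes a pseudo-F statistic built from sums of squares and an intercept that is always part of the reduced model; `pseudo_F` and `freedman_lane_sketch` are hypothetical names, and the exact statistic and bookkeeping in qdiv may differ.

```python
import numpy as np

def _fitted(M, Y):
    # Least-squares fitted values of Y on the columns of M
    B, *_ = np.linalg.lstsq(M, Y, rcond=None)
    return M @ B

def pseudo_F(Y, X, Z):
    """F for X after accounting for Z (Z already contains the intercept)."""
    full = np.hstack([Z, X])
    fit_z, fit_full = _fitted(Z, Y), _fitted(full, Y)
    ss_x = ((fit_full - fit_z) ** 2).sum()       # inertia added by X
    ss_res = ((Y - fit_full) ** 2).sum()         # residual inertia
    rank_full = np.linalg.matrix_rank(full)
    df_x = rank_full - np.linalg.matrix_rank(Z)
    df_res = Y.shape[0] - rank_full
    return (ss_x / df_x) / (ss_res / df_res)

def freedman_lane_sketch(Y, X, Z=None, perm_n=999, perm_seed=42):
    Y, X = np.asarray(Y, float), np.asarray(X, float)
    n = Y.shape[0]
    ones = np.ones((n, 1))
    Z = ones if Z is None else np.hstack([ones, np.asarray(Z, float)])
    rng = np.random.default_rng(perm_seed)
    f_obs = pseudo_F(Y, X, Z)
    fit_z = _fitted(Z, Y)                        # reduced-model fit
    res_z = Y - fit_z                            # reduced-model residuals
    hits = 0
    for _ in range(perm_n):
        # Permute residual rows, then add the reduced-model fit back
        y_star = fit_z + res_z[rng.permutation(n)]
        if pseudo_F(y_star, X, Z) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (perm_n + 1)

rng = np.random.default_rng(1)
n = 30
Z = rng.normal(size=(n, 1))
X = rng.normal(size=(n, 2))
Y = X @ np.ones((2, 4)) + Z @ np.ones((1, 4)) + 0.1 * rng.normal(size=(n, 4))
f_obs, p = freedman_lane_sketch(Y, X, Z, perm_n=199)
```

With no conditioning variables the reduced model contains only the intercept, so the procedure reduces to permuting the rows of the centered response. The smallest attainable p-value is 1 / (perm_n + 1).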
- qdiv.stats.ordination_calculations.marginal_factor_tests_dbrda(dis, meta, *, by=None, condition=None, interactions=None, pcoa_fn=<function pcoa_lingoes>, perm_n=999, perm_seed=42, drop_first=True, return_F=True)[source]
Marginal (partial) permutation tests for db‑RDA factors.
Performs Freedman–Lane marginal tests for each factor block in the design matrix, controlling for all other terms. Also computes diagnostics based on the factor block alone (XG):
inertia_alone
pct_explained (alone)
p-alone (simple permutation test ignoring other terms)
- Parameters:
dis (pandas.DataFrame) – Square distance matrix (samples × samples).
meta (pandas.DataFrame) – Metadata table (rows = samples).
by (str or list of str, optional) – Subset of metadata columns to include as explanatory variables. If None, all columns in meta are used.
condition (pandas.DataFrame, optional) – Conditioning variables for partial db‑RDA.
interactions (list of str, optional) – Variables for which interaction terms should be generated.
pcoa_fn (callable, default=pcoa_lingoes) – Function returning {‘site_scores’, ‘eigenvalues’}.
perm_n (int, default=999) – Number of permutations.
perm_seed (int, default=42) – Random seed.
drop_first (bool, default=True) – Whether to drop the first dummy level when encoding categorical variables.
return_F (bool, default=True) – If True, use the F‑statistic as the test statistic; otherwise use the raw delta inertia.
- Returns:
- Columns:
Factor
df_added
delta_inertia
pct_explained (marginal)
inertia_alone
pct_explained (alone)
F
p-marginal
p-alone
- Return type:
pandas.DataFrame
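The difference between the "marginal" and "alone" quantities can be illustrated with a small sketch. It assumes each factor is already encoded as a block of numeric columns and that inertia is a raw sum of squares; `explained_ss` and `marginal_and_alone` are hypothetical helpers, and the permutation p-values are omitted.

```python
import numpy as np

def explained_ss(Y, X):
    """Sum of squares of centered Y captured by centered X."""
    if X.shape[1] == 0:
        return 0.0
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    return float(((Xc @ B) ** 2).sum())

def marginal_and_alone(Y, blocks):
    """blocks: dict mapping factor name -> (n, k) array of numeric columns."""
    n = Y.shape[0]
    total = float(((Y - Y.mean(axis=0)) ** 2).sum())
    full = explained_ss(Y, np.hstack(list(blocks.values())))
    out = {}
    for name, XG in blocks.items():
        others = [b for k, b in blocks.items() if k != name]
        rest = np.hstack(others) if others else np.empty((n, 0))
        delta = full - explained_ss(Y, rest)     # added on top of other terms
        alone = explained_ss(Y, XG)              # factor block by itself
        out[name] = {
            "df_added": XG.shape[1],             # assumes full-rank blocks
            "delta_inertia": delta,
            "pct_explained (marginal)": 100.0 * delta / total,
            "inertia_alone": alone,
            "pct_explained (alone)": 100.0 * alone / total,
        }
    return out

# Two balanced, mutually orthogonal contrasts
x1 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float).reshape(-1, 1)
x2 = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float).reshape(-1, 1)
rng = np.random.default_rng(0)
Y = 2.0 * x1 + 1.0 * x2 + 0.1 * rng.normal(size=(8, 3))
table = marginal_and_alone(Y, {"treatment": x1, "site": x2})
```

When the factor blocks are orthogonal, as in this toy design, the marginal and alone values coincide. With correlated predictors they generally differ, which is why both are worth reporting.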
- qdiv.stats.ordination_calculations.summarize_dbrda(dis, meta, *, by=None, condition=None, interactions=None, pcoa_fn=<function pcoa_lingoes>, perm_n=999, perm_seed=42, drop_first=True, include_interpretation=True, include_alone=True)[source]
Summarize db‑RDA (global model + marginal factor tests).
- This function:
Runs db‑RDA once (global model).
Runs marginal (partial) permutation tests per factor (Freedman–Lane).
Aggregates % explained by original factors (from the full model).
Computes R² and adjusted R².
Returns a tidy DataFrame, optionally with textual interpretation.
- Parameters:
dis (pandas.DataFrame) – Square distance matrix (rows/cols = samples). Index must match columns.
meta (pandas.DataFrame) – Metadata indexed by sample IDs.
by (str or list of str, optional) – Subset of metadata columns to use as explanatory variables. If None, all columns in meta are used.
condition (pandas.DataFrame, optional) – Covariates to partial out (same index as meta).
interactions (list of str, optional) – Variables for which interaction terms should be generated.
pcoa_fn (callable, default=pcoa_lingoes) – Function for the PCoA step; must return ‘site_scores’ and ‘eigenvalues’.
perm_n (int, default=999) – Number of permutations for marginal tests.
perm_seed (int, default=42) – Random seed for permutations.
drop_first (bool, default=True) – Drop first level in categorical encoding (reference coding).
include_interpretation (bool, default=True) – If True, adds a textual interpretation column.
include_alone (bool, default=True) – If True, keeps “alone” diagnostics (factor-alone %-explained, p-alone).
- Returns:
- Columns (by default):
Factor
pct_explained (full model)
df_added
delta_inertia
pct_explained (marginal)
F
p-marginal
inertia_alone
pct_explained (alone)
p-alone
Interpretation (optional)
- Attributes (df.attrs):
‘R²’ : float
‘Adjusted R²’ : float
‘F_global’ : float
‘p_global’ : float
‘Total inertia’ : float
‘Constrained inertia’ : float
‘Unconstrained inertia’ : float
‘n’ : int (samples)
‘df_model’ : int (approx. number of fitted parameters)
- Return type:
pandas.DataFrame
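The relationship between the inertia attributes and the two R² values can be sketched as follows. This assumes R² is the ratio of constrained to total inertia and that the adjustment is the usual Ezekiel formula with `df_model` counted without the intercept; `r2_and_adjusted` is a hypothetical helper, and qdiv may compute these differently.

```python
def r2_and_adjusted(constrained_inertia, total_inertia, n, df_model):
    """R² and Ezekiel-adjusted R² (assumed formulas, for illustration)."""
    r2 = constrained_inertia / total_inertia
    # Penalize for the number of fitted parameters relative to sample size
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - df_model - 1)
    return r2, adj

# Made-up numbers: 21 samples, 4 model degrees of freedom
r2, adj = r2_and_adjusted(3.0, 10.0, n=21, df_model=4)
```

The adjusted value is always at most R², and it can become negative when the model explains less than would be expected by chance.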