qdiv.stats.distance_tests module

Distance-based tests and utilities: Mantel test, PERMANOVA, Gower distance

Public API:

mantel
permanova
gower

qdiv.stats.distance_tests.mantel(dis1, dis2, method='spearman', getOnlyStat=False, permutations=999, *, random_state=None, **kwargs)[source]

Perform a Mantel test to assess the association between two dissimilarity matrices.

The Mantel test evaluates whether pairs of samples that are close (or far apart) in one dissimilarity matrix tend to be close (or far apart) in another. The test statistic is computed by comparing the lower‑triangular entries of the two matrices, and statistical significance is assessed using a permutation test.

For correlation-based methods, the association is quantified as a dissimilarity (1 − r), where r is the Pearson or Spearman correlation between the vectorized distance matrices.

Parameters:

dis1 (pandas.DataFrame) – First square distance or dissimilarity matrix (samples × samples) with identical row and column labels.
dis2 (pandas.DataFrame) – Second square distance or dissimilarity matrix (samples × samples) with identical row and column labels matching dis1.
method ({'spearman', 'pearson', 'absDist'}, default='spearman') –
Measure used to quantify association between distance matrices:
- 'spearman':
  Spearman rank correlation between distances (reported as 1 − ρ).
- 'pearson':
  Pearson correlation between distances (reported as 1 − r).
- 'absDist':
  Mean absolute difference between corresponding distances.
getOnlyStat (bool, default=False) – If True, return only the observed test statistic without performing permutations.
permutations (int, default=999) – Number of permutations used to approximate the null distribution.
random_state (int | numpy.random.Generator | None) – Random seed or NumPy random generator for reproducible permutations.

Returns:

If getOnlyStat=True, returns the observed statistic only.
Otherwise, returns a list containing the observed statistic and its permutation-based p-value.

Return type:

float or list [statistic, p_value]

Notes

The test uses only the lower triangular part of each distance matrix (excluding the diagonal), avoiding double counting of pairwise distances.
Sample labels are permuted in dis1 while dis2 is held fixed to generate the null distribution.
For correlation-based methods ('pearson', 'spearman'), the reported statistic is a dissimilarity (1 − r or 1 − ρ), so smaller values indicate stronger association between the two matrices.
p-values are computed using a standard permutation test with a +1 correction: (count + 1) / (permutations + 1), where count is the number of permuted statistics at least as extreme as the observed statistic.
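The procedure described in these notes can be sketched in plain Python. This is an illustrative sketch, not qdiv's implementation: the names mantel_sketch, lower_triangle and pearson are hypothetical, the matrices are nested lists rather than DataFrames, only the Pearson variant is shown, and the direction of the comparison (permuted statistic <= observed) is an assumption based on the statement that smaller values of 1 − r indicate stronger association.

```python
import random
from statistics import mean


def lower_triangle(matrix, order):
    """Entries below the diagonal, reading rows and columns in the given order."""
    return [matrix[order[i]][order[j]] for i in range(len(order)) for j in range(i)]


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel_sketch(d1, d2, permutations=999, seed=None):
    """Hypothetical helper: returns (1 - r, p_value) for two square matrices."""
    rng = random.Random(seed)
    identity = list(range(len(d1)))
    v2 = lower_triangle(d2, identity)  # the second matrix is held fixed
    observed = 1.0 - pearson(lower_triangle(d1, identity), v2)
    count = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute the sample labels of the first matrix
        stat = 1.0 - pearson(lower_triangle(d1, order), v2)
        # Assumption: for a 1 - r statistic, "at least as extreme" means <= observed.
        if stat <= observed:
            count += 1
    return observed, (count + 1) / (permutations + 1)


# Toy data: distances between points on a line, and the same distances doubled.
points = [0, 1, 3, 6, 10]
d1 = [[abs(a - b) for b in points] for a in points]
d2 = [[2 * abs(a - b) for b in points] for a in points]
stat, p_value = mantel_sketch(d1, d2, permutations=199, seed=1)
```

Because d2 is an exact multiple of d1, the correlation is 1 and the reported statistic is 0 up to floating-point error.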

qdiv.stats.distance_tests.permanova(dis, meta, by, *, permutations=999, include_interaction=False, strata=None, random_state=None, perm_scheme='freedman-lane', **kwargs)[source]

Perform PERMANOVA (Anderson, 2001), implemented via projection matrices on the Gower‑centered distance matrix.

This function fits a distance‑based linear model with one or two categorical factors (optionally including their interaction) and tests each term using permutation-based pseudo‑F statistics. Tests are marginal (partial): each term is evaluated conditional on all other included terms.

Permutation inference can be performed either by permuting sample labels or by permuting residuals from reduced models (Freedman–Lane scheme), with optional restriction of permutations within exchangeability blocks (strata).
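Restricting permutations to exchangeability blocks can be illustrated with a short sketch. The helper permute_within_strata is hypothetical and not part of qdiv; it only shows the idea that sample positions are shuffled among samples sharing the same stratum label.

```python
import random
from collections import defaultdict


def permute_within_strata(strata, rng):
    """Return a permutation of sample positions that only moves samples within their stratum."""
    blocks = defaultdict(list)
    for pos, label in enumerate(strata):
        blocks[label].append(pos)
    perm = list(range(len(strata)))
    for positions in blocks.values():
        shuffled = positions[:]
        rng.shuffle(shuffled)
        for src, dst in zip(positions, shuffled):
            perm[src] = dst
    return perm


strata = ["site1", "site1", "site1", "site2", "site2", "site2"]
treatment = ["a", "b", "b", "a", "a", "b"]
perm = permute_within_strata(strata, random.Random(0))
# Each sample receives the treatment label of another sample from the same site.
permuted_treatment = [treatment[i] for i in perm]
```

Under this scheme the number of "a" and "b" labels within each site is unchanged by every permutation, which is what makes the blocks exchangeable units.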

Parameters:

dis ((n x n) pandas.DataFrame) – Symmetric distance or dissimilarity matrix with identical row and column labels. Rows/columns correspond to samples.
meta (pandas.DataFrame | dict | MicrobiomeData-like) – Sample metadata indexed by sample IDs matching dis.index.
by (str or list[str]) – One or two column names in meta defining the categorical factor(s).
permutations (int, default 999) – Number of permutations used to approximate the null distribution.
include_interaction (bool, default False) – If by contains two factors and both have more than one level, include and test their interaction term.
strata (str | list[str] | None) – Column name(s) in meta defining exchangeability blocks. When given, permutations are restricted to occur within each stratum only (i.e. blocked permutations).
random_state (int | numpy.random.Generator | None) – Random seed or generator for reproducible permutations.
perm_scheme ({'labels', 'freedman-lane'}, default 'freedman-lane') –
Permutation scheme used to generate the null distribution:
- 'labels':
  Classical label permutation. Sample labels (factor assignments) are permuted across samples while the distance matrix is kept fixed. When two factors are provided, their labels are permuted jointly, preserving observed factor combinations. Permutations may be restricted within strata if specified.
- 'freedman-lane':
  Residual-based permutation (Freedman & Lane, 1983). For each tested term, residuals from the reduced model excluding that term are permuted (optionally within strata), added back to the fitted values of the reduced model, and the full model is refitted. This scheme yields valid partial tests in the presence of nuisance factors and allows testing main effects even when a factor is constant within strata.

Returns:

A dictionary with the following entries:

'by':
List of tested term names (main effects and, if included, interaction).
'table':
pandas.DataFrame with rows corresponding to model terms and the residual, and columns: ['df', 'SS', 'MS', 'F', 'p', 'R2'].
'permutations':
Number of permutations performed.
'strata':
List of strata column names used for restricted permutations, or None.
'perm_scheme':
The permutation scheme used ('labels' or 'freedman-lane').

Return type:

dict

Notes

The analysis follows the geometric partitioning of sums of squares described by Anderson (2001), using projection (hat) matrices on the Gower‑centered distance matrix.
P‑values are estimated from the permutation distribution using a standard +1 correction: (count + 1) / (permutations + 1), where count is the number of permuted pseudo‑F values at least as large as the observed one.
If a tested factor does not vary within strata under label permutation, the corresponding null distribution may be degenerate, in which case the p‑value is returned as NaN.
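For a single factor, the sums-of-squares partitioning can be sketched without any linear-algebra library. The sketch below is an illustration under simplifying assumptions, not qdiv's code: the function names are hypothetical, only one factor and the 'labels' scheme are covered, and the hat matrix of a one-way design is applied implicitly (its entries are 1/n_g for two samples in the same group g and 0 otherwise, so the trace of H·G reduces to within-group block sums of G).

```python
import random


def gower_center(dist):
    """Gower-centre a symmetric distance matrix: G = A - row means - column means + grand mean, A = -0.5 * D**2."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in a]
    grand = sum(row) / n
    # The input is symmetric, so column means equal row means.
    return [[a[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def pseudo_f(g, groups):
    """Pseudo-F for a one-way design from the centred matrix g."""
    n = len(groups)
    members = {}
    for i, label in enumerate(groups):
        members.setdefault(label, []).append(i)
    ss_total = sum(g[i][i] for i in range(n))  # trace(G)
    ss_among = sum(
        sum(g[i][j] for i in idx for j in idx) / len(idx)  # trace(H G), block by block
        for idx in members.values()
    )
    ss_resid = ss_total - ss_among
    df_among, df_resid = len(members) - 1, n - len(members)
    return (ss_among / df_among) / (ss_resid / df_resid)


def permanova_sketch(dist, groups, permutations=999, seed=None):
    """Hypothetical helper: returns (pseudo_F, p_value) using label permutation."""
    rng = random.Random(seed)
    g = gower_center(dist)
    observed = pseudo_f(g, groups)
    labels = list(groups)
    count = 0
    for _ in range(permutations):
        rng.shuffle(labels)  # permute factor labels; the distance matrix stays fixed
        if pseudo_f(g, labels) >= observed:
            count += 1
    return observed, (count + 1) / (permutations + 1)


# Toy data: two well-separated groups of points on a line.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
groups = ["a", "a", "a", "b", "b", "b"]
f_stat, p_value = permanova_sketch(dist, groups, permutations=199, seed=0)
```

For this toy data the total sum of squares is 154 and the residual sum of squares is 4, so the among-group sum of squares is 150 and the pseudo-F is (150 / 1) / (4 / 4) = 150, up to floating-point error.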

qdiv.stats.distance_tests.gower(meta=None, *, by=None, return_similarity=False)[source]

Compute the Gower distance matrix for a pandas DataFrame containing mixed variable types (numeric, categorical/boolean, datetime).

Parameters:

meta (pd.DataFrame, dict, or MicrobiomeData object) – Input data. Rows are samples; columns are variables.
by (Sequence[str] or str, optional) – Variable names (columns) to include. If None, all columns are included.
return_similarity (bool, optional) – If True, return Gower similarity (1 - distance). Default False (distance).

Returns:

Pairwise Gower distances (or similarities) between samples (rows).

Return type:

pandas.DataFrame

Notes

Numerical variables are scaled by their range (max - min). If the range is 0 (constant column), that variable contributes 0 for all pairs.
Datetime variables are converted to days (float) and treated as numeric.
Categorical/boolean variables contribute 0 when equal, 1 when different.
Missing values: a variable only contributes for row pairs where it is present in both rows; the per-pair denominator is the count of contributing variables for that pair.
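The rules in these notes can be illustrated with a small pure-Python sketch. It is not qdiv's implementation: gower_sketch is a hypothetical name, rows are plain dicts with None marking missing values, and datetime handling is omitted. Two details are assumptions that the notes do not settle: a constant numeric column is still counted in the per-pair denominator, and a pair with no shared variables gets NaN.

```python
def gower_sketch(rows, numeric, categorical):
    """Pairwise Gower distances for a list of dict rows (None = missing)."""
    # Range (max - min) of each numeric column over its non-missing values.
    ranges = {}
    for col in numeric:
        vals = [r[col] for r in rows if r[col] is not None]
        ranges[col] = (max(vals) - min(vals)) if vals else 0.0

    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            total, used = 0.0, 0
            for col in numeric:
                a, b = rows[i][col], rows[j][col]
                if a is None or b is None:
                    continue  # variable is skipped for this pair
                used += 1  # assumption: constant columns still count here
                if ranges[col] > 0:
                    total += abs(a - b) / ranges[col]
            for col in categorical:
                a, b = rows[i][col], rows[j][col]
                if a is None or b is None:
                    continue
                used += 1
                total += 0.0 if a == b else 1.0
            # Assumption: NaN when the two rows share no observed variable.
            d = total / used if used else float("nan")
            dist[i][j] = dist[j][i] = d
    return dist


rows = [
    {"ph": 6.0, "temp": 10.0, "site": "A"},
    {"ph": 7.0, "temp": None, "site": "A"},
    {"ph": 8.0, "temp": 20.0, "site": "B"},
]
d = gower_sketch(rows, numeric=["ph", "temp"], categorical=["site"])
# d[1][0] = (0.5 + 0) / 2 = 0.25   (temp is missing for the second row)
# d[2][0] = (1 + 1 + 1) / 3 = 1.0
# d[2][1] = (0.5 + 1) / 2 = 0.75
```

The second row lacks a temperature, so its distances to the other rows are averaged over two variables instead of three.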