qdiv.stats.permanova

qdiv.stats.permanova(dis, meta, by, *, permutations=999, include_interaction=False, strata=None, random_state=None, perm_scheme='freedman-lane', **kwargs)

PERMANOVA (Anderson, 2001) implemented via projection matrices on the Gower‑centered distance matrix.
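The following NumPy code is a minimal sketch of these two ingredients for a single factor. It is not qdiv's source, and `gower_center`, `hat` and `one_way_pseudo_f` are hypothetical names. The code forms A = -½d², double-centers it into Gower's matrix G, and reads the total and between-group sums of squares off as the traces tr(G) and tr(HG).

```python
import numpy as np


def gower_center(dis):
    """Gower-center a distance matrix: G = C A C with A = -0.5 * d**2 and C = I - 11'/n."""
    d = np.asarray(dis, dtype=float)
    n = d.shape[0]
    c = np.eye(n) - np.ones((n, n)) / n
    return c @ (-0.5 * d**2) @ c


def hat(x):
    """Projection (hat) matrix onto the column space of the design matrix x."""
    return x @ np.linalg.pinv(x.T @ x) @ x.T


def one_way_pseudo_f(dis, groups):
    """Pseudo-F for one categorical factor (hypothetical helper)."""
    g = gower_center(dis)
    levels, codes = np.unique(np.asarray(groups), return_inverse=True)
    n, k = len(codes), len(levels)
    x = np.eye(k)[codes]  # one-hot design; its column space contains the intercept
    h = hat(x)
    ss_total = np.trace(g)
    ss_between = np.trace(h @ g)  # G is centered, so the intercept contributes nothing
    ss_within = ss_total - ss_between
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

For Euclidean distances, tr(G) equals the total sum of squared deviations from the centroid, so the pseudo‑F reduces to the classical one-way ANOVA F. For the 1‑D points 0, 1, 2, 10, 11, 12 split into two groups of three, both give F = 150.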

This function fits a distance‑based linear model with one or two categorical factors (optionally including their interaction) and tests each term using permutation-based pseudo‑F statistics. Tests are marginal (partial): each term is evaluated conditional on all other included terms.
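Marginal testing can be phrased as a comparison of nested projections. The sum of squares of a term is tr((H_full - H_reduced) G), where the reduced model drops only that term. Its pseudo‑F divides the term's mean square by the residual mean square tr((I - H_full) G) / df_res. The sketch below makes three assumptions: an additive two-factor model, treatment-coded indicators, and no interaction term. The helper names are hypothetical, and the sketch illustrates the idea rather than qdiv's internals.

```python
import numpy as np


def gower_center(dis):
    """G = C A C with A = -0.5 * d**2 and C = I - 11'/n."""
    d = np.asarray(dis, dtype=float)
    n = d.shape[0]
    c = np.eye(n) - np.ones((n, n)) / n
    return c @ (-0.5 * d**2) @ c


def hat(x):
    """Projection (hat) matrix onto the column space of x."""
    return x @ np.linalg.pinv(x.T @ x) @ x.T


def dummies(labels):
    """Treatment-coded indicator columns (first level dropped)."""
    levels, codes = np.unique(np.asarray(labels), return_inverse=True)
    return np.eye(len(levels))[codes][:, 1:]


def marginal_terms(dis, factor_a, factor_b):
    """(df, SS, pseudo-F) per term for an additive two-factor model (hypothetical helper)."""
    g = gower_center(dis)
    n = g.shape[0]
    ones = np.ones((n, 1))
    blocks = {"A": dummies(factor_a), "B": dummies(factor_b)}
    x_full = np.hstack([ones, blocks["A"], blocks["B"]])
    h_full = hat(x_full)
    rank_full = np.linalg.matrix_rank(x_full)
    ss_res = np.trace((np.eye(n) - h_full) @ g)
    df_res = n - rank_full
    out = {}
    for term in blocks:
        # Reduced model: intercept plus every other term, i.e. drop only this term.
        others = [blk for name, blk in blocks.items() if name != term]
        x_red = np.hstack([ones] + others)
        df = rank_full - np.linalg.matrix_rank(x_red)
        ss = np.trace((h_full - hat(x_red)) @ g)
        out[term] = (df, ss, (ss / df) / (ss_res / df_res))
    out["Residual"] = (df_res, ss_res, np.nan)
    return out


y = np.array([0.0, 2.0, 4.0, 6.0, 10.0, 12.0, 14.0, 16.0])  # 2 x 2 design, 2 replicates per cell
dis = np.abs(y[:, None] - y[None, :])
table = marginal_terms(dis, ["a1"] * 4 + ["a2"] * 4, ["b1", "b1", "b2", "b2"] * 2)
```

This toy design is balanced, so the result matches classical two-way ANOVA: SS_A = 200, SS_B = 32, and the residual SS is 8 on 5 df. That gives F_A = 125 and F_B = 20. With unbalanced designs, marginal sums of squares generally do not add up to the total.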

Permutation inference can be performed either by permuting sample labels or by permuting residuals from reduced models (Freedman–Lane scheme), with optional restriction of permutations within exchangeability blocks (strata).
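Restricting permutations to exchangeability blocks amounts to shuffling sample indices separately inside each stratum. Here is a minimal sketch with the hypothetical helper `permute_within_strata`. The sketch normalizes the seed through `numpy.random.default_rng`, which accepts an int, a `Generator` or `None`. That handling of `random_state` is an assumption for illustration, not a description of qdiv.

```python
import numpy as np


def permute_within_strata(n, strata, rng):
    """Return a permutation of range(n) that only moves samples within their own stratum.

    With strata=None every sample is exchangeable (unrestricted permutation).
    """
    if strata is None:
        return rng.permutation(n)
    strata = np.asarray(strata)
    idx = np.arange(n)
    perm = idx.copy()
    for s in np.unique(strata):
        members = idx[strata == s]
        perm[members] = rng.permutation(members)  # shuffle only inside this block
    return perm


rng = np.random.default_rng(42)
labels = np.array(["ctrl", "trt", "ctrl", "trt", "ctrl", "trt"])
strata = np.array(["s1", "s1", "s1", "s2", "s2", "s2"])
perm = permute_within_strata(len(labels), strata, rng)
permuted_labels = labels[perm]  # every position receives a label from its own stratum
```

Under label permutation, indexing the factor labels this way reassigns levels among samples of the same stratum while the distance matrix stays fixed.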

Parameters:
  • dis ((n x n) pandas.DataFrame) – Symmetric distance or dissimilarity matrix with identical row and column labels. Rows/columns correspond to samples.

  • meta (pandas.DataFrame | dict | MicrobiomeData-like) – Sample metadata indexed by sample IDs matching dis.index.

  • by (str or list[str]) – One or two column names in meta defining the categorical factor(s).

  • permutations (int, default 999) – Number of permutations used to approximate the null distribution.

  • include_interaction (bool, default False) – If by contains two factors and both have more than one level, include and test their interaction term.

  • strata (str | list[str] | None) – Column name(s) in meta defining exchangeability blocks. When given, samples are permuted only among members of the same stratum (blocked permutations).

  • random_state (int | numpy.random.Generator | None) – Random seed or generator for reproducible permutations.

  • perm_scheme ({'labels', 'freedman-lane'}, default 'freedman-lane') –

    Permutation scheme used to generate the null distribution:

    • 'labels':

      Classical label permutation. Sample labels (factor assignments) are permuted across samples while the distance matrix is kept fixed. When two factors are provided, their labels are permuted jointly, preserving observed factor combinations. Permutations may be restricted within strata if specified.

    • 'freedman-lane':

      Residual-based permutation (Freedman & Lane, 1983). For each tested term, the residuals of the reduced model (the model without that term) are permuted, optionally within strata. The permuted residuals are then added back to the reduced model's fitted values, and the full model is refitted. This scheme gives asymptotically valid partial tests in the presence of nuisance factors. It also allows main effects to be tested when a factor is constant within strata.
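In the distance-based setting, nesting gives a useful simplification for Freedman–Lane. The reduced model is contained in the full model, so its fitted values are annihilated by both the term projector H_full - H_reduced and the residual projector I - H_full. The permuted pseudo‑F therefore depends only on the permuted residual matrix P R P^T, where R = (I - H_reduced) G (I - H_reduced). The sketch below relies on that identity. Several details are assumptions and do not come from qdiv: the helper names, the caller-supplied design matrices, and the small tie tolerance. Strata are omitted; a within-stratum permutation could replace `rng.permutation`.

```python
import numpy as np


def gower_center(dis):
    d = np.asarray(dis, dtype=float)
    n = d.shape[0]
    c = np.eye(n) - np.ones((n, n)) / n
    return c @ (-0.5 * d**2) @ c


def hat(x):
    return x @ np.linalg.pinv(x.T @ x) @ x.T


def freedman_lane_test(dis, x_reduced, x_full, permutations=999, random_state=None):
    """Pseudo-F and p-value for the term x_full adds to x_reduced (models must be nested)."""
    rng = np.random.default_rng(random_state)
    g = gower_center(dis)
    n = g.shape[0]
    eye = np.eye(n)
    h_red, h_full = hat(x_reduced), hat(x_full)
    q_term = h_full - h_red  # projector for the tested term
    q_res = eye - h_full  # projector onto the full-model residual space
    df_term = np.linalg.matrix_rank(x_full) - np.linalg.matrix_rank(x_reduced)
    df_res = n - np.linalg.matrix_rank(x_full)

    def pseudo_f(m):
        return (np.trace(q_term @ m) / df_term) / (np.trace(q_res @ m) / df_res)

    f_obs = pseudo_f(g)
    resid = (eye - h_red) @ g @ (eye - h_red)  # reduced-model residuals in Gower form
    tol = 1e-9 * abs(f_obs)  # count numerically tied statistics as ties
    count = 0
    for _ in range(permutations):
        p = rng.permutation(n)
        # Permute rows and columns of the residual matrix; the fitted part drops out of F.
        if pseudo_f(resid[np.ix_(p, p)]) >= f_obs - tol:
            count += 1
    return f_obs, (count + 1) / (permutations + 1)


y = np.array([0.0, 2.0, 4.0, 6.0, 10.0, 12.0, 14.0, 16.0])
dis = np.abs(y[:, None] - y[None, :])
a = np.repeat([0.0, 1.0], 4)  # indicator of the tested factor
b = np.tile([0.0, 0.0, 1.0, 1.0], 2)  # indicator of the nuisance factor
x_reduced = np.column_stack([np.ones(8), b])
x_full = np.column_stack([np.ones(8), b, a])
f_obs, p = freedman_lane_test(dis, x_reduced, x_full, permutations=999, random_state=1)
```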

Returns:

A dictionary with the following entries:

  • 'by':

    List of tested term names (main effects and, if included, interaction).

  • 'table':

    pandas.DataFrame with rows corresponding to model terms and the residual, and columns: [‘df’, ‘SS’, ‘MS’, ‘F’, ‘p’, ‘R2’].

  • 'permutations':

    Number of permutations performed.

  • 'strata':

    List of strata column names used for restricted permutations, or None.

  • 'perm_scheme':

    The permutation scheme used (‘labels’ or ‘freedman-lane’).

Return type:

dict
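The columns of the 'table' entry are related in the usual ANOVA way: MS = SS / df, and F is a term's MS divided by the residual MS. The pandas sketch below shows one plausible way to assemble such a table. The helper `assemble_table` is hypothetical. Defining R2 as SS divided by the total sum of squares tr(G) is an assumption; this page does not specify it.

```python
import numpy as np
import pandas as pd


def assemble_table(term_ss, term_df, ss_res, df_res, ss_total, pvalues):
    """Build a df/SS/MS/F/p/R2 table (hypothetical helper; assumes R2 = SS / SS_total)."""
    ms_res = ss_res / df_res
    rows = {}
    for term, ss in term_ss.items():
        ms = ss / term_df[term]
        rows[term] = [term_df[term], ss, ms, ms / ms_res, pvalues[term], ss / ss_total]
    rows["Residual"] = [df_res, ss_res, ms_res, np.nan, np.nan, ss_res / ss_total]
    return pd.DataFrame.from_dict(
        rows, orient="index", columns=["df", "SS", "MS", "F", "p", "R2"]
    )


table = assemble_table(
    term_ss={"A": 200.0, "B": 32.0},  # illustrative sums of squares
    term_df={"A": 1, "B": 1},
    ss_res=8.0,
    df_res=5,
    ss_total=240.0,
    pvalues={"A": np.nan, "B": np.nan},  # would come from the permutation step
)
```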

Notes

  • The analysis follows the geometric partitioning of sums of squares described by Anderson (2001), using projection (hat) matrices on the Gower‑centered distance matrix.

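  • P‑values are estimated from the permutation distribution using the standard +1 correction: (count + 1) / (permutations + 1). Here count is the number of permuted pseudo‑F statistics greater than or equal to the observed one.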

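  • If a tested factor does not vary within strata, label permutations restricted to those strata cannot reassign it. Under the 'labels' scheme, its null distribution may then be degenerate, and in that case the p‑value is returned as NaN. The 'freedman-lane' scheme does not share this limitation.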
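The +1 correction counts the observed statistic as one member of the null distribution, so a p‑value can never be exactly zero. Here is a minimal sketch with the hypothetical helper `permutation_pvalue`. Its NaN branch mirrors the degenerate case described above. The criterion it uses (every permuted statistic equals the observed one) is an assumption, not qdiv's documented rule.

```python
import numpy as np


def permutation_pvalue(f_obs, f_perm):
    """(count + 1) / (permutations + 1), where count = permuted statistics >= observed.

    Returns NaN when every permuted statistic equals the observed one
    (an assumed criterion for a degenerate null distribution).
    """
    f_perm = np.asarray(f_perm, dtype=float)
    if np.allclose(f_perm, f_obs):
        return np.nan
    count = int(np.sum(f_perm >= f_obs))
    return (count + 1) / (f_perm.size + 1)
```

With the default 999 permutations, the smallest attainable p‑value is 1/1000 = 0.001.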