qdiv.io

io

Provides functions for importing, saving, filtering and manipulating data.

qdiv.io.subset_samples(obj, *, by='index', values=None, exclude=False, keep_absent=False, inplace=False)

Subset samples from a MicrobiomeData object or dictionary containing ‘meta’, ‘tab’, ‘seq’, ‘tax’, and optionally ‘tree’.

Parameters:

obj (MicrobiomeData or dict) – Either an instance of MicrobiomeData or a dictionary with keys: ‘tab’, ‘tax’, ‘seq’, ‘meta’, optionally ‘tree’.
by (str, default "index") – How to select samples. “index” matches sample names in meta.index; any other value is interpreted as a column name in meta (e.g., “Treatment”).
values (list or scalar, optional) – Values to include (or exclude if exclude=True). If None and by != “index”, all unique values of meta[by] are used, excluding NaN.
exclude (bool, default False) – If True, exclude samples that match values (inverse selection).
keep_absent (bool, default False) – If False, drop features (rows) with zero counts after subsetting.
inplace (bool, default False) – Only relevant when input is a MicrobiomeData object. If True, mutate and return the same object. If False, return a new object.

Returns:

Same type as input. A MicrobiomeData object in gives a MicrobiomeData object out (mutated if inplace=True, otherwise a new instance); a dict in gives a dict out.

Return type:

MicrobiomeData or dict

Raises:

ValueError – If values is not a list (or scalar convertible to list), or by is invalid when using metadata filtering.

Notes

Aligns tab.columns with meta.index after filtering.
The other components (‘seq’, ‘tax’) are then subset to the remaining features.
‘tree’ is passed through unchanged.
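
The selection logic described above can be sketched in plain Python. This is only an illustration under stated assumptions, not qdiv's implementation: the helper name subset_samples_sketch is hypothetical, and nested dictionaries stand in for the ‘tab’ and ‘meta’ DataFrames.

```python
def subset_samples_sketch(tab, meta, by, values, exclude=False, keep_absent=False):
    """Hypothetical helper, not part of qdiv.

    tab maps feature -> {sample: count}; meta maps sample -> {column: value}.
    """
    wanted = set(values)
    # A sample is kept when its metadata value matches `values`
    # (or when it does NOT match, if exclude=True).
    keep = [s for s, row in meta.items() if (row[by] in wanted) != exclude]
    new_meta = {s: meta[s] for s in keep}
    new_tab = {f: {s: counts[s] for s in keep} for f, counts in tab.items()}
    if not keep_absent:
        # Drop features whose counts are all zero in the remaining samples.
        new_tab = {f: c for f, c in new_tab.items() if sum(c.values()) > 0}
    return new_tab, new_meta


tab = {
    "ASV1": {"S1": 10, "S2": 0, "S3": 5},
    "ASV2": {"S1": 0, "S2": 7, "S3": 0},
}
meta = {
    "S1": {"Treatment": "A"},
    "S2": {"Treatment": "B"},
    "S3": {"Treatment": "A"},
}

sub_tab, sub_meta = subset_samples_sketch(tab, meta, by="Treatment", values=["A"])
print(sorted(sub_meta))  # ['S1', 'S3']
print(sorted(sub_tab))   # ['ASV1'] (ASV2 has no counts left in S1 and S3)
```

With keep_absent=True, ASV2 would stay in the table with zero counts in both remaining samples.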

qdiv.io.subset_features(obj, *, featurelist=None, exclude=False, inplace=False)

Subset features (OTUs/ASVs/bins/MAGs) from a MicrobiomeData object or a dictionary containing ‘tab’, ‘tax’, ‘seq’, ‘tree’, and ‘meta’.

Parameters:

obj (MicrobiomeData or dict) – MicrobiomeData object or dictionary containing dataframes (tab, tax, seq, meta) and optionally a tree.
featurelist (list) – List of feature (OTU/ASV/bin) identifiers to keep or exclude.
exclude (bool, default False) – If True, exclude values in featurelist instead of including them.
inplace (bool, default False) – Only relevant when input is a MicrobiomeData object. If True, mutate and return the same object. If False, return a new object.

Returns:

Filtered object or dictionary with updated ‘tab’, ‘tax’, ‘seq’, ‘tree’, and ‘meta’.

Return type:

MicrobiomeData or dict

qdiv.io.subset_abundant(obj, *, n=25, method='mean', cutoff=None, exclude=False, inplace=False)

Keep (or exclude) the most abundant features based on relative abundance or frequency of detection.

The abundance score for each feature is based on the ‘sum’, ‘mean’, or ‘max’ of its relative abundance across samples, or its ‘frequency’ of detection in the sample set. Ties are broken by feature index order for determinism.

Parameters:

obj (MicrobiomeData or dict) – MicrobiomeData object or a dictionary with the key ‘tab’ (required) and optionally ‘tax’, ‘seq’, ‘tree’, ‘meta’:
- ‘tab’: DataFrame [features x samples].
- ‘tax’: DataFrame [features x tax-levels].
- ‘seq’: DataFrame/Series [features].
- ‘tree’: kept unchanged.
- ‘meta’: kept unchanged (sample metadata).
n (int, default 25) – Number of top features to keep (or exclude if exclude=True). Values outside [0, n_features] are clamped to the valid range.
method ({'sum','mean','max','frequency'}, default 'mean') – Reduction across samples of relative abundance per feature:
- ‘sum’: total relative abundance across samples
- ‘mean’: mean relative abundance across samples
- ‘max’: maximum relative abundance in any sample
- ‘frequency’: proportion of samples in which the feature is detected
cutoff (float, default None) – If cutoff is specified as a percentage (from 0 to 100%), all features with a ‘sum’, ‘mean’, or ‘max’ relative abundance or ‘frequency’ of detection above this value will be kept, and the parameter n will be ignored.
exclude (bool, default False) – If False (default), keep the top-n features. If True, exclude the top-n features (keep the rest).
inplace (bool, default False) – Only relevant for MicrobiomeData input. If True, mutate the object and return it; otherwise, return a new object.

Returns:

Same type as obj, with ‘tab’, ‘tax’, ‘seq’ filtered consistently. ‘tree’ and ‘meta’ are passed through unchanged.

Return type:

MicrobiomeData or dict

Notes

Cutoff values should be provided as percentages (0 to 100%).
The phylogenetic tree (‘tree’) is left unchanged. If you want to prune it, do so explicitly after calling this function.
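
The scoring and tie-breaking described above can be sketched as follows. This is a plain-Python illustration, not qdiv's code: the helper top_features_sketch is hypothetical, relative abundance is assumed to mean a count divided by its sample total, and the cutoff option is left out.

```python
def top_features_sketch(tab, n=25, method="mean"):
    """Hypothetical helper, not part of qdiv.

    tab maps feature -> list of counts, one entry per sample
    (same sample order for every feature).
    """
    features = list(tab)
    n_samples = len(next(iter(tab.values())))
    # Per-sample totals, used to turn counts into relative abundances.
    totals = [sum(tab[f][j] for f in features) for j in range(n_samples)]
    scores = {}
    for f in features:
        rel = [tab[f][j] / totals[j] if totals[j] else 0.0 for j in range(n_samples)]
        if method == "sum":
            scores[f] = sum(rel)
        elif method == "mean":
            scores[f] = sum(rel) / n_samples
        elif method == "max":
            scores[f] = max(rel)
        elif method == "frequency":
            scores[f] = sum(r > 0 for r in rel) / n_samples
        else:
            raise ValueError(f"unknown method: {method!r}")
    n = max(0, min(n, len(features)))  # clamp n to [0, n_features]
    # sorted() is stable, so features with equal scores keep their index order.
    ranked = sorted(features, key=lambda f: scores[f], reverse=True)
    return ranked[:n]


tab = {"ASV1": [40, 20], "ASV2": [40, 20], "ASV3": [20, 60]}
print(top_features_sketch(tab, n=2))                      # ['ASV3', 'ASV1']
print(top_features_sketch(tab, n=2, method="frequency"))  # ['ASV1', 'ASV2']
```

ASV1 and ASV2 have identical scores, so the one that comes first in the table wins the tie.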

qdiv.io.subset_taxa(obj, *, subset_levels=None, subset_patterns=None, exclude=False, case=False, regex=False, na=False, match_type='contains', inplace=False)

Subset features (OTUs/ASVs/bins/MAGs) from a MicrobiomeData object or a dictionary based on taxonomic classification.

Parameters:

obj (MicrobiomeData or dict) – MicrobiomeData object or dictionary containing dataframes (tab, tax, seq, meta) and optionally a tree.
subset_levels (str or sequence of str, optional) – Taxonomic column(s) in which to search for patterns. If None, all columns in tax are used.
subset_patterns (str or sequence of str) – Text patterns to identify taxa to keep. If a single string is passed, it is used as the only pattern.
exclude (bool, default False) – If True, return taxa that do NOT match the given patterns (i.e., complement).
case (bool, default False) – If True, pattern matching is case-sensitive.
regex (bool, default False) – If True, patterns are treated as regex. If False, patterns are escaped (literal match).
na (bool, default False) – If True, missing (NA) entries are treated as matches. If False, they are treated as non-matches. Empty or whitespace-only taxonomy entries are treated as missing (NA) during subsetting.
match_type ({'contains','fullmatch','startswith','endswith'}, default 'contains') – Matching behavior applied to the strings in selected columns.
inplace (bool, default False) – Only relevant when input is a MicrobiomeData object. If True, mutate and return the same object. If False, return a new object.

Returns:

Filtered object or dictionary with updated ‘tab’, ‘tax’, and ‘seq’. ‘meta’ and ‘tree’ are passed through.

Return type:

MicrobiomeData or dict
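
How case, regex, na and match_type interact can be sketched with the standard re module. The helper matches_sketch is hypothetical and is not how qdiv necessarily implements matching. It also assumes that a feature is selected when any chosen column matches the pattern, which the text above does not state.

```python
import re


def matches_sketch(value, pattern, *, case=False, regex=False, na=False,
                   match_type="contains"):
    """Hypothetical helper, not part of qdiv: test one taxonomy string."""
    # Empty or whitespace-only entries count as missing; `na` decides the result.
    if value is None or not value.strip():
        return na
    pat = pattern if regex else re.escape(pattern)  # literal unless regex=True
    flags = 0 if case else re.IGNORECASE
    if match_type == "contains":
        return re.search(pat, value, flags) is not None
    if match_type == "fullmatch":
        return re.fullmatch(pat, value, flags) is not None
    if match_type == "startswith":
        return re.match(pat, value, flags) is not None
    if match_type == "endswith":
        return re.search(rf"(?:{pat})\Z", value, flags) is not None
    raise ValueError(f"unknown match_type: {match_type!r}")


print(matches_sketch("g__Nitrospira", "nitro"))             # True
print(matches_sketch("g__Nitrospira", "nitro", case=True))  # False
print(matches_sketch("g__Nitrospira", "g__N.*a", match_type="fullmatch"))              # False
print(matches_sketch("g__Nitrospira", "g__N.*a", match_type="fullmatch", regex=True))  # True

tax = {
    "ASV1": {"Phylum": "p__Nitrospirota", "Genus": "g__Nitrospira"},
    "ASV2": {"Phylum": "p__Proteobacteria", "Genus": ""},
}
# Assumption: a feature is kept if ANY selected column matches.
keep = [f for f, levels in tax.items()
        if any(matches_sketch(v, "nitrospira") for v in levels.values())]
print(keep)  # ['ASV1']
```

The empty Genus entry of ASV2 counts as missing, so it only matches when na=True.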

qdiv.io.merge_samples(obj, *, by, values=None, method='sum', weight=None, keep_absent=False, inplace=False)

Merge samples based on metadata grouping.

Parameters:

obj (MicrobiomeData or dict) – Object or dictionary containing ‘tab’, ‘tax’, ‘seq’, ‘meta’, and optionally ‘tree’.
by (str or list) – Column(s) in metadata used for grouping samples.
values (list, optional) – Metadata values to keep. If None, all unique values in by are used.
method ({'sum', 'mean'}, default 'sum') – Aggregation method used when weight=None. If weight is provided, samples are merged using a weighted average based on the specified metadata column.
weight (str, optional, default None) – Name of a numeric metadata column used for weighted merging. Within each group, weights are normalized to sum to 1 and used to calculate a weighted average of feature abundances. If None, samples are merged using the specified method (‘sum’ or ‘mean’).
keep_absent (bool, default False) – If False, remove features with zero counts after merging.
inplace (bool, default False) – If True and obj is MicrobiomeData, mutate and return same object.

Returns:

Object with merged samples.

Return type:

MicrobiomeData or dict
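
The three merging modes ('sum', 'mean', and weighted average) can be sketched for a single group of samples. The helper merge_group_sketch is hypothetical and only mirrors the description above. Whether qdiv applies the weights to raw counts or to relative abundances is not stated here, so this sketch assumes the values are used as given.

```python
def merge_group_sketch(columns, weights=None, method="sum"):
    """Hypothetical helper, not part of qdiv.

    columns is a list of per-sample abundance lists (same feature order)
    for the samples that belong to one metadata group.
    """
    n_features = len(columns[0])
    if weights is not None:
        total = sum(weights)
        norm = [w / total for w in weights]  # normalize weights to sum to 1
        return [sum(w * col[i] for w, col in zip(norm, columns))
                for i in range(n_features)]
    summed = [sum(col[i] for col in columns) for i in range(n_features)]
    if method == "sum":
        return summed
    if method == "mean":
        return [s / len(columns) for s in summed]
    raise ValueError(f"unknown method: {method!r}")


s1 = [10, 0]   # abundances of two features in sample 1
s2 = [30, 20]  # abundances of the same features in sample 2

print(merge_group_sketch([s1, s2]))                  # [40, 20]
print(merge_group_sketch([s1, s2], method="mean"))   # [20.0, 10.0]
print(merge_group_sketch([s1, s2], weights=[1, 3]))  # [25.0, 15.0]
```

With weights 1 and 3, the normalized weights are 0.25 and 0.75, so the second sample contributes three times as much as the first.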

qdiv.io.rarefy(obj, *, depth='min', random_state=None, replacement=False, inplace=False)

Rarefy the abundance table in a MicrobiomeData object or dictionary.

Parameters:

obj (MicrobiomeData or dict) – Object containing at least ‘tab’ (features x samples).
depth (int or 'min', default 'min') – Target sequencing depth per sample.
random_state (int | numpy.random.Generator, optional) – Random seed or Generator for reproducibility.
replacement (bool, default False) – Sample with replacement if True.
inplace (bool, default False) – If True and obj is MicrobiomeData, modify in place.

Returns:

Object with rarefied table and zero-count features removed.

Return type:

MicrobiomeData or dict
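
Rarefaction of one sample can be sketched with the standard random module. This is an illustration only: the helper rarefy_sample_sketch is hypothetical, it uses random.Random rather than a numpy Generator, and it assumes that depth='min' means the smallest per-sample read total. How qdiv treats samples with fewer reads than depth is not covered here.

```python
import random


def rarefy_sample_sketch(counts, depth, seed=None, replacement=False):
    """Hypothetical helper, not part of qdiv.

    counts maps feature -> read count for a single sample.
    """
    rng = random.Random(seed)
    # Expand the counts into one entry per read, then draw `depth` reads.
    reads = [f for f, c in counts.items() for _ in range(c)]
    if replacement:
        drawn = rng.choices(reads, k=depth)
    else:
        drawn = rng.sample(reads, depth)  # raises ValueError if depth > len(reads)
    out = {}
    for f in drawn:
        out[f] = out.get(f, 0) + 1
    return out  # features that received no reads are simply absent


samples = {
    "S1": {"ASV1": 6, "ASV2": 4},
    "S2": {"ASV1": 3, "ASV2": 2, "ASV3": 1},
}
# Assumption: 'min' corresponds to the smallest sample total (6 reads here).
depth = min(sum(c.values()) for c in samples.values())
rarefied = {s: rarefy_sample_sketch(c, depth, seed=1) for s, c in samples.items()}
print({s: sum(c.values()) for s, c in rarefied.items()})  # {'S1': 6, 'S2': 6}
```

Passing the same seed gives the same draw, which mirrors the role of random_state above. S2 is unchanged because it already has exactly depth reads.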