qdiv.sequences.consensus

qdiv.sequences.consensus(objlist, *, keep_object='best', already_aligned=False, different_lengths=False, name_type='OTU', keep_cutoff=0.2, only_return_seq=False, return_type='auto')

Build a consensus object based on features found in all input objects.

This function aligns features (e.g., ASVs/OTUs) across multiple microbiome data objects, identifies features shared by all, and constructs a consensus abundance table, sequences, taxonomy, and metadata. Optionally, features with high abundance in any object are retained even if not shared. The result can be returned as a dictionary or as a MicrobiomeData object.
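As a rough illustration of what "shared by all" means, the sketch below intersects the sequences of two toy sequence tables. It is not qdiv's implementation: the column name `seq`, the helper `shared_sequences`, and the restriction to exact matches (no substring matching as with different_lengths=True) are assumptions made for this example.

```python
import pandas as pd

def shared_sequences(seq_tables):
    """Return the set of sequences present in every table (exact matches only).

    Hypothetical helper for illustration; assumes each table has a 'seq' column.
    """
    seq_sets = [set(df["seq"]) for df in seq_tables]
    return set.intersection(*seq_sets)

# Two toy sequence tables whose feature IDs differ but whose sequences overlap.
seq_a = pd.DataFrame({"seq": ["ACGT", "GGCC", "TTAA"]}, index=["ASV1", "ASV2", "ASV3"])
seq_b = pd.DataFrame({"seq": ["GGCC", "ACGT", "CCCC"]}, index=["OTU1", "OTU2", "OTU3"])

shared = shared_sequences([seq_a, seq_b])
```

Matching on the sequence rather than on the feature ID is what lets objects with unrelated naming schemes be compared.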

Parameters:
  • objlist (list of dict or MicrobiomeData) – List of input objects to merge. Objects can be either plain dicts or MicrobiomeData instances. Each object must contain at least:
      - 'tab': pd.DataFrame (abundance table, features x samples)
      - 'seq': pd.DataFrame (sequences, indexed by feature IDs)
    Optionally:
      - 'tax': pd.DataFrame (taxonomy annotations)
      - 'meta': pd.DataFrame (sample metadata)

  • keep_object ({'best', int}, default 'best') – Determines which input object to use as the template for the consensus:
      - 'best': the object with the largest fraction of reads mapped to shared features.
      - int: index of the object to use (0 = first, 1 = second, etc.).

  • already_aligned (bool, default False) – If True, assumes that features are already aligned across objects. If False, runs the alignment step.

  • different_lengths (bool, default False) – If True, allows alignment of features with different sequence lengths (substring matching).

  • name_type (str, default 'OTU') – Prefix for renaming consensus features (e.g., “OTU1”, “OTU2”, …).

  • keep_cutoff (float, default 0.2) – Relative abundance cutoff (%) for retaining features that are not shared by all objects, but are highly abundant in at least one object.

  • only_return_seq (bool, default False) – If True, only returns a DataFrame of shared sequences (plus the info dictionary). No consensus object is constructed.

  • return_type ({'auto', 'dict', 'microbiome'}, default 'auto') – Determines the type of object returned (ignored when only_return_seq=True):
      - 'microbiome': always return a MicrobiomeData object.
      - 'dict': always return a dictionary (legacy behavior).
      - 'auto': return a MicrobiomeData object if any input was a MicrobiomeData; otherwise, return a dict.
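The keep_object='best' and keep_cutoff rules can be pictured with a small pandas sketch. This is an illustration under stated assumptions, not the library's code: the helper names are hypothetical, and the cutoff is applied here per sample, whereas the description above only says "in at least one object".

```python
import pandas as pd

def shared_read_fraction(tab, shared_ids):
    """Fraction of all reads in `tab` that belong to shared features."""
    total = tab.to_numpy().sum()
    return tab.loc[tab.index.isin(shared_ids)].to_numpy().sum() / total

def abundant_unshared(tab, shared_ids, keep_cutoff=0.2):
    """IDs of unshared features above `keep_cutoff` (%) in any sample.

    Assumption: relative abundance is computed per sample (column).
    """
    rel = 100 * tab / tab.sum(axis=0)
    unshared = rel.loc[~rel.index.isin(shared_ids)]
    return list(unshared.index[(unshared > keep_cutoff).any(axis=1)])

# Toy abundance tables (features x samples); only 'f1' is shared.
tab1 = pd.DataFrame({"S1": [90, 9, 1], "S2": [80, 20, 0]}, index=["f1", "f2", "f3"])
tab2 = pd.DataFrame({"S3": [500, 499, 1]}, index=["f1", "f4", "f5"])
shared = ["f1"]

# 'best' = object with the largest fraction of reads in shared features.
fractions = [shared_read_fraction(t, shared) for t in (tab1, tab2)]
best_index = fractions.index(max(fractions))

# Unshared features in tab2 that would be rescued at a 0.2 % cutoff.
rescued = abundant_unshared(tab2, shared, keep_cutoff=0.2)
```

In this toy data the first table keeps 85 % of its reads in the shared feature and would be chosen as the template, while `f4` in the second table is abundant enough to be retained despite not being shared.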

Returns:

  • cons_obj (dict or MicrobiomeData or pd.DataFrame) – The consensus object containing:
      - 'tab': abundance table (features x samples)
      - 'seq': sequence table (features x sequence)
      - 'tax': taxonomy table (optional)
      - 'meta': metadata table (optional)
    If return_type='microbiome', or return_type='auto' with at least one MicrobiomeData input, a MicrobiomeData object is returned. If return_type='dict', a dictionary is returned. If only_return_seq=True, a DataFrame of shared sequences is returned instead.

  • info (dict) – Dictionary with summary statistics about consensus construction, including:
      - 'kept_object_index': index of the selected template object
      - 'all_objects': per-object statistics (consensus abundance, lost reads/features)
      - 'selected_object': statistics for the selected object

Return type:

Tuple[Dict[str, DataFrame] | MicrobiomeData | DataFrame, Dict[str, Any]]

Notes

  • The consensus object does not include a phylogenetic tree, even if present in the inputs.

  • Feature indices are re-ordered by average abundance and renamed using name_type.

  • If only_return_seq is True, only the shared sequences DataFrame and info are returned.

  • The function automatically aligns features unless already_aligned is True.
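The re-ordering and renaming mentioned in the notes can be sketched as follows. This is only an illustration: the helper `rename_by_abundance` is hypothetical, and averaging per-sample relative abundances (rather than raw counts) is an assumption about how "average abundance" is computed.

```python
import pandas as pd

def rename_by_abundance(tab, seq, name_type="OTU"):
    """Sort features by mean relative abundance and rename them OTU1, OTU2, ...

    Hypothetical helper; `tab` is features x samples, `seq` shares its index.
    """
    rel = tab / tab.sum(axis=0)  # per-sample relative abundance
    order = rel.mean(axis=1).sort_values(ascending=False, kind="stable").index
    new_names = {old: f"{name_type}{i}" for i, old in enumerate(order, start=1)}
    tab_out = tab.loc[order].rename(index=new_names)
    seq_out = seq.loc[order].rename(index=new_names)
    return tab_out, seq_out

tab = pd.DataFrame({"S1": [10, 70, 20], "S2": [40, 50, 10]}, index=["a", "b", "c"])
seq = pd.DataFrame({"seq": ["AAAA", "CCCC", "GGGG"]}, index=["a", "b", "c"])

tab_out, seq_out = rename_by_abundance(tab, seq, name_type="OTU")
```

Applying the same name mapping to both tables keeps the abundance and sequence indices in step after renaming.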

Examples

>>> from qdiv.sequences import consensus
>>> cons_obj, info = consensus([obj1, obj2], keep_object='best')
>>> print(type(cons_obj))
<class 'MicrobiomeData'>
>>> cons_obj.info()
>>> print(info)
>>> # To get a dict instead of MicrobiomeData:
>>> cons_dict, info = consensus([obj1, obj2], return_type='dict')
>>> # To get only the shared sequences:
>>> seq_df, info = consensus([obj1, obj2], only_return_seq=True)
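For readers without MicrobiomeData instances at hand, plain-dict inputs such as obj1 and obj2 could be assembled roughly as below. This is a sketch under assumptions: the helper `make_object` is hypothetical, the sequence column is assumed to be named `seq`, and only the two required keys are filled in.

```python
import pandas as pd

def make_object(counts, sequences):
    """Build a minimal plain-dict object with the required 'tab' and 'seq' keys.

    Hypothetical helper. `counts` maps sample -> {feature: count};
    `sequences` maps feature -> sequence string.
    """
    tab = pd.DataFrame(counts)  # features x samples
    seq = pd.DataFrame({"seq": pd.Series(sequences)})  # indexed by feature ID
    missing = set(tab.index) - set(seq.index)
    if missing:
        raise ValueError(f"Features without a sequence: {sorted(missing)}")
    return {"tab": tab, "seq": seq}

obj1 = make_object(
    {"S1": {"ASV1": 10, "ASV2": 5}, "S2": {"ASV1": 7, "ASV2": 8}},
    {"ASV1": "ACGT", "ASV2": "GGCC"},
)
obj2 = make_object(
    {"S3": {"X1": 3, "X2": 12}},
    {"X1": "GGCC", "X2": "TTAA"},
)
```

With return_type='auto', passing only such dicts would be expected to yield a dict rather than a MicrobiomeData object, as described under return_type.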