qdiv.sequences.consensus
- qdiv.sequences.consensus(objlist, *, keep_object='best', already_aligned=False, different_lengths=False, name_type='OTU', keep_cutoff=0.2, only_return_seq=False, return_type='auto')
Build a consensus object based on features found in all input objects.
This function aligns features (e.g., ASVs/OTUs) across multiple microbiome data objects, identifies features shared by all, and constructs a consensus abundance table, sequences, taxonomy, and metadata. Optionally, features with high abundance in any object are retained even if not shared. The result can be returned as a dictionary or as a MicrobiomeData object.
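The idea of matching features across objects by their sequences can be sketched as follows. This is a minimal illustration and not qdiv's actual implementation: the helper name `shared_sequences`, the plain-dict inputs (feature ID to sequence), and the reading of substring matching as "one sequence contains the other" are all assumptions made for the example.

```python
def shared_sequences(seq_maps, different_lengths=False):
    """Return sequences from the first mapping that occur in every mapping.

    seq_maps: list of dicts mapping feature ID -> sequence string
    (hypothetical stand-ins for the 'seq' tables of the input objects).
    """
    def matches(a, b):
        if a == b:
            return True
        # Assumed meaning of substring matching: one sequence contains the other.
        return different_lengths and (a in b or b in a)

    first, rest = seq_maps[0], seq_maps[1:]
    shared = []
    for seq in first.values():
        # A sequence is shared only if every other object has a match for it.
        if all(any(matches(seq, other) for other in m.values()) for m in rest):
            shared.append(seq)
    return shared


# Feature IDs differ between objects; only the sequences are compared.
obj1_seq = {"ASV1": "ACGTACGT", "ASV2": "TTGACCAA", "ASV3": "GGGCCC"}
obj2_seq = {"OTU_a": "ACGTACGT", "OTU_b": "TTGACC", "OTU_c": "CATCAT"}

exact = shared_sequences([obj1_seq, obj2_seq])
relaxed = shared_sequences([obj1_seq, obj2_seq], different_lengths=True)
print(exact)    # ['ACGTACGT']
print(relaxed)  # ['ACGTACGT', 'TTGACCAA']
```

With exact matching only the identical sequence is shared; allowing different lengths also pairs `TTGACCAA` with its shorter counterpart `TTGACC`.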
- Parameters:
objlist (list of dict or MicrobiomeData) – List of input objects to merge. Objects can be either plain dicts or MicrobiomeData instances. Each object must contain at least:
  - 'tab': pd.DataFrame (abundance table, features x samples)
  - 'seq': pd.DataFrame (sequences, indexed by feature IDs)
Optionally:
  - 'tax': pd.DataFrame (taxonomy annotations)
  - 'meta': pd.DataFrame (sample metadata)
keep_object ({'best', int}, default 'best') – Determines which input object to use as the template for consensus:
  - 'best': the object with the largest fraction of reads mapped to shared features.
  - int: index of the object to use (0 = first, 1 = second, etc.).
already_aligned (bool, default False) – If True, assumes that features are already aligned across objects. If False, runs the alignment step.
different_lengths (bool, default False) – If True, allows alignment of features with different sequence lengths (substring matching).
name_type (str, default 'OTU') – Prefix for renaming consensus features (e.g., “OTU1”, “OTU2”, …).
keep_cutoff (float, default 0.2) – Relative abundance cutoff (%) for retaining features that are not shared by all objects, but are highly abundant in at least one object.
only_return_seq (bool, default False) – If True, only returns a DataFrame of shared sequences (plus the info dictionary). No consensus object is constructed.
return_type ({'auto', 'dict', 'microbiome'}, default 'auto') – Determines the type of object returned. Ignored when only_return_seq is True.
  - 'microbiome': always return a MicrobiomeData object.
  - 'dict': always return a dictionary (legacy behavior).
  - 'auto': return a MicrobiomeData object if any input was a MicrobiomeData; otherwise, return a dict.
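How keep_object='best' and keep_cutoff interact can be illustrated with a small sketch. It is not taken from the qdiv source: the helper names, the per-feature read totals keyed by sequence, the interpretation of keep_cutoff as a percentage of an object's total reads, and the strict `>` comparison are assumptions for illustration only.

```python
def shared_read_fraction(counts, shared):
    """Fraction of an object's reads that belong to shared features."""
    total = sum(counts.values())
    return sum(n for feat, n in counts.items() if feat in shared) / total


def abundant_unshared(counts, shared, keep_cutoff=0.2):
    """Unshared features whose relative abundance (in %) exceeds keep_cutoff."""
    total = sum(counts.values())
    return [
        feat for feat, n in counts.items()
        if feat not in shared and 100 * n / total > keep_cutoff
    ]


# Hypothetical per-feature read totals for two objects, keyed by sequence.
obj_counts = [
    {"ACGT": 900, "TTGA": 99, "GGCC": 1},
    {"ACGT": 500, "TTGA": 300, "CATC": 200},
]
shared = {"ACGT", "TTGA"}

# 'best' template: the object with the largest share of reads in shared features.
fractions = [shared_read_fraction(c, shared) for c in obj_counts]
best_index = max(range(len(obj_counts)), key=fractions.__getitem__)

# Unshared features that are abundant enough in at least one object.
extra = sorted({f for c in obj_counts for f in abundant_unshared(c, shared)})

print(best_index)  # 0
print(extra)       # ['CATC']
```

Here the first object keeps 99.9% of its reads in shared features, so it would serve as the template. `GGCC` makes up only 0.1% of its object and is dropped, while `CATC` makes up 20% of the second object and passes the cutoff.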
- Returns:
cons_obj (dict or MicrobiomeData or pd.DataFrame) – The consensus object, containing:
  - 'tab': abundance table (features x samples)
  - 'seq': sequence table (features x sequence)
  - 'tax': taxonomy table (optional)
  - 'meta': metadata table (optional)
The returned type depends on the arguments:
  - return_type='microbiome', or 'auto' with any MicrobiomeData input: a MicrobiomeData object.
  - return_type='dict', or 'auto' with only dict inputs: a dictionary.
  - only_return_seq=True: a DataFrame of shared sequences.
info (dict) – Dictionary with summary statistics about consensus construction, including:
  - 'kept_object_index': index of the selected template object
  - 'all_objects': per-object statistics (consensus abundance, lost reads/features)
  - 'selected_object': statistics for the selected object
- Return type:
Tuple[Dict[str, DataFrame] | MicrobiomeData | DataFrame, Dict[str, Any]]
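The rules that decide the type of the first tuple element can be written out as a small decision function. This is only a sketch of the documented behavior: `MicrobiomeDataStub` is a hypothetical stand-in for the real MicrobiomeData class, `resolve_return_kind` is not part of qdiv, and raising `ValueError` for an unknown option is an assumption.

```python
class MicrobiomeDataStub:
    """Hypothetical stand-in for qdiv's MicrobiomeData class."""


def resolve_return_kind(objlist, return_type="auto", only_return_seq=False):
    """Name the kind of first tuple element implied by the documented rules."""
    if only_return_seq:
        # Shared sequences only, regardless of return_type.
        return "dataframe"
    if return_type == "auto":
        any_md = any(isinstance(obj, MicrobiomeDataStub) for obj in objlist)
        return "microbiome" if any_md else "dict"
    if return_type in ("dict", "microbiome"):
        return return_type
    # Assumption: how qdiv itself handles an unknown value is not documented here.
    raise ValueError(f"unknown return_type: {return_type!r}")


print(resolve_return_kind([{}, {}]))                    # dict
print(resolve_return_kind([{}, MicrobiomeDataStub()]))  # microbiome
```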
Notes
The consensus object does not include a phylogenetic tree, even if present in the inputs.
Feature indices are re-ordered by average abundance and renamed using name_type.
If only_return_seq is True, only the shared sequences DataFrame and info are returned.
The function automatically aligns features unless already_aligned is True.
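The re-ordering and renaming of features mentioned above might look roughly like this. The sketch uses a plain dict (feature ID to per-sample abundances) instead of a DataFrame, and it assumes that "average abundance" means the mean across samples and that features are sorted from most to least abundant; neither detail is confirmed by this page.

```python
def reorder_and_rename(tab, name_type="OTU"):
    """Sort features by mean abundance (descending) and rename them.

    tab: dict mapping feature ID -> list of per-sample abundances
    (a hypothetical stand-in for the 'tab' DataFrame).
    """
    def mean(values):
        return sum(values) / len(values)

    order = sorted(tab, key=lambda feat: mean(tab[feat]), reverse=True)
    # New names are name_type followed by a 1-based rank.
    return {f"{name_type}{rank}": tab[feat] for rank, feat in enumerate(order, 1)}


tab = {"f_a": [5, 10], "f_b": [80, 70], "f_c": [15, 20]}
renamed = reorder_and_rename(tab)
print(renamed)  # {'OTU1': [80, 70], 'OTU2': [15, 20], 'OTU3': [5, 10]}
```

With a pandas table, the same ordering could be obtained with `tab.loc[tab.mean(axis=1).sort_values(ascending=False).index]` before assigning the new index.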
Examples
>>> from qdiv.sequences import consensus
>>> cons_obj, info = consensus([obj1, obj2], keep_object='best')
>>> print(type(cons_obj))
<class 'MicrobiomeData'>
>>> cons_obj.info()
>>> print(info)

>>> # To get a dict instead of MicrobiomeData:
>>> cons_dict, info = consensus([obj1, obj2], return_type='dict')

>>> # To get only the shared sequences:
>>> seq_df, info = consensus([obj1, obj2], only_return_seq=True)