qdiv.sequences.sequence_distance_matrix
- qdiv.sequences.sequence_distance_matrix(obj, *, savename='SeqDistMat', path='', band_width=12, save=True, use_numba=True)
Compute pairwise Levenshtein distances between sequences using a banded Wagner–Fischer algorithm. With use_numba=True the computation uses a parallelized, Numba-accelerated implementation; otherwise it falls back to a pure Python implementation.
- Parameters:
obj (MicrobiomeData or dict) – Must provide a DataFrame as obj.seq or obj['seq'], indexed by sequence ID, with a column containing the sequences (default column name: 'seq').
savename (str, optional) – Base filename for the CSV outputs. Default 'SeqDistMat'.
path (str, default "") – Directory (absolute or relative) where the output is saved. Use "" for the current working directory.
band_width (int, optional) – Sakoe–Chiba band half-width, automatically expanded to |len1 - len2| when two sequences differ in length by more than this value. Larger values bring the result closer to the exact dynamic-programming distance but run slower. Default 12.
save (bool, optional) – If True, writes two CSV files: one with the edit distances and one with the normalized distances. Default True.
use_numba (bool, optional) – If True, uses the Numba implementation; otherwise uses the pure Python implementation. Default True.
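To illustrate what band_width controls, here is a minimal pure Python sketch of a banded Wagner–Fischer computation. It is not qdiv's implementation: the function name banded_levenshtein is hypothetical, and the way the band is widened to |len1 - len2| is an assumption based on the parameter description above.

```python
def banded_levenshtein(a: str, b: str, band_width: int = 12) -> int:
    """Levenshtein distance restricted to a diagonal band (hypothetical helper)."""
    n, m = len(a), len(b)
    # Assumption: widen the band so the final cell (n, m) is always inside it.
    k = max(band_width, abs(n - m))
    inf = n + m + 1  # larger than any real edit distance
    # Row 0: turning the empty prefix of a into b[:j] costs j insertions.
    prev = [j if j <= k else inf for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        lo, hi = max(0, i - k), min(m, i + k)
        if lo == 0:
            curr[0] = i  # deleting the first i characters of a
        for j in range(max(1, lo), hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[m]


# A wide band reproduces the exact distance.
print(banded_levenshtein("kitten", "sitting"))               # 3
# A band that is too narrow can overestimate: the best alignment
# of these two strings needs a shift of two positions.
print(banded_levenshtein("abcdefgh", "cdefghab", band_width=0))  # 8
print(banded_levenshtein("abcdefgh", "cdefghab", band_width=2))  # 4
```

The banded result is never smaller than the exact distance, because the band only removes candidate alignments. It equals the exact distance whenever an optimal alignment stays within band_width of the main diagonal.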
- Returns:
- {
    'edits': pd.DataFrame,       # int distances
    'normalized': pd.DataFrame,  # float in [0, 1]
    'meta': {'backend': 'numba' | 'python'}
}
- Return type:
Dict[str, Any]
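The shape of the returned dictionary can be sketched with the standard library alone. Everything below is illustrative: distance_tables and levenshtein are hypothetical helpers, nested dicts stand in for the pandas DataFrames, and dividing by the length of the longer sequence is an assumption about how the normalized values are obtained (the documentation only states that they lie in [0, 1]).

```python
from itertools import combinations
from typing import Any, Dict


def levenshtein(a: str, b: str) -> int:
    """Exact (unbanded) Wagner-Fischer edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def distance_tables(seqs: Dict[str, str]) -> Dict[str, Any]:
    """Build symmetric edit and normalized tables keyed by sequence ID."""
    ids = list(seqs)
    edits = {i: {j: 0 for j in ids} for i in ids}
    normalized = {i: {j: 0.0 for j in ids} for i in ids}
    for i, j in combinations(ids, 2):
        d = levenshtein(seqs[i], seqs[j])
        longest = max(len(seqs[i]), len(seqs[j]))
        # Assumption: normalize by the longer sequence, which bounds the value by 1.
        frac = d / longest if longest else 0.0
        edits[i][j] = edits[j][i] = d
        normalized[i][j] = normalized[j][i] = frac
    return {"edits": edits, "normalized": normalized, "meta": {"backend": "python"}}


result = distance_tables({"s1": "ACGT", "s2": "ACGA", "s3": "AC"})
print(result["edits"]["s1"]["s2"])       # 1
print(result["normalized"]["s1"]["s3"])  # 0.5
```

Both tables are symmetric with zeros on the diagonal, so only each unordered pair needs to be computed.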