qdiv.plot.rarefactioncurve

qdiv.plot.rarefactioncurve(obj, distmat=None, *, step='flexible', div_type='naive', q=0.0, figsize=(14, 10), fontsize=18, color_by=None, order=None, tag=None, colorlist=None, only_return_data=False, only_plot_data=None, savename=None)

Calculate and plot rarefaction curves for alpha diversity (Hill numbers).

The function subsamples (without replacement) individual reads within each sample to compute the rarefaction curve for a chosen diversity type, then plots per-sample curves. If only_return_data=True, it returns the computed curves instead of plotting them. You can also supply precomputed curves via only_plot_data to plot without recomputation.
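The subsampling idea described above can be sketched for a single sample as follows. This is an illustrative approximation and not qdiv's actual implementation: the helper names `hill_number` and `rarefy_sample` are hypothetical, only the naive (taxonomic) diversity is covered, and a local NumPy `Generator` is used in place of the global random state.

```python
import numpy as np

def hill_number(counts, q=0.0):
    """Naive Hill number of order q for a vector of counts (hypothetical helper)."""
    counts = np.asarray(counts, dtype=float)
    counts = counts[counts > 0]          # zero-count features do not contribute
    if counts.size == 0:
        return 0.0
    p = counts / counts.sum()
    if np.isclose(q, 1.0):               # q=1 is defined as the limit: exp(Shannon entropy)
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))

def rarefy_sample(counts, step, q=0.0, rng=None):
    """Shuffle the individual reads of one sample and evaluate diversity at growing depths."""
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=int)
    # Expand the count vector into one entry per read, labelled by feature index.
    reads = np.repeat(np.arange(counts.size), counts)
    rng.shuffle(reads)                   # a shuffled prefix is a subsample without replacement
    total = reads.size
    depths = list(range(step, total + 1, step))
    if not depths or depths[-1] != total:
        depths.append(total)             # always include the full sample as the last point
    yvals = [hill_number(np.bincount(reads[:d], minlength=counts.size), q) for d in depths]
    return np.array(depths), np.array(yvals)

x, y = rarefy_sample([50, 30, 15, 4, 1], step=20, q=0.0, rng=np.random.default_rng(0))
```

Because each depth reuses a prefix of the same shuffled read vector, the richness curve (q=0) in this sketch never decreases with depth and ends at the observed richness of the full sample.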

Parameters:
  • obj (dict or MicrobiomeData) –

    Input data containing at least:
    • tab (pandas.DataFrame): abundance table (features x samples).

    • meta (pandas.DataFrame): metadata with sample IDs as index, matching the columns of tab.

    Optional keys, depending on div_type:
    • tree: phylogenetic tree object (required if div_type='phyl').

  • distmat (str or pandas.DataFrame or None, optional) – Distance matrix, required when div_type='func'. Can be a preloaded DataFrame or a path-like string, which is handled by func_alpha.

  • step ({'flexible'} or int, default='flexible') – Subsampling step size (depth increment, in reads). If 'flexible', each sample's step is its total read count divided by 20 (minimum 1). If an integer, it must be a positive step size in reads.

  • div_type ({'naive', 'phyl', 'func'}, default='naive') –

    Diversity measure to compute:
    • 'naive': taxonomic (plain) diversity via naive_alpha.
    • 'phyl': phylogenetic diversity via phyl_alpha (requires tree).
    • 'func': functional diversity via func_alpha (requires distmat).

  • q (float, default=0.0) – Order of diversity (Hill number).

  • figsize (tuple of float, default=(14, 10)) – Figure size (width, height) in inches.

  • fontsize (int, default=18) – Base font size for the plot.

  • color_by (str, optional) – Metadata column used to color-code lines (one legend entry per group).

  • order (str, optional) – Metadata column used to order samples in the legend or their visual grouping in the plot.

  • tag ({'index'} or str, optional) – If 'index', annotate curve endpoints with sample IDs. If a metadata column name, annotate with that column's values.

  • colorlist (list of str, optional) – Colors used for plotting. If not provided, colors are drawn from get_colors_markers('colors'). Ensure the list is long enough for all groups/samples.

  • only_return_data (bool, default=False) – If True, return the computed data dictionary and do not plot.

  • only_plot_data (dict, optional) – Precomputed data dictionary to plot (skips computation). The format is: {sample_id: (xvals: np.ndarray, yvals: np.ndarray)}.

  • savename (str, optional) – If provided, save the plot to savename and also to a PDF file savename + '.pdf' (unless savename already ends with .pdf).
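As an illustration of the step parameter described above, the 'flexible' rule can be sketched like this. The helper name `resolve_step` is hypothetical, and the use of integer (floor) division is an assumption; the documentation only states that the total read count is divided by 20 with a minimum of 1.

```python
def resolve_step(total_reads, step="flexible"):
    """Return the depth increment for one sample (hypothetical helper)."""
    if step == "flexible":
        # Assumption: floor division, clamped so the step is never below 1 read.
        return max(1, total_reads // 20)
    if isinstance(step, bool) or not isinstance(step, int) or step < 1:
        raise ValueError("step must be 'flexible' or a positive integer")
    return step

print(resolve_step(12345))            # deep sample: roughly 20 points on the curve
print(resolve_step(7))                # very shallow sample: step is clamped to 1
print(resolve_step(12345, step=500))  # fixed step, independent of sample depth
```

With the flexible rule, every sample gets about the same number of points regardless of depth, whereas a fixed integer step gives deeper samples more points.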

Returns:

A dictionary with two keys: 'meta', which holds the metadata DataFrame, and 'samples', a dictionary mapping each sample ID to the (x, y) arrays of its rarefaction curve.

Return type:

dict
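A short sketch of how the returned structure might be consumed, assuming the layout described above. The dictionary below is a hand-built stand-in with made-up values, not output from the function, and 'meta' is replaced by a placeholder.

```python
import numpy as np

# Hypothetical stand-in for the return value of rarefactioncurve(..., only_return_data=True).
data = {
    "meta": None,  # placeholder; the real value would be the metadata DataFrame
    "samples": {
        "S1": (np.array([100, 200, 300]), np.array([12.0, 15.0, 16.0])),
        "S2": (np.array([100, 200]), np.array([8.0, 9.0])),
    },
}

# Collect the final depth and the diversity at that depth for each sample.
final = {sid: (int(x[-1]), float(y[-1])) for sid, (x, y) in data["samples"].items()}

for sid, (depth, diversity) in final.items():
    print(f"{sid}: depth={depth}, diversity={diversity}")
```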

Notes

  • The function shuffles individual reads per sample using numpy.random.shuffle. For reproducibility, set the global NumPy random seed before calling.

  • Helper functions naive_alpha, phyl_alpha, and func_alpha are assumed to be available in the current namespace.

  • The count table obj['tab'] must contain non-negative integers; zero-count features are ignored per sample during accumulation.
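The reproducibility note above can be illustrated with plain NumPy: seeding the global random state before each shuffle gives the same read order. The small read vector here is made up for the demonstration.

```python
import numpy as np

# Made-up sample: 4 features with counts 5, 3, 2, 1, expanded to one entry per read.
reads = np.repeat(np.arange(4), [5, 3, 2, 1])

np.random.seed(42)          # set the global seed before the first shuffle
first = reads.copy()
np.random.shuffle(first)

np.random.seed(42)          # same seed again, so the same permutation follows
second = reads.copy()
np.random.shuffle(second)

same = np.array_equal(first, second)
print(same)
```

In practice this means calling np.random.seed(...) once, immediately before rarefactioncurve, to get identical curves across runs.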

Examples

Compute and plot, coloring by a metadata column:

>>> data = rarefactioncurve(
...     obj,
...     step='flexible',
...     div_type='naive',
...     q=0,
...     color_by='Treatment',
...     savename='rarefaction.png'
... )

Compute the curves without plotting, using a fixed step of 500 reads:

>>> rd = rarefactioncurve(obj, step=500, only_return_data=True)

Plot from precomputed data:

>>> _ = rarefactioncurve(obj, only_plot_data=rd)  # uses obj['meta'] for annotations