Plotting

Heatmap

plot.heatmap(obj, xAxis='None', levels=['Phylum', 'Genus'], levelsShown='None', subsetLevels='None', subsetPatterns='None', order='None', numberToPlot=20, method='max_sample', merge=False, figsize=(14, 10), fontsize=15, sepCol = [], labels=True, labelsize=10, cThreshold=8, cMap='Reds', cLinear=0.5, cBar=[], savename='None')

Plots a heatmap showing the relative abundance of different taxa in different samples.

obj is the object.

xAxis specifies heading in meta data used to merge samples and use a labels on x-axis; levels specifies the taxonomic levels to be displayed on the y-axis.

levels specifies the taxonomic levels shown on the y-axis. It should be list with one or two taxonomic levels.

if levelsShown =’Number’, there will be a list of numbers on the y-axis instead of taxonomic names.

subsetLevels and subsetPatterns refers to text patterns which can be used to filter the results for specific taxa. If these arguments are used, the inputs should be a list of taxanomic levels to search within and a list of text patterns to search for (see also subset.text_patterns() function).

order refers to heading of meta data column used to order samples. The column should contain numbers. The sample with the smallest number will be placed to the left in the heatmap.

numberToPlot refers to the number of taxa with highest abundance to include in the heatmap.

method refers to the method used to define the taxa with highest abundance:

  • ‘max_sample’ uses the maximum relative abundance in a sample
  • ‘mean_all’ uses the mean relative abundance across all samples to rank the taxa

if merge =True, unclassified sequences will be merged. If False (default), they will be kept separate and identified with sequence id found in the count table.

figsize is the width and height of the figure in inches

fontsize refers to the axis text font size

sepCol is a list of column numbers between which to include a separator, i.e. to clarify grouping of samples

labels =True means that relative abundance values are shown in the plot.

labelsize is the font size of the relative abundance labels.

cThreshold is the percentage relative abundance threshold at which the label color switches from black to white (for clarity).

cMap is the color map used in the heatmap.

cLinear is a parameter determining how the color intensity changes with relative abundance. A value of 1 means the change is linear.

cBar is a list of tick marks to use if a color bar is included as legend. The tick marks indicate relative abundance percentage levels to include in the legend.

savename is the name (also include path) of the saved png file. If ‘None’, no figure is saved.

Alpha diversity

plot.alpha_diversity(obj, distmat='None', var='None', slist='None', order='None', ylog=False, figsize=(10, 6), fontsize=18, colorlist='None', savename='None')

Visualizes how alpha diversity depends on diversity order.

obj is the object.

If distmat is specified, the diversity.phyl_alpha function is used, otherwise diversity.naive_alpha is used. A distmat dataframe can be generated with the diversity.sequence_comparison() function.

var refers to column heading in meta data used to color code samples; slist is a list of samples from the var column to include (default is to include all).

order refers to column heading in meta data used to order the samples.

If ylog =True, the y-axis of the plot will be logarithmic.

figsize is the width and height of the figure in inches

fontsize refers to the axis text font size

colorlist is a list of colors used for the lines in the plot, if colorlist=’None’, qdiv will decide the colors.

savename is the name (also include path) of the saved png file, if ‘None’ no figure is saved.

PCoA

plot.pcoa(dist, meta='None', var1='None', var2='None', var1_title='', var2_title='', biplot=[], arrow_width=0.001, whitePad=1.1, var2pos=0.4, tag='None', order='None', title='', connectPoints='None', figsize=(9, 6), fontsize=12, markersize=50, markerscale=1.1, lw=1, hideAxisValues=False, showLegend=True, ellipse='None', n_std=2, ellipse_tag='None', ellipse_connect='None', flipx=False, flipy=False, returnData=False, colorlist='None', markerlist='None', savename='None')

Visualizes dissimilarities between samples in principal coordinate analysis plot.

dist is a distance matrix with pairwise dissimilarities. This can be generated using the beta diversity functions, i.e. naive_beta, phyl_beta, jaccard, bray.

meta is the meta data, typically object[‘meta’].

var1 is a column heading in the meta used to color code.

var2 is a column heading in the meta used to code by marker type.

var1_title and var_2 title are the titles used in the legend.

biplot is a list of columns in meta data containing numeric data that should be plotted as a biplot in the PCoA. The arrows associated with each column are scaled to the plot area of the PCoA.

arrow_width is the width of the arrows used in the biplot.

whitePad sets the space between the outermost points and the plot limits (1.0=no space).

var2pos is the vertical position of the var2 legend.

tag is meta data column used to add labels to each point in figure

order is meta data column used to order samples (should be numbers)

title is the title of the entire figure.

connectPoints is a meta data column with numbers. If specified, the sample points in the PCoA will be connected by lines in the order determined by the numbers in the column.

figsize is the figure size in inches (width, height).

markersize is the size of the markers in the figure.

markerscale sets the size of the markers in the legend

lw is linewidth of lines in the plot

if hideAxisValues=True no numbers are shown on the axes

if showLegend=False the legend is removed

ellipse is metadata column with categories of samples that should be grouped with confidence ellipses

n_std is the number of standard deviations of the confidence ellipses

ellipse_tag is metadata column with labels for each ellipse

ellipse_connect is metadata column with numbers used to connect centers of ellipses with lines.

if flipx =True, the x-axis will be inverted

if flipy =True, the y-axis will be inverted

if returnData =True, the coordinate of the points will be returned as a pandas dataframe. Note that if it is a biplot, two dataframes will be returned: one for the points and one for the arrows.

colorlist specifies colorlist to use for var1. If ‘None’, qdiv will decide the colors. same for markerlist and var2; savename is path and name to save png figure output.

markerlist specifies markers to use for var2. If ‘None’, qdiv will decide the markers.

savename is path and name to save png and pdf figures.

Pairwise dissimilarity

plot.pairwise_beta(obj, divType='naive', distmat='None', compareVar='None', spairs=[], nullModel=True, randomization='abundance', weight=0, iterations=10, qrange=[0, 2, 0.5], colorlist='None', onlyPlotData='None', skipJB=False, onlyReturnData=False, savename='None')

Calculate and/or plots dissimilarity between pairs of samples or sample types.

obj is the object.

divType is the type of beta diversity to calculate: ‘naive’, ‘phyl’, or ‘func’.

distmat is distance matrix file that must be provided if divType=’func’. A distmat dataframe can be generated with the stats.sequence_comparison() function.

compareVar is a column heading in the meta data. If compareVar is not None, the dissimilarity values represent all pairwise comparisons between the meta data categories specified present under compareVar.

spairs is a list of pairs to compare, each item in the list is another list of two samples names or categories to compare, e.g. [[sample_group_1, sample_group_2],[sample_group_X, sample_group_Y],[sample_group_3, sample_group_4]].

if nullModel =True, the null.rcq function will be run. randomization, weight, and iterations are all input to the null.rcq function (see documentation there).

qrange is a list containing the min, max, tick mark space on the diversity order x-axis of the figure.

colorlist is a list of colors used for the lines, if colorlist =’None’, qdiv will decide the colors.

If onlyPlotData is a dictionary containing data, the function will only plot the data in that dictionary and no further calculations with be carried out.

If skipJB =True, Jaccard and Bray-Curtis dissimilarities will not be calculated.

If onlyReturnData =True, no plots will be done and only a python dictionary containing the output data will be generated. This dictionary can later be used as input to the onlyPlotData argument.

savename is path and name to the generated output. The data in the python dictionary is saved as a pickle file.

Rarefaction curve

plot.rarefactioncurve(obj, step='flexible', figsize=(14, 10), fontsize=18, var='None', order='None', tag='None', colorlist='None', onlyReturnData=False, onlyPlotData='None', savename='None')

Calculates a rarefaction curve based on subsampling without replacement.

obj is the object.

step is the step size used during subsampling, if ‘flexible’ the total reads are divided by 20.

figsize is width and height of the figure in inches.

fontsize is size of text in figure.

var is the column in the meta data used to color code lines in plot.

order is the column in the meta data used to order the samples.

tag is the column in the meta data used to name lines in plot, if tag=’index’, the sample names are used.

colorlist is list of colors to be used in the plot, if ‘None’ qdiv default is used.

if onlyReturnData =True, function will return a python dictionary with data.

if onlyPlotData is a dictionary with data (generated in a previous step by running the function with onlyReturnData=True), it will be plotted and no calculations will be carried out.

if savename is specified, plots will be saved and data will be saved as a pickle file.

Octave (frequency histogram)

plot.octave(obj, var='None', slist='None', nrows=2, ncols=2, fontsize=11, figsize=(10, 6), xlabels=True, ylabels=True, title=True, color='blue', savename='None')

Octave plot according to Edgar and Flyvbjerg, DOI:10.1101/38983

obj is the qdiv object.

var is the column heading in metadata used to select samples to include. The counts for all samples with the same text in var column will be merged.

slist is a list of names in meta data column which specify samples to keep. If slist=’None’ (default), the whole meta data column is used.

nrows and ncols are the number of rows and columns in the plot; nrows*ncols must be equal to or more than the number of samples plotted.

if xlabels =True, k is shown for the bins on the x-axis

if ylabels =True, ASV counts are shown on the y-axis

if title =True, sample name is shown as title for each panel

color determines color of bars

savename is path and name of file.

Dissimilarity contributions of taxa

plot.dissimilarity_contributions(obj, var='None', q=1, index='local', numberToPlot=20, levels=['Genus'], fromFile='None', figsize=(18/2.54, 14/2.54), fontsize=10, savename='None')

Plot showing contribution of each taxon to observed dissimilarity between multiple samples.

obj is the qdiv object.

var is the column heading in the meta data used to categorize the samples. If a category has two or more samples, dissimilarity samples within that category is calculated.

q is the diversity order.

index is the type of dissimilarity index (either local or regional).

numberToPlot is the number of taxa to include.

levels are taxonomic levels to include on y-axis.

fromFile could be that path to a csv file generated with the output from diversity.naive_dissimilarity_contributions.

savename is path and name of files to be saved.