Title: | Statistical Analysis to Identify Isotope Incorporating metagenomic features |
---|---|
Description: | Statistical analysis as part of a quantitative stable isotope probing (SIP) metagenomics study to identify isotope incorporating metagenomic features. Helpful reading and a vignette in bookdown format is provided on the package site <https://zielslab.github.io/SIPmg.github.io/>. |
Authors: | Pranav Sampara [aut, cre], Kate Waring [ctb], Ryan Ziels [aut], Alma Garcia Roche [aut] |
Maintainer: | Pranav Sampara <[email protected]> |
License: | GPL-2 |
Version: | 2.3 |
Built: | 2024-11-14 01:41:00 UTC |
Source: | https://github.com/zielslab/sipmg |
Data table generated from the "qSIP_atom_excess_MAGs" function
data(atomX)
data(atomX)
An object of class "list"
See Hungate et al., 2015 for more details
calc_atom_excess_MAGs(Mlab, Mlight, Mheavymax, isotope = "13C")
calc_atom_excess_MAGs(Mlab, Mlight, Mheavymax, isotope = "13C")
Mlab |
The molecular wight of labeled DNA |
Mlight |
The molecular wight of unlabeled DNA |
Mheavymax |
The theoretical maximum molecular weight of fully-labeled DNA |
isotope |
The isotope for which the DNA is labeled with ('13C', '18O', '15N') |
numeric value: atom fraction excess (A)
This script was adapted from https://github.com/buckleylab/HTSSIP/blob/master/R/qSIP_atom_excess.R for use with genome-centric metagenomics. See Hungate et al., 2015 for more details
calc_Mheavymax_MAGs(Mlight, isotope = "13C", Gi = Gi)
calc_Mheavymax_MAGs(Mlight, isotope = "13C", Gi = Gi)
Mlight |
The molecular wight of unlabeled DNA |
isotope |
The isotope for which the DNA is labeled with ('13C' or '18O' or '15N') |
Gi |
The G+C content of unlabeled DNA |
numeric value: maximum molecular weight of fully-labeled DNA
Normalize feature coverages to estimate absolute abundance or relative coverage using MAG/contig coverage values with or without multiplying total DNA concentration of the fraction
coverage_normalization( f_tibble, contig_coverage, sequencing_yield, fractions_df, approach = "relative_coverage" )
coverage_normalization( f_tibble, contig_coverage, sequencing_yield, fractions_df, approach = "relative_coverage" )
f_tibble |
Can be either of (1) a tibble with first column "Feature" that contains bin IDs, and the rest of the columns represent samples with bins' coverage values. (2) a tibble as outputted by the program "checkm coverage" from the tool CheckM. Please check CheckM documentation - https://github.com/Ecogenomics/CheckM on the usage for "checkm coverage" program |
contig_coverage |
tibble with contig ID names ("Feature" column), sample columns with same sample names as in f_tibble containing coverage values of each contig, contig length in bp ("contig_length" column), and the MAG the contig is associated ("MAG" column) with same MAGs as in Feature column of f_tibble dataset. |
sequencing_yield |
tibble containing sample ID ("sample" column) with same sample names as in f_tibble and number of reads in bp recovered in that sample ("yield" column). |
fractions_df |
fractions data frame A fractions file with the following columns
|
approach |
Please choose the method for coverage normalization as "relative_coverage", "greenlon", "starr" to estimate only relative coverage without multiplying DNA concentration of fraction, or as per methods in Greenlon et al. - https://journals.asm.org/doi/full/10.1128/msystems.00417-22 or Starr et al. - https://journals.asm.org/doi/10.1128/mSphere.00085-21 |
tibble containing normalized coverage in required format with MAG name as first column and the normalized coverage values in each sample as the rest of the columns.
data(f_tibble) rel.cov = coverage_normalization(f_tibble)
data(f_tibble) rel.cov = coverage_normalization(f_tibble)
The 'use_geo_mean' parameter uses geometric means on all non-zero abundances for estimateSizeFactors instead of using the default log-tranformed geometric means.
DESeq2_l2fc( physeq, density_min, density_max, design, l2fc_threshold = 0.25, sparsity_threshold = 0.25, sparsity_apply = "all", size_factors = "geoMean" )
DESeq2_l2fc( physeq, density_min, density_max, design, l2fc_threshold = 0.25, sparsity_threshold = 0.25, sparsity_apply = "all", size_factors = "geoMean" )
physeq |
Phyloseq object |
density_min |
Minimum buoyant density of the 'heavy' gradient fractions |
density_max |
Maximum buoyant density of the 'heavy' gradient fractions |
design |
|
l2fc_threshold |
log2 fold change (l2fc) values must be significantly above this threshold in order to reject the hypothesis of equal counts. |
sparsity_threshold |
All OTUs observed in less than this portion (fraction: 0-1) of gradient fraction samples are pruned. A a form of indepedent filtering, The sparsity cutoff with the most rejected hypotheses is used. |
sparsity_apply |
Apply sparsity threshold to all gradient fraction samples ('all') or just heavy fraction samples ('heavy') |
size_factors |
Method of estimating size factors. 'geoMean' is from (Pepe-Ranney et. al., 2016) and removes all zero-abundances from the calculation. 'default' is the default for estimateSizeFactors. 'iterate' is an alternative when every OTU has a zero in >=1 sample. |
dataframe of HRSIP results
data(phylo.qSIP) df_l2fc = DESeq2_l2fc(phylo.qSIP, density_min=1.71, density_max=1.75, design=~Isotope)
data(phylo.qSIP) df_l2fc = DESeq2_l2fc(phylo.qSIP, density_min=1.71, density_max=1.75, design=~Isotope)
Data table generated from bostrapping the AFE table using the "qSIP_bootstrap_fcr" function
data(df_atomX_boot)
data(df_atomX_boot)
An object of class "data.frame"
Coverage data used for many functions in the package
data(f_tibble)
data(f_tibble)
An object of class "data.frame"
filter_l2fc
filters a l2fc table to 'best' sparsity cutoffs &
bouyant density windows.
filter_l2fc(df_l2fc, padj_cutoff = 0.1)
filter_l2fc(df_l2fc, padj_cutoff = 0.1)
df_l2fc |
data.frame of log2 fold change values |
padj_cutoff |
Adjusted p-value cutoff for rejecting the null hypothesis that l2fc values were not greater than the l2fc_threshold. |
filtered df_l2fc object
This function enables removing NAs from the atomX table.
filter_na(atomX)
filter_na(atomX)
atomX |
A list object created by |
A list of 2 data.frame objects without MAGs which have NAs. 'W' contains the weighted mean buoyant density (W) values for each OTU in each treatment/control. 'A' contains the atom fraction excess values for each OTU. For the 'A' table, the 'Z' column is buoyant density shift, and the 'A' column is atom fraction excess.
data(atomX) ### Remove NAs in atomX table atomx_no_na = filter_na(atomX)
data(atomX) ### Remove NAs in atomX table atomx_no_na = filter_na(atomX)
Fractions data used for many functions in the package
data(fractions)
data(fractions)
An object of class "data.frame"
GC_content data
data(GC_content)
data(GC_content)
An object of class "data.frame"
Conduct (multi-window) high resolution stable isotope probing (HR-SIP) analysis.
HRSIP( physeq, design, density_windows = data.frame(density_min = c(1.7), density_max = c(1.75)), sparsity_threshold = seq(0, 0.3, 0.1), sparsity_apply = "all", l2fc_threshold = 0.25, padj_method = "BH", padj_cutoff = NULL, parallel = FALSE )
HRSIP( physeq, design, density_windows = data.frame(density_min = c(1.7), density_max = c(1.75)), sparsity_threshold = seq(0, 0.3, 0.1), sparsity_apply = "all", l2fc_threshold = 0.25, padj_method = "BH", padj_cutoff = NULL, parallel = FALSE )
physeq |
Phyloseq object |
design |
|
density_windows |
The buoyant density window(s) used for for calculating log2 fold change values. Input can be a vector (length 2) or a data.frame with a 'density_min' and a 'density_max' column (each row designates a density window). |
sparsity_threshold |
All OTUs observed in less than this portion (fraction: 0-1) of gradient fraction samples are pruned. This is a form of indepedent filtering. The sparsity cutoff with the most rejected hypotheses is used. |
sparsity_apply |
Apply sparsity threshold to all gradient fraction samples ('all')
or just 'heavy' fraction samples ('heavy'), where 'heavy' samples are designated
by the |
l2fc_threshold |
log2 fold change (l2fc) values must be significantly above this
threshold in order to reject the hypothesis of equal counts.
See |
padj_method |
Method for global p-value adjustment (See |
padj_cutoff |
Adjusted p-value cutoff for rejecting the null hypothesis
that l2fc values were not greater than the l2fc_threshold.
Set to |
parallel |
Process each parameter combination in parallel.
See |
The (MW-)HR-SIP workflow is as follows:
For each sparsity threshold & BD window: calculate log2 fold change values (with DESeq2) for each OTU
Globally adjust p-values with a user-defined method (see p.adjust())
Select the sparsity cutoff with the most rejected hypotheses
For each OTU, select the BD window with the greatest log2 fold change value
dataframe of HRSIP results
data(phylo.qSIP) ## HR-SIP ### Note: treatment-control samples differentiated with 'design=~Isotope' df_l2fc = HRSIP(phylo.qSIP, design=~Isotope) ## Same, but multiple BD windows (MW-HR-SIP). For parallel processing change to parallel = TRUE ### Windows = 1.7-1.74, 1.72-1.75, and 1.73 - 1.76 windows = data.frame(density_min=c(1.71,1.72, 1.73), density_max=c(1.74,1.75,1.76)) df_l2fc = HRSIP(phylo.qSIP, design=~Isotope, density_windows = windows, padj_cutoff = 0.05, parallel=FALSE)
data(phylo.qSIP) ## HR-SIP ### Note: treatment-control samples differentiated with 'design=~Isotope' df_l2fc = HRSIP(phylo.qSIP, design=~Isotope) ## Same, but multiple BD windows (MW-HR-SIP). For parallel processing change to parallel = TRUE ### Windows = 1.7-1.74, 1.72-1.75, and 1.73 - 1.76 windows = data.frame(density_min=c(1.71,1.72, 1.73), density_max=c(1.74,1.75,1.76)) df_l2fc = HRSIP(phylo.qSIP, design=~Isotope, density_windows = windows, padj_cutoff = 0.05, parallel=FALSE)
This function provides a table with MAGs and their corresponding GTDB taxonomy as an output. This would be useful in identifying the taxa that have incorporation
incorporators_taxonomy(taxonomy, bootstrapped_AFE_table)
incorporators_taxonomy(taxonomy, bootstrapped_AFE_table)
taxonomy |
A taxonomy tibble obtained in the markdown. This taxonomy tibble is typically a concatenated list of archaeal and bacterial taxonomy from GTDB-Tk Please check GTDB-Tk documentation for running the tool |
bootstrapped_AFE_table |
A data frame indicating bootstrapped atom fraction excess values |
A tibble with two columns, OTU and Taxonomy, with taxonomy of the incorporator MAGs
data(taxonomy_tibble,df_atomX_boot) ### Making incorporator taxonomy list incorporator_list = incorporators_taxonomy(taxonomy = taxonomy_tibble, bootstrapped_AFE_table = df_atomX_boot)
data(taxonomy_tibble,df_atomX_boot) ### Making incorporator taxonomy list incorporator_list = incorporators_taxonomy(taxonomy = taxonomy_tibble, bootstrapped_AFE_table = df_atomX_boot)
MAG abundances in the format of phyloseq object to be used in the qSIP and (MW-)HR-SIP pipeline
data(mag.table)
data(mag.table)
An object of class "phyloseq"
Master phyloseq object
data(phylo.qSIP)
data(phylo.qSIP)
An object of class "phyloseq"
Creates a phyloseq-style object using processed phyloseq objects for otu table (here, MAG table), taxa table, and sample table
phylo.table(mag, taxa, samples)
phylo.table(mag, taxa, samples)
mag |
phyloseq-styled MAG table |
taxa |
phyloseq-styled taxa table |
samples |
sample information table |
phyloseq object for MAGs
data(mag.table,taxonomy.object,samples.object,fractions,taxonomy_tibble) ###Making phyloseq table from fractions metadata samples.object = sample.table(fractions) taxonomy.object = tax.table(taxonomy_tibble) ### Making master phyloseq table from scaled MAG data, taxa and fractions phyloseq data phylo.qSIP = phylo.table(mag.table,taxonomy.object,samples.object)
data(mag.table,taxonomy.object,samples.object,fractions,taxonomy_tibble) ###Making phyloseq table from fractions metadata samples.object = sample.table(fractions) taxonomy.object = tax.table(taxonomy_tibble) ### Making master phyloseq table from scaled MAG data, taxa and fractions phyloseq data phylo.qSIP = phylo.table(mag.table,taxonomy.object,samples.object)
Reformat a phyloseq object of qSIP_atom_excess_MAGs analysis
qSIP_atom_excess_format_MAGs(physeq, control_expr, treatment_rep)
qSIP_atom_excess_format_MAGs(physeq, control_expr, treatment_rep)
physeq |
A phyloseq object |
control_expr |
An expression for identifying unlabeled control samples in the phyloseq object (eg., "Substrate=='12C-Con'") |
treatment_rep |
Which column in the phyloseq sample data designates replicate treatments |
numeric value: atom fraction excess (A)
Calculate atom fraction excess using q-SIP method
qSIP_atom_excess_MAGs( physeq, control_expr, treatment_rep = NULL, isotope = "13C", df_OTU_W = NULL, Gi )
qSIP_atom_excess_MAGs( physeq, control_expr, treatment_rep = NULL, isotope = "13C", df_OTU_W = NULL, Gi )
physeq |
A phyloseq object |
control_expr |
Expression used to identify control samples based on sample_data. |
treatment_rep |
Which column in the phyloseq sample data designates replicate treatments |
isotope |
The isotope for which the DNA is labeled with ('13C' or '18O' or '15N') |
df_OTU_W |
Keep NULL |
Gi |
GC content of the MAG |
A list of 2 data.frame objects. 'W' contains the weighted mean buoyant density (W) values for each OTU in each treatment/control. 'A' contains the atom fraction excess values for each OTU. For the 'A' table, the 'Z' column is buoyant density shift, and the 'A' column is atom fraction excess.
data(phylo.qSIP,GC_content) ### Making atomx table ## Not run:: ### BD shift (Z) & atom excess (A) atomX = qSIP_atom_excess_MAGs(phylo.qSIP, control_expr='Isotope=="12C"', treatment_rep='Replicate', Gi = GC_content)
data(phylo.qSIP,GC_content) ### Making atomx table ## Not run:: ### BD shift (Z) & atom excess (A) atomX = qSIP_atom_excess_MAGs(phylo.qSIP, control_expr='Isotope=="12C"', treatment_rep='Replicate', Gi = GC_content)
Calculate adjusted bootstrap CI after for multiple testing for atom fraction excess using q-SIP method. Multiple hypothesis tests are corrected by
qSIP_bootstrap_fcr( atomX, isotope, n_sample = c(3, 3), ci_adjust_method = "fcr", n_boot = 10, parallel = FALSE, a = 0.1, Gi, show_delbd_AFE = FALSE )
qSIP_bootstrap_fcr( atomX, isotope, n_sample = c(3, 3), ci_adjust_method = "fcr", n_boot = 10, parallel = FALSE, a = 0.1, Gi, show_delbd_AFE = FALSE )
atomX |
A list object created by |
isotope |
The isotope for which the DNA is labeled with ('13C', '15N' or '18O') |
n_sample |
A vector of length 2. The sample size for data resampling (with replacement) for 1) control samples and 2) treatment samples. |
ci_adjust_method |
Confidence interval adjustment method. Please choose 'FCR', 'Bonferroni', or 'none' (if no adjustment is needed). Default is FCR and also provides unadjusted CI. |
n_boot |
Number of bootstrap replicates. |
parallel |
Parallel processing. See |
a |
A numeric value. The alpha for calculating confidence intervals. |
Gi |
The G+C content of unlabeled DNA as a dataframe with "Feature" column having MAGs, contigs, or other features as rows, and a "Gi" column with GC content |
show_delbd_AFE |
Show AFE values and incorporator identification estimated based on the delta buoyant density estimates? |
A data.frame of atom fraction excess values (A) and atom fraction excess confidence intervals adjusted for multiple testing.
data(phylo.qSIP,GC_content) ### BD shift (Z) & atom excess (A) atomX = qSIP_atom_excess_MAGs(phylo.qSIP, control_expr='Isotope=="12C"', treatment_rep='Replicate', Gi = GC_content) ### Add doParallel::registerDoParallel(num_cores) if parallel bootstrapping is to be done df_atomX_boot = qSIP_bootstrap_fcr(atomX, isotope = "13C", Gi = GC_content, n_boot=5, parallel = FALSE)
data(phylo.qSIP,GC_content) ### BD shift (Z) & atom excess (A) atomX = qSIP_atom_excess_MAGs(phylo.qSIP, control_expr='Isotope=="12C"', treatment_rep='Replicate', Gi = GC_content) ### Add doParallel::registerDoParallel(num_cores) if parallel bootstrapping is to be done df_atomX_boot = qSIP_bootstrap_fcr(atomX, isotope = "13C", Gi = GC_content, n_boot=5, parallel = FALSE)
Creates a phyloseq-styled sample table from fractions metadata containing data on fraction number, number of replicates, buoyant density calculated from a refractometer, type of isotope, and DNA concentration of each fraction, and isotope type. See below for information on "fractions" file.
sample.table(fractions_df)
sample.table(fractions_df)
fractions_df |
fractions data frame A fractions file with the following columns
|
data frame: phyloseq-style sample table
data(fractions) ### Making phyloseq table from fractions metadata samples.object = sample.table(fractions)
data(fractions) ### Making phyloseq table from fractions metadata samples.object = sample.table(fractions)
Fractions metadata in the format of phyloseq object to be used in the qSIP and (MW-)HR-SIP pipeline
data(samples.object)
data(samples.object)
An object of class "phyloseq"
Calculates global scaling factors for features (contigs or bins),based on linear regression of sequin coverage. Options include log-transformations of coverage, as well as filtering features based on limit of detection. This function must be called first, before the feature abundance table, feature detection table, and plots are retrieved.
scale_features_lm( f_tibble, sequin_meta, seq_dilution, log_trans = TRUE, coe_of_variation = 250, lod_limit = 0, save_plots = TRUE, plot_dir = tempdir(), cook_filtering = TRUE )
scale_features_lm( f_tibble, sequin_meta, seq_dilution, log_trans = TRUE, coe_of_variation = 250, lod_limit = 0, save_plots = TRUE, plot_dir = tempdir(), cook_filtering = TRUE )
f_tibble |
Can be either of (1) a tibble with first column "Feature" that contains bin IDs, and the rest of the columns represent samples with bins' coverage values. (2) a tibble as outputted by the program "checkm coverage" from the tool CheckM. Please check CheckM documentation - https://github.com/Ecogenomics/CheckM on the usage for "checkm coverage" program |
sequin_meta |
tibble containing sequin names ("Feature" column) and concentrations in attamoles/uL ("Concentration" column). |
seq_dilution |
tibble with first column "Sample" with same sample names as in f_tibble, and a second column "Dilution" showing ratio of sequins added to final sample volume (e.g. a value of 0.01 for a dilution of 1 volume sequin to 99 volumes sample) |
log_trans |
Boolean (TRUE or FALSE), should coverages and sequin concentrations be log-scaled? |
coe_of_variation |
Acceptable coefficient of variation for coverage and detection (eg. 20 - for 20 % threshold of coefficient of variation). Coverages above the threshold value will be flagged in the plots. |
lod_limit |
(Decimal range 0-1) Threshold for the percentage of minimum detected sequins per concentration group. Default = 0 |
save_plots |
Boolean (TRUE or FALSE), should sequin scaling be saved? Default = TRUE |
plot_dir |
Directory where plots are to be saved. Will create a directory "sequin_scaling_plots_lm" if it does not exist. |
cook_filtering |
Boolean (TRUE or FALSE), should data points be filtered based on Cook's distance metric. Cooks distance can be useful in detecting influential outliers in an ordinary least square’s regression model, which can negatively influence the model. A threshold of Cooks distance of 4/n (where n is the sample size) is chosen, and any data point with Cooks distance > 4/n is filtered out. It is typical to choose 4/n as the threshold in detecting the outliers in the data. Default = TRUE |
a list of tibbles containing
mag_tab: a tibble with first column "Feature" that contains bin (or contig IDs), and the rest of the columns represent samples with features' scaled abundances (attamoles/uL)
mag_det: a tibble with first column "Feature" that contains bin (or contig IDs),
plots: linear regression plots for scaling MAG coverage values to absolute abundance
scale_fac: a master tibble with all of the intermediate values in above calculations
data(f_tibble, sequins, seq_dil) ### scaling sequins from coverage values scaled_features_lm = scale_features_lm(f_tibble,sequin_meta, seq_dil)
data(f_tibble, sequins, seq_dil) ### scaling sequins from coverage values scaled_features_lm = scale_features_lm(f_tibble,sequin_meta, seq_dil)
Calculates global scaling factors for features (contigs or bins),based on linear regression of sequin coverage. Options include log-transformations of coverage, as well as filtering features based on limit of detection. This function must be called first, before the feature abundance table, feature detection table, and plots are retrieved.
scale_features_rlm( f_tibble, sequin_meta, seq_dilution, log_trans = TRUE, coe_of_variation = 250, lod_limit = 0, save_plots = TRUE, plot_dir = tempdir() )
scale_features_rlm( f_tibble, sequin_meta, seq_dilution, log_trans = TRUE, coe_of_variation = 250, lod_limit = 0, save_plots = TRUE, plot_dir = tempdir() )
f_tibble |
Can be either of (1) a tibble with first column "Feature" that contains bin IDs, and the rest of the columns represent samples with bins' coverage values. (2) a tibble as outputted by the program "checkm coverage" from the tool CheckM. Please check CheckM documentation - https://github.com/Ecogenomics/CheckM on the usage for "checkm coverage" program |
sequin_meta |
tibble containing sequin names ("Feature column") and concentrations in attamoles/uL ("Concentration") column. |
seq_dilution |
tibble with first column "Sample" with same sample names as in f_tibble, and a second column "Dilution" showing ratio of sequins added to final sample volume (e.g. a value of 0.01 for a dilution of 1 volume sequin to 99 volumes sample) |
log_trans |
Boolean (TRUE or FALSE), should coverages and sequin concentrations be log-scaled? Default = TRUE |
coe_of_variation |
Acceptable coefficient of variation for coverage and detection (eg. 20 - for 20 % threshold of coefficient of variation). Coverages above the threshold value will be flagged in the plots. Default = 250 |
lod_limit |
(Decimal range 0-1) Threshold for the percentage of minimum detected sequins per concentration group. Default = 0 |
save_plots |
Boolean (TRUE or FALSE), should sequin scaling be saved? Default = TRUE |
plot_dir |
Directory where plots are to be saved. Will create a directory "sequin_scaling_plots_rlm" if it does not exist. |
a list of tibbles containing
mag_tab: a tibble with first column "Feature" that contains bin (or contig IDs), and the rest of the columns represent samples with features' scaled abundances (attamoles/uL)
mag_det: a tibble with first column "Feature" that contains bin (or contig IDs),
plots: linear regression plots for scaling MAG coverage values to absolute abundance (optional)
scale_fac: a master tibble with all of the intermediate values in above calculations
data(f_tibble, sequins, seq_dil) ### scaling sequins from coverage values scaled_features_rlm = scale_features_rlm(f_tibble,sequins, seq_dil)
data(f_tibble, sequins, seq_dil) ### scaling sequins from coverage values scaled_features_rlm = scale_features_rlm(f_tibble,sequins, seq_dil)
Sequins dilution data
data(seq_dil)
data(seq_dil)
An object of class "data.frame"
Sequins metadata
data(sequins)
data(sequins)
An object of class "data.frame"
A MAG table, similar to OTU table in phyloseq, will be generated from a concantenated GTDB taxa table for bacteria and archaea
tax.table(taxonomy)
tax.table(taxonomy)
taxonomy |
GTDB taxonomy data frame. A taxonomy file in the GTDB output format. Load the bacteria and archaea taxonomy outputs separately. The markdown requires loading the standard output files from GTDB-Tk separately for bacteria and archaea |
phyloseq-style taxonomy table, but for MAGs
data(taxonomy_tibble) ### Making phyloseq table from taxonomy metadata taxonomy.object = tax.table(taxonomy_tibble)
data(taxonomy_tibble) ### Making phyloseq table from taxonomy metadata taxonomy.object = tax.table(taxonomy_tibble)
Taxonomy table from GTDB-Tk output - combining both bacterial and archaeal taxonomy
data(taxonomy_tibble)
data(taxonomy_tibble)
An object of class "data.frame"
Taxonomy table in the format of phyloseq object to be used in the qSIP and (MW-)HR-SIP pipeline
data(taxonomy.object)
data(taxonomy.object)
An object of class "phyloseq"