Package 'metagenomeSeq' - Bioconductor

<strong>Package</strong> ‘metagenomeSeq’
May 22, 2014

Title Statistical analysis for sparse high-throughput sequencing
Version 1.6.0
Date 2013-04-10
Author Joseph Nathaniel Paulson, Mihai Pop, Hector Corrada Bravo
Maintainer Joseph N. Paulson
Description metagenomeSeq is designed to determine features (be it Operational Taxonomic Unit (OTU), species, etc.) that are differentially abundant between two or more groups of multiple samples. metagenomeSeq is designed to address the effects of both normalization and under-sampling of microbial communities on disease association detection and the testing of feature correlations.
License Artistic-2.0
Depends R (>= 3.0), Biobase, limma, matrixStats, methods, RColorBrewer, gplots
Suggests annotate, biom, parallel, vegan, knitr
Collate 'aggregateByTaxonomy.R' 'allClasses.R' 'biom2MRexperiment.R' 'calculateEffectiveSamples.R' 'cumNorm.R' 'cumNormStat.R' 'cumNormStatFast.R' 'cumNormMat.R' 'correlationTest.R' 'doCountMStep.R' 'doZeroMStep.R' 'doEStep.R' 'exportMat.R' 'exportStats.R' 'fitDO.R' 'fitMeta.R' 'fitPA.R' 'fitZig.R' 'filterData.R' 'getEpsilon.R' 'getCountDensity.R' 'getNegativeLogLikelihoods.R' 'getPi.R' 'getZ.R' 'isItStillActive.R' 'load_biom.R' 'load_meta.R' 'load_metaQ.R' 'load_phenoData.R' 'misc.R' 'MRtable.R' 'MRcoefs.R' 'MRfulltable.R' 'MRexperiment2biom.R' 'plotMRheatmap.R' 'plotCorr.R' 'plotOTU.R' 'plotOrd.R' 'plotRare.R' 'plotGenus.R' 'plotFeature.R' 'zigControl.R'
VignetteBuilder knitr
URL http://cbcb.umd.edu/software/metagenomeSeq
biocViews DifferentialExpression, Metagenomics, Visualization

Usage

    aggregateByTaxonomy(obj, lvl, alternate = FALSE, norm = FALSE,
      log = FALSE, aggfun = colSums, sl = 1000, out = "MRexperiment")

    aggTax(obj, lvl, alternate = FALSE, norm = FALSE, log = FALSE,
      aggfun = colSums, sl = 1000, out = "MRexperiment")

Arguments

    obj         A MRexperiment object or count matrix.
    lvl         featureData column name from the MRexperiment object or, if obj is a count matrix, a vector of labels.
    alternate   Use the rowname for undefined OTUs instead of aggregating to "no_match".
    norm        Whether to aggregate normalized counts or not.
    log         Whether or not to log2 transform the counts - if MRexperiment object.
    aggfun      Aggregation function.
    sl          Scaling value, default is 1000.
    out         Either 'MRexperiment' or 'matrix'.

Value

    An aggregated count matrix.

Examples

    # not run
    # aggregateByTaxonomy(mouseData,lvl="genus",norm=TRUE,aggfun=colMedians)
    # aggTax(mouseData,lvl="phylum",norm=FALSE,aggfun=colSums)

biom2MRexperiment    Biome to MRexperiment objects

Description

    Wrapper to convert biome files to MRexperiment objects.

Usage

    biom2MRexperiment(obj)

Arguments

    obj         The biome object file.

Value

    A MRexperiment object.

See Also

    load_meta, load_phenoData, newMRexperiment, load_biom

Examples

    #library(biom)
    #rich_dense_file = system.file("extdata", "rich_dense_otu_table.biom", package = "biom")
    #x = read_biom(rich_dense_file)
    #biom2MRexperiment(x)

calculateEffectiveSamples    Estimated effective samples per feature

Description

    Calculates the number of estimated effective samples per feature from the output of a fitZig run. The estimated effective samples per feature is calculated as $\sum_{i=1}^{n} (1 - z_i)$, where $n$ is the number of samples and $z_i$ is the posterior probability that a feature belongs to the technical distribution.

Usage

    calculateEffectiveSamples(obj)

Arguments

    obj         The output of fitZig run on a MRexperiment object.

Value

    A list of the estimated effective samples per feature.

See Also

    fitZig, MRcoefs, MRfulltable
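
As an illustration of the sum described for calculateEffectiveSamples, the following base R sketch computes it directly from a matrix of posterior probabilities. The matrix `z` and its values are hypothetical, made up for this example. It does not call fitZig or reproduce the package's internal code.

```r
# Hypothetical posterior probabilities that a count is a technical zero:
# 3 features (rows) x 4 samples (columns). Values are made up.
z <- matrix(c(0.9, 0.1, 0.0, 0.0,
              0.5, 0.5, 0.5, 0.5,
              0.0, 0.0, 0.0, 0.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("feature", 1:3), paste0("sample", 1:4)))

# Effective samples per feature: sum over samples of (1 - z_i).
effective_samples <- rowSums(1 - z)
effective_samples
```

A feature whose counts are all confidently non-technical (z = 0 everywhere) keeps all four samples, while larger posterior probabilities reduce the effective sample count.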

correctIndices    Calculate the correct indices for the output of correlationTest

Description

    Consider the upper triangular portion of a matrix of size n x n. Results from correlationTest are output as the combination of two vectors, correlation statistic and p-values. The order of the output is 1vs2, 1vs3, 1vs4, etc. correctIndices returns the correct indices to fill a correlation matrix or correlation p-value matrix.

Usage

    correctIndices(n)

Arguments

    n           The number of features compared by correlationTest (nrow(mat)).

Value

    A vector of the indices for an upper triangular matrix.

Examples

    data(mouseData)
    mat = MRcounts(mouseData)[55:60,]
    cors = correlationTest(mat)
    ind = correctIndices(nrow(mat))
    cormat = as.matrix(dist(mat))
    cormat[cormat>0] = 0
    cormat[upper.tri(cormat)][ind] = cors[,1]
    table(cormat[1,-1] - cors[1:5,1])

correlationTest    Pairwise correlation for each row of a matrix or MRexperiment object

Description

    Calculates the pairwise correlation statistics and associated p-values.

Usage

    correlationTest(obj, method = "pearson", alternative = "two.sided",
      norm = TRUE, log = TRUE, parallel = FALSE, cores = 2,
      override = FALSE, ...)

Arguments

    obj           A MRexperiment object or count matrix.
    method        One of 'pearson', 'spearman', or 'kendall'.
    alternative   Indicates the alternative hypothesis and must be one of 'two.sided', 'greater' (positive) or 'less' (negative). You can specify just the initial letter.
    norm          Whether to use normalized counts or not - if MRexperiment object.
    log           Whether or not to log2 transform the counts - if MRexperiment object.
    parallel      Parallelize the correlation testing?
    cores         Number of cores to make use of if parallel == TRUE.
    override      If the number of rows to test is over a thousand the test will not commence (unless override == TRUE).
    ...           Extra settings for mclapply.

Value

    A matrix of size choose(number of rows, 2) by 2. The first column corresponds to the correlation value, the second column to the p-value.

Examples

    data(mouseData)
    cors = correlationTest(mouseData[1:10,],norm=FALSE,log=FALSE)
    head(cors)

cumNorm    Cumulative sum scaling factors.

Description

    Calculates each column's quantile and calculates the sum up to and including that quantile.

Usage

    cumNorm(obj, p = cumNormStatFast(obj))

Arguments

    obj         An MRexperiment object.
    p           The pth quantile.

Value

    Vector of the sum up to and including a sample's pth quantile.

See Also

    fitZig, cumNormStat

Examples

    data(mouseData)
    cumNorm(mouseData)
    head(normFactors(mouseData))

cumNormMat    Cumulative sum scaling factors.

Description

    Calculates each column's quantile and calculates the sum up to and including that quantile.

Usage

    cumNormMat(obj, p = cumNormStatFast(obj), sl = 1000)

Arguments

    obj         A MRexperiment object.
    p           The pth quantile.
    sl          The value to scale by (default=1000).

Value

    Returns a matrix normalized by scaling counts up to and including the pth quantile.

See Also

    fitZig, cumNorm

Examples

    data(mouseData)
    head(cumNormMat(mouseData))
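
The scaling described for cumNorm and cumNormMat can be sketched in base R on a toy count matrix. This is a minimal sketch under stated assumptions, not the package's implementation. The helper `css_factor`, the toy counts, the fixed `p = 0.5`, and the choice to take the quantile over non-zero counts only are all assumptions made for illustration.

```r
# Toy count matrix: 4 features (rows) x 2 samples (columns). Values are made up.
counts <- matrix(c(0, 1,
                   2, 0,
                   4, 3,
                   10, 50),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("otu", 1:4), c("A", "B")))

p  <- 0.5    # assumed percentile; the package would choose it via cumNormStatFast
sl <- 1000   # scaling value, matching the documented default

# Hypothetical helper: sum of the counts up to and including the pth quantile.
# Assumption: the quantile is taken over the non-zero counts of the sample.
css_factor <- function(x, p) {
  positive <- x[x > 0]
  q <- quantile(positive, probs = p)
  sum(positive[positive <= q])
}

factors    <- apply(counts, 2, css_factor, p = p)   # one factor per sample
normalized <- sweep(counts, 2, factors / sl, "/")   # scale each column
factors
normalized
```

Dividing by `factors / sl` rather than by `factors` keeps the normalized values on a scale comparable to counts per `sl` reads.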

cumNormStat    Cumulative sum scaling percentile selection

Description

    Calculates the percentile for which to sum counts up to and scale by. cumNormStat might be deprecated one day.

Usage

    cumNormStat(obj, qFlag = TRUE, pFlag = FALSE, rel = 0.1, ...)

Arguments

    obj         A MRexperiment object.
    qFlag       Flag to either calculate the proper percentile using R's step-wise quantile function or approximate function.
    pFlag       Plot the relative difference of the median deviance from the reference.
    rel         Cutoff for the relative difference from one median difference from the reference to the next.
    ...         Applicable if pFlag == TRUE. Additional plotting parameters.

Value

    Percentile for which to scale data.

See Also

    fitZig, cumNorm, cumNormStatFast

Examples

    data(mouseData)
    p = round(cumNormStat(mouseData,pFlag=FALSE),digits=2)

cumNormStatFast    Cumulative sum scaling percentile selection

Description

    Calculates the percentile for which to sum counts up to and scale by. Faster version than available in cumNormStat.

Usage

    cumNormStatFast(obj, pFlag = FALSE, rel = 0.1, ...)

Arguments

    obj         A MRexperiment object.
    pFlag       Plot the median difference quantiles.
    rel         Cutoff for the relative difference from one median difference from the reference to the next.
    ...         Applicable if pFlag == TRUE. Additional plotting parameters.

Value

    Percentile for which to scale data.

See Also

    fitZig, cumNorm, cumNormStat

Examples

    data(mouseData)
    p = round(cumNormStatFast(mouseData,pFlag=FALSE),digits=2)

doCountMStep    Compute the Maximization step calculation for features still active.

Description

    Maximization step is solved by weighted least squares. The function also computes counts residuals.

Usage

    doCountMStep(z, y, mmCount, stillActive, fit2 = NULL)

Arguments

    z             Matrix (m x n) of estimate responsibilities (probabilities that a count comes from a spike distribution at 0).
    y             Matrix (m x n) of count observations.
    mmCount       Model matrix for the count distribution.
    stillActive   Boolean vector of size M, indicating whether a feature converged or not.
    fit2          Previous fit of the count model.

Details

    Maximum-likelihood estimates are approximated using the EM algorithm where we treat mixture membership $\delta_{ij} = 1$ if $y_{ij}$ is generated from the zero point mass as latent indicator variables. The density is defined as $f_{zig}(y_{ij}) = \pi_j(S_j) \cdot f_0(y_{ij}) + (1 - \pi_j(S_j)) \cdot f_{count}(y_{ij}; \mu_i, \sigma_i^2)$. The log-likelihood in this extended model is $(1 - \delta_{ij}) \log f_{count}(y; \mu_i, \sigma_i^2) + \delta_{ij} \log \pi_j(s_j) + (1 - \delta_{ij}) \log(1 - \pi_j(s_j))$. The responsibilities are defined as $z_{ij} = \Pr(\delta_{ij} = 1 \mid data)$.

Value

    Updated matrix (m x n) of estimate responsibilities (probabilities that a count comes from a spike distribution at 0).

See Also

    fitZig

doEStep    Compute the Expectation step.

Description

    Estimates the responsibilities $z_{ij} = \frac{\pi_j \cdot I_0(y_{ij})}{\pi_j \cdot I_0(y_{ij}) + (1 - \pi_j) \cdot f_{count}(y_{ij})}$.

Usage

    doEStep(countResiduals, zeroResiduals, zeroIndices)

Arguments

    countResiduals   Residuals from the count model.
    zeroResiduals    Residuals from the zero model.
    zeroIndices      Index (matrix m x n) of counts that are zero/non-zero.

Details

    Maximum-likelihood estimates are approximated using the EM algorithm where we treat mixture membership $\delta_{ij} = 1$ if $y_{ij}$ is generated from the zero point mass as latent indicator variables. The density is defined as $f_{zig}(y_{ij}) = \pi_j(S_j) \cdot f_0(y_{ij}) + (1 - \pi_j(S_j)) \cdot f_{count}(y_{ij}; \mu_i, \sigma_i^2)$. The log-likelihood in this extended model is $(1 - \delta_{ij}) \log f_{count}(y; \mu_i, \sigma_i^2) + \delta_{ij} \log \pi_j(s_j) + (1 - \delta_{ij}) \log(1 - \pi_j(s_j))$. The responsibilities are defined as $z_{ij} = \Pr(\delta_{ij} = 1 \mid data)$.

Value

    Updated matrix (m x n) of estimate responsibilities (probabilities that a count comes from a spike distribution at 0).

See Also

    fitZig

doZeroMStep    Compute the zero Maximization step.

Description

    Performs Maximization step calculation for the mixture components. Uses least squares to fit the parameters of the mean of the logistic distribution.

    $$ \pi_j = \sum_{i}^{M} \frac{1}{M} z_{ij} $$

    Maximum-likelihood estimates are approximated using the EM algorithm where we treat mixture membership $\delta_{ij} = 1$ if $y_{ij}$ is generated from the zero point mass as latent indicator variables. The density is defined as $f_{zig}(y_{ij}) = \pi_j(S_j) \cdot f_0(y_{ij}) + (1 - \pi_j(S_j)) \cdot f_{count}(y_{ij}; \mu_i, \sigma_i^2)$. The log-likelihood in this extended model is $(1 - \delta_{ij}) \log f_{count}(y; \mu_i, \sigma_i^2) + \delta_{ij} \log \pi_j(s_j) + (1 - \delta_{ij}) \log(1 - \pi_j(s_j))$. The responsibilities are defined as $z_{ij} = \Pr(\delta_{ij} = 1 \mid data)$.

Usage

    doZeroMStep(z, zeroIndices, mmZero)

Arguments

    z             Matrix (m x n) of estimate responsibilities (probabilities that a count comes from a spike distribution at 0).
    zeroIndices   Index (matrix m x n) of counts that are zero/non-zero.
    mmZero        The zero model, the model matrix to account for the change in the number of OTUs observed as a linear effect of the depth of coverage.

Value

    List of the zero fit (zero mean model) coefficients, variance - scale parameter (scalar), and normalized residuals of length sum(zeroIndices).

See Also

    fitZig

exportMat    Export the normalized MRexperiment dataset as a matrix.

Description

    This function allows the user to take a dataset of counts and output the dataset to the user's workspace as a tab-delimited file, etc.

Usage

    exportMat(obj, log = TRUE, norm = TRUE, sep = "\t",
      file = "~/Desktop/matrix.tsv")

Arguments

    obj         A MRexperiment object or count matrix.
    log         Whether or not to log transform the counts - if MRexperiment object.
    norm        Whether or not to normalize the counts - if MRexperiment object.
    sep         Separator for writing out the count matrix.
    file        Output file name.

Value

    NA

See Also

    cumNorm

Examples

    # see vignette

exportStats    Various statistics of the count data.

Description

    A matrix of values for each sample. The matrix consists of sample ids, the sample scaling factor, quantile value, the number of identified features, and library size (depth of coverage).

Usage

    exportStats(obj, p = cumNormStat(obj), file = "~/Desktop/res.stats.tsv")

Arguments

    obj         A MRexperiment object with count data.
    p           Quantile value to calculate the scaling factor and quantiles for the various samples.
    file        Output file name.

Value

    None.

See Also

    cumNorm, quantile

Examples

    # see vignette

expSummary    Access MRexperiment object experiment data

Description

    The expSummary vectors represent the column (sample specific) sums of features, i.e. the total number of reads for a sample, libSize, and also the normalization factors, normFactor.

Usage

    expSummary(obj)

Arguments

    obj         a MRexperiment object.

Author(s)

    Joseph N. Paulson, jpaulson@umiacs.umd.edu

Examples

    data(mouseData)
    expSummary(mouseData)

filterData    Filter datasets according to no. features present in features with at least a certain depth.

Description

    Filter the data based on the number of present features after filtering samples by depth of coverage. There are many ways to filter the object, this is just one way.

Usage

    filterData(obj, present = 1, depth = 1000)

Arguments

    obj         A MRexperiment object or count matrix.
    present     Features with at least 'present' positive samples.
    depth       Samples with at least this much depth of coverage.

Value

    A MRexperiment object.

Examples

    data(mouseData)
    filterData(mouseData)
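
The two-stage filter described for filterData (samples by depth first, then features by presence) can be sketched on a plain count matrix in base R. The function `filter_counts` and the toy matrix are hypothetical and written only for illustration. The real function operates on MRexperiment objects and may differ in detail.

```r
# Hypothetical stand-in for the filtering logic, applied to a plain matrix.
filter_counts <- function(counts, present = 1, depth = 1000) {
  # Step 1: keep samples (columns) with at least `depth` total reads.
  keep_samples <- colSums(counts) >= depth
  kept <- counts[, keep_samples, drop = FALSE]
  # Step 2: keep features (rows) positive in at least `present` remaining samples.
  keep_features <- rowSums(kept > 0) >= present
  kept[keep_features, , drop = FALSE]
}

# Toy data: 3 OTUs x 3 samples. Only sample s1 reaches a depth of 1000.
counts <- matrix(c(600, 0, 5,
                   500, 3, 0,
                     0, 2, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("otu1", "otu2", "otu3"), c("s1", "s2", "s3")))

filtered <- filter_counts(counts, present = 1, depth = 1000)
filtered
```

Here otu3 is dropped because its only positive count sits in a sample that was removed by the depth filter, which is why the order of the two steps matters.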

fitDO    Wrapper to calculate Discovery Odds Ratios on feature values.

Description

    This function returns a data frame of p-values, odds ratios, lower and upper confidence limits for every row of a matrix. The discovery odds ratio is calculated using Fisher's exact test on actual counts. The test's hypothesis is whether or not the discovery of counts for a feature (of all counts) is found in greater proportion in a particular group.

Usage

    fitDO(obj, cl, norm = TRUE, log = TRUE)

Arguments

    obj         A MRexperiment object with a count matrix, or a simple count matrix.
    cl          Group comparison.
    norm        Whether or not to normalize the counts - if MRexperiment object.
    log         Whether or not to log2 transform the counts - if MRexperiment object.

Value

    NA

See Also

    cumNorm, fitZig

Examples

    data(lungData)
    k = grep("Extraction.Control",pData(lungData)$SampleType)
    lungTrim = lungData[,-k]
    lungTrim = lungTrim[-which(rowSums(MRcounts(lungTrim)>0)

fitMeta    Computes a slightly modified form of Metastats.

Description

    Wrapper to perform the permutation test on the t-statistic. This is the original method employed by metastats (for non-sparse large samples). We include CSS normalization though (optional) and log2 transform the data. In this method the null distribution is not assumed to be a t-dist.

Usage

    fitMeta(obj, mod, useCSSoffset = TRUE, B = 1000, coef = 2, sl = 1000)

Arguments

    obj            A MRexperiment object with count data.
    mod            The model for the count distribution.
    useCSSoffset   Boolean, whether to include the default scaling parameters in the model or not.
    B              Number of permutations.
    coef           The coefficient of interest.
    sl             The value to scale by (default=1000).

Value

    Call made, fit object from lmFit, t-statistics and p-values for each feature.

fitPA    Wrapper to run Fisher's test on presence/absence of a feature.

Description

    This function returns a data frame of p-values, odds ratios, lower and upper confidence limits for every row of a matrix.

Usage

    fitPA(obj, cl, thres = 0)

Arguments

    obj         A MRexperiment object with a count matrix, or a simple count matrix.
    cl          Group comparison.
    thres       Threshold for defining presence/absence.

Value

    NA

See Also

    cumNorm, fitZig

Examples

    data(lungData)
    k = grep("Extraction.Control",pData(lungData)$SampleType)
    lungTrim = lungData[,-k]
    lungTrim = lungTrim[-which(rowSums(MRcounts(lungTrim)>0)

See Also

    cumNorm, zigControl

Examples

    data(lungData)
    k = grep("Extraction.Control",pData(lungData)$SampleType)
    lungTrim = lungData[,-k]
    k = which(rowSums(MRcounts(lungTrim)>0)

getEpsilon    Calculate the relative difference between iterations of the negative log-likelihoods.

Description

    Maximum-likelihood estimates are approximated using the EM algorithm where we treat mixture membership $\delta_{ij} = 1$ if $y_{ij}$ is generated from the zero point mass as latent indicator variables. The log-likelihood in this extended model is $(1 - \delta_{ij}) \log f_{count}(y; \mu_i, \sigma_i^2) + \delta_{ij} \log \pi_j(s_j) + (1 - \delta_{ij}) \log(1 - \pi_j(s_j))$. The responsibilities are defined as $z_{ij} = \Pr(\delta_{ij} = 1 \mid data)$.

Usage

    getEpsilon(nll, nllOld)

Arguments

    nll         Vector of size M with the current negative log-likelihoods.
    nllOld      Vector of size M with the previous iteration's negative log-likelihoods.

Value

    Vector of size M of the relative differences between the previous and current iteration nll.

See Also

    fitZig

getNegativeLogLikelihoods    Calculate the negative log-likelihoods for the various features given the residuals.

Description

    Maximum-likelihood estimates are approximated using the EM algorithm where we treat mixture membership $\delta_{ij} = 1$ if $y_{ij}$ is generated from the zero point mass as latent indicator variables. The log-likelihood in this extended model is $(1 - \delta_{ij}) \log f_{count}(y; \mu_i, \sigma_i^2) + \delta_{ij} \log \pi_j(s_j) + (1 - \delta_{ij}) \log(1 - \pi_j(s_j))$. The responsibilities are defined as $z_{ij} = \Pr(\delta_{ij} = 1 \mid \text{data and current values})$.

Usage

    getNegativeLogLikelihoods(z, countResiduals, zeroResiduals)

Arguments

    z                Matrix (m x n) of estimate responsibilities (probabilities that a count comes from a spike distribution at 0).
    countResiduals   Residuals from the count model.
    zeroResiduals    Residuals from the zero model.

Value

    Vector of size M of the negative log-likelihoods for the various features.

See Also

    fitZig

getPi    Calculate the mixture proportions from the zero model / spike mass model residuals.

Description

    F(x) = 1 / (1 + exp(-(x-m)/s)) (the CDF of the logistic distribution). Provides the probability that a real-valued random variable X with a given probability distribution will be found at a value less than or equal to x. The output is the mixture proportions for the samples given the residuals from the zero model.

Usage

    getPi(residuals)

Arguments

    residuals   Residuals from the zero model.

Value

    Mixture proportions for each sample.

See Also

    fitZig
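
The logistic CDF quoted in the getPi description can be checked numerically in base R. The sketch below writes F(x) out by hand and compares it with `stats::plogis`. The location `m = 0` and scale `s = 1` are assumptions chosen for illustration, and the residual values are made up. Nothing here calls getPi itself.

```r
# F(x) = 1 / (1 + exp(-(x - m) / s)), written out explicitly.
# m = 0 and s = 1 are assumed defaults for this sketch.
logistic_cdf <- function(x, m = 0, s = 1) {
  1 / (1 + exp(-(x - m) / s))
}

# Made-up zero-model residuals, one per sample.
residuals <- c(-2, 0, 2)

pi_hat <- logistic_cdf(residuals)
pi_hat

# Base R's logistic CDF gives the same values.
all.equal(pi_hat, plogis(residuals, location = 0, scale = 1))
```

A residual of 0 maps to a mixture proportion of 0.5, and the proportions for residuals of equal magnitude and opposite sign add up to 1.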

getZ    Calculate the current Z estimate responsibilities (posterior probabilities)

Description

    Calculate the current Z estimate responsibilities (posterior probabilities).

Usage

    getZ(z, zUsed, stillActive, nll, nllUSED)

Arguments

    z             Matrix (m x n) of estimate responsibilities (probabilities that a count comes from a spike distribution at 0).
    zUsed         Matrix (m x n) of estimate responsibilities (probabilities that a count comes from a spike distribution at 0) that are actually used (following convergence).
    stillActive   A vector of size M booleans saying if a feature is still active or not.
    nll           Vector of size M with the current negative log-likelihoods.
    nllUSED       Vector of size M with the converged negative log-likelihoods.

Value

    A list of updated zUsed and nllUSED.

See Also

    fitZig

isItStillActive    Function to determine if a feature is still active.

Description

    In the Expectation Maximization routine, features' posterior probabilities routinely converge based on a tolerance threshold. This function checks whether or not the feature's negative log-likelihood (measure of the fit) has changed or not.

Usage

    isItStillActive(eps, tol, stillActive, stillActiveNLL, nll)

Arguments

    eps              Vector of size M (features) representing the relative difference between the new nll and old nll.
    tol              The threshold tolerance for the difference.
    stillActive      A vector of size M booleans saying if a feature is still active or not.
    stillActiveNLL   A vector of size M recording the negative log-likelihoods of the various features, updated for those still active.
    nll              Vector of size M with the current negative log-likelihoods.

Value

    None.

See Also

    fitZig

libSize    Access sample depth of coverage from MRexperiment object

Description

    The libSize vector represents the column (sample specific) sums of features, i.e. the total number of reads for a sample or depth of coverage. It is used by fitZig.

Usage

    libSize(obj)

Arguments

    obj         a MRexperiment object.

Author(s)

    Joseph N. Paulson, jpaulson@umiacs.umd.edu

Examples

    data(lungData)
    head(libSize(lungData))
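
For a plain count matrix, the quantity that libSize describes is the column sum of the counts. A minimal base R illustration follows. The toy matrix is made up, and the snippet does not use an MRexperiment object.

```r
# Toy count matrix: features in rows, samples in columns. Values are made up.
counts <- matrix(c(10, 0, 5,
                    3, 2, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("otu1", "otu2"), c("s1", "s2", "s3")))

# Depth of coverage per sample: total reads in each column.
lib_size <- colSums(counts)
lib_size
```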

load_biom    Load objects organized in the Biome format.

Description

    Wrapper to load Biome formatted object.

Usage

    load_biom(file)

Arguments

    file        The biome object filepath.

Value

    A MRexperiment object.

See Also

    load_meta, load_phenoData, newMRexperiment, biom2MRexperiment

Examples

    #library(biom)
    #rich_dense_file = system.file("extdata", "rich_dense_otu_table.biom", package = "biom")
    #x = load_biom(rich_dense_file)
    #x

load_meta    Load a count dataset associated with a study.

Description

    Load a matrix of OTUs in a tab-delimited format.

Usage

    load_meta(file, sep = "\t")

Arguments

    file        Path and filename of the actual data file.
    sep         File delimiter.

Value

    A list with objects 'counts' and 'taxa'.

See Also

    load_phenoData

Examples

    dataDirectory

load_phenoData    Load a clinical/phenotypic dataset associated with a study.

Description

    Load a matrix of metadata associated with a study.

Usage

    load_phenoData(file, tran = FALSE, sep = "\t")

Arguments

    file        Path and filename of the actual clinical file.
    tran        Boolean. If the covariates are along the columns and samples along the rows, then tran should equal TRUE.
    sep         The separator for the file.

Value

    The metadata as a dataframe.

See Also

    load_meta

Examples

    # see vignette

lungData    OTU abundance matrix of samples from a smoker/non-smoker study

Description

    This is a list with a matrix of OTU counts, otu names, taxa annotations for each OTU, and phenotypic data. Samples along the columns and OTUs along the rows.

Usage

    lungData

Format

    A list of OTU matrix, taxa, otus, and phenotypes

References

    http://www.ncbi.nlm.nih.gov/pubmed/21680950

makeLabels    Function to make labels simpler

Description

    Beginning to transition to better axes for plots.

Usage

    makeLabels(x = "samples", y = "abundance", norm, log)

Arguments

    x           string for the x-axis
    y           string for the y-axis
    norm        is the data normalized?
    log         is the data logged?

Value

    vector of x,y labels

mouseData    OTU abundance matrix of mice samples from a diet longitudinal study

Description

    This is a list with a matrix of OTU counts, taxa annotations for each OTU, otu names, and vector of phenotypic data. Samples along the columns and OTUs along the rows.

Usage

    mouseData

Format

    A list of OTU matrix, taxa, otus, and phenotypes

References

    http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2894525/

MRcoefs    Table of top-ranked microbial marker gene from linear model fit

Description

    Extract a table of the top-ranked features from a linear model fit. This function will be updated soon to provide better flexibility similar to limma's topTable.

Usage

    MRcoefs(obj, by = 2, coef = NULL, number = 10, taxa = obj$taxa,
      uniqueNames = FALSE, adjust.method = "fdr", group = 0, eff = 0,
      numberEff = FALSE, file = NULL)

Arguments

    obj             A list containing the linear model fit produced by lmFit through fitZig.
    by              Column number or column name specifying which coefficient or contrast of the linear model is of interest.
    coef            Column number(s) or column name(s) specifying which coefficient or contrast of the linear model to display.
    number          The number of bacterial features to pick out.
    taxa            Taxa list.
    uniqueNames     Number the various taxa.
    adjust.method   Method to adjust p-values by. Default is "fdr". Options include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none". See p.adjust for more details.
    group           One of five choices: 0, 1, 2, 3, 4. 0: the sort is ordered by a decreasing absolute value coefficient fit. 1: the sort is ordered by the raw coefficient fit in decreasing order. 2: the sort is ordered by the raw coefficient fit in increasing order. 3: the sort is ordered by the p-value of the coefficient fit in increasing order. 4: no sorting.
    eff             Restrict samples to have at least a "eff" quantile or number of effective samples.
    numberEff       Boolean, whether eff should represent quantile (default/FALSE) or number.
    file            Name of output file, including location, to save the table.

Value

    Table of the top-ranked features determined by the linear fit's coefficient.

See Also

    fitZig, MRtable

Examples

    data(lungData)
    k = grep("Extraction.Control",pData(lungData)$SampleType)
    lungTrim = lungData[,-k]
    k = which(rowSums(MRcounts(lungTrim)>0)

MRexperiment    Class "MRexperiment" - a modified eSet object for the data from high-throughput sequencing experiments

Description

    This is the main class for metagenomeSeq.

Objects from the Class

    Objects should be created with calls to newMRexperiment.

Extends

    Class eSet (package 'Biobase'), directly. Class VersionedBiobase (package 'Biobase'), by class "eSet", distance 2. Class Versioned (package 'Biobase'), by class "eSet", distance 3.

Methods

    Class-specific methods.

    [           Subset operation, taking two arguments and indexing the sample and variable. Returns an MRexperiment object, including relevant metadata. Setting drop=TRUE generates an error. When subsetting the data, the experiment summary slot is repopulated and pData is repopulated after calling factor (removing levels not present).

Note

    Note: This is a summary for reference. For an explanation of the actual usage, see the vignette.

    MRexperiments are the main class in use by metagenomeSeq. The class extends eSet and provides additional slots which are populated during the analysis pipeline.

    MRexperiment datasets are created with calls to newMRexperiment. MRexperiment datasets contain raw count matrices (integers) accessible through MRcounts. Similarly, normalized count matrices can be accessed (following normalization) through MRcounts by calling norm=TRUE. Following an analysis, a matrix of posterior probabilities for counts is accessible through posterior.probs.

    The normalization factors used in analysis can be recovered by normFactors, as can the library sizes of samples (depths of coverage), libSize.

    Similar to other RNA-seq Bioconductor packages, the rows of the matrix correspond to a feature (be it OTU, species, gene, etc.) and each column to an experimental sample. Pertinent clinical information and potential confounding factors are stored in the phenoData slot (accessed via pData).

    To populate the various slots in an MRexperiment several functions are run.
    1) cumNormStat calculates the proper percentile to calculate normalization factors. The cumNormStat slot is populated.
    2) cumNorm calculates the actual normalization factors using p = cumNormStat.

    Other functions will place subsequent matrices (normalized counts (cumNormMat), posterior probabilities (posterior.probs))

    As mentioned above, MRexperiment is derived from the virtual class eSet and thereby has a phenoData slot which allows for sample annotation. Factors are stored in the phenoData data frame. The normalization factors and library size information are stored in a slot called expSummary that is an annotated data frame and is repopulated for subsetted data.

Examples

    # See vignette

MRexperiment2biom    MRexperiment to biom objects

Description

    Wrapper to convert MRexperiment objects to biom objects.

Usage

    MRexperiment2biom(obj, id = NULL, norm = FALSE, log = FALSE, sl = 1000)

Arguments

    obj         The MRexperiment object.
    id          Optional id for the biom matrix.
    norm        Normalized data?
    log         Logged data?
    sl          Scaling factor for normalized counts.

Value

    A biom object.

See Also

    load_meta, load_phenoData, newMRexperiment, load_biom, biom2MRexperiment

MRfulltable    Table of top microbial marker gene from linear model fit including sequence information

Description

    Extract a table of the top-ranked features from a linear model fit. This function will be updated soon to provide better flexibility similar to limma's topTable. This function differs from MRcoefs in that it provides other information about the presence or absence of features to help ensure significant features called are moderately present.

Usage

    MRfulltable(obj, by = 2, coef = NULL, number = 10, taxa = obj$taxa,
      uniqueNames = FALSE, adjust.method = "fdr", group = 0, eff = 0,
      numberEff = FALSE, file = NULL)

Arguments

    obj             A list containing the linear model fit produced by lmFit through fitZig.
    by              Column number or column name specifying which coefficient or contrast of the linear model is of interest.
    coef            Column number(s) or column name(s) specifying which coefficient or contrast of the linear model to display.
    number          The number of bacterial features to pick out.
    taxa            Taxa list.
    uniqueNames     Number the various taxa.
    adjust.method   Method to adjust p-values by. Default is "fdr". Options include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none". See p.adjust for more details.
    group           One of five choices: 0, 1, 2, 3, 4. 0: the sort is ordered by a decreasing absolute value coefficient fit. 1: the sort is ordered by the raw coefficient fit in decreasing order. 2: the sort is ordered by the raw coefficient fit in increasing order. 3: the sort is ordered by the p-value of the coefficient fit in increasing order. 4: no sorting.
    eff             Restrict samples to have at least a "eff" quantile or number of effective samples.
    numberEff       Boolean, whether eff should represent quantile (default/FALSE) or number.
    file            Name of output file, including location, to save the table.

Value

    Table of the top-ranked features determined by the linear fit's coefficient.

See Also

    fitZig, MRcoefs, MRtable, fitPA

Examples

    data(lungData)
    k = grep("Extraction.Control",pData(lungData)$SampleType)
    lungTrim = lungData[,-k]
    k = which(rowSums(MRcounts(lungTrim)>0)

    group           One of five choices: 0, 1, 2, 3, 4. 0: the sort is ordered by a decreasing absolute value coefficient fit. 1: the sort is ordered by the raw coefficient fit in decreasing order. 2: the sort is ordered by the raw coefficient fit in increasing order. 3: the sort is ordered by the p-value of the coefficient fit in increasing order. 4: no sorting.
    eff             Restrict samples to have at least a "eff" quantile or number of effective samples.
    numberEff       Boolean, whether eff should represent quantile (default/FALSE) or number.
    file            Name of file, including location, to save the table.

Value

    Table of the top-ranked features determined by the linear fit's coefficient.

See Also

    fitZig, MRcoefs

Examples

    data(lungData)
    k = grep("Extraction.Control",pData(lungData)$SampleType)
    lungTrim = lungData[,-k]
    k = which(rowSums(MRcounts(lungTrim)>0)

Arguments

    counts        A matrix or data frame of count data. The count data is representative of the number of reads annotated for a feature (be it gene, OTU, species, etc). Rows should correspond to features and columns to samples.
    phenoData     An AnnotatedDataFrame with pertinent sample information.
    featureData   An AnnotatedDataFrame with pertinent feature information.
    libSize       libSize, library size, is the total number of reads for a particular sample.
    normFactors   normFactors, the normalization factors used in either the model or as scaling factors of sample counts for each particular sample.

Details

    See MRexperiment-class and eSet (from the Biobase package) for the meaning of the various slots.

Value

    an object of class MRexperiment

Author(s)

    Joseph N Paulson, jpaulson@umiacs.umd.edu

Examples

    cnts = matrix(abs(rnorm(1000)),nc=10)
    obj

Examples

    data(lungData)
    head(normFactors(lungData))

plotCorr    Basic correlation plot function for normalized or unnormalized counts.

Description

    This function plots a heatmap of the "n" features with greatest variance across rows.

Usage

    plotCorr(obj, n, log = TRUE, norm = TRUE, fun = cor, ...)

Arguments

    obj         A MRexperiment object with count data.
    n           The number of features to plot. This chooses the "n" features with greatest variance.
    log         Whether or not to log2 transform the counts - if MRexperiment object.
    norm        Whether or not to normalize the counts - if MRexperiment object.
    fun         Function to calculate pair-wise relationships. Default is Pearson correlation.
    ...         Additional plot arguments.

Value

    NA

See Also

    cumNormMat

Examples

    data(mouseData)
    plotCorr(obj=mouseData,n=200,cexRow = 0.4,cexCol = 0.4,trace="none",dendrogram="none",
      col = colorRampPalette(brewer.pal(9, "RdBu"))(50))
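
The feature selection that plotCorr describes (keep the "n" rows with greatest variance, then compute pair-wise relationships) can be sketched in base R without plotting. The toy matrix and variable names are made up for illustration. Normalization and log transformation, which plotCorr applies by default, are skipped here, so this is not a reproduction of the function's internals.

```r
# Toy count matrix: 4 features (rows) x 4 samples (columns). Values are made up.
mat <- matrix(c(1,  1, 1,   1,
                0, 10, 0,  10,
                1,  2, 3,   4,
                0,  0, 0, 100),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("otu", 1:4), paste0("s", 1:4)))

n <- 2

# Rank features by their variance across samples and keep the top n.
row_var <- apply(mat, 1, var)
top <- order(row_var, decreasing = TRUE)[seq_len(n)]

# Pair-wise Pearson correlation between the selected features.
# cor() correlates columns, so the feature rows are transposed first.
cormat <- cor(t(mat[top, , drop = FALSE]))
cormat
```

The resulting `cormat` is the kind of matrix a heatmap function would then draw. Constant features such as otu1 have zero variance and are never selected unless `n` is as large as the number of rows.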

plotFeature    Basic plot function of the raw or normalized data.

Description

    This function plots the abundance of a particular OTU by class. The function is the typical Manhattan plot of the abundances.

Usage

    plotFeature(obj, otuIndex, classIndex, col = "black", sort = TRUE,
      sortby = NULL, norm = TRUE, log = TRUE, sl = 1000, ...)

Arguments

    obj          A MRexperiment object with count data.
    otuIndex     The row to plot.
    classIndex   A list of the samples in their respective groups.
    col          A vector to color samples by.
    sort         Boolean, sort or not.
    sortby       Default is sort by library size, alternative vector for sorting.
    norm         Whether or not to normalize the counts - if MRexperiment object.
    log          Whether or not to log2 transform the counts - if MRexperiment object.
    sl           Scaling factor - if MRexperiment and norm=TRUE.
    ...          Additional plot arguments.

Value

    NA

See Also

    cumNorm

Examples

    data(mouseData)
    classIndex=list(Western=which(pData(mouseData)$diet=="Western"))
    classIndex$BK=which(pData(mouseData)$diet=="BK")
    otuIndex = 8770
    par(mfrow=c(2,1))
    dates = pData(mouseData)$date
    plotFeature(mouseData,norm=FALSE,log=FALSE,otuIndex,classIndex,
      col=dates,sortby=dates,ylab="Raw reads")

plotGenus    Basic plot function of the raw or normalized data.

Description

    This function plots the abundance of a particular OTU by class. The function uses the estimated posterior probabilities to make technical zeros transparent.

Usage

    plotGenus(obj, otuIndex, classIndex, log = TRUE, norm = TRUE,
              no = 1:length(otuIndex), labs = TRUE, xlab = NULL, ylab = NULL,
              jitter = TRUE, jitter.factor = 1, pch = 21, ret = FALSE, ...)

Arguments

    obj            An MRexperiment object with count data.
    otuIndex       A list of the otus with the same annotation.
    classIndex     A list of the samples in their respective groups.
    log            Whether or not to log2 transform the counts - if MRexperiment object.
    norm           Whether or not to normalize the counts - if MRexperiment object.
    no             Which of the otuIndex to plot.
    jitter.factor  Factor value for jitter.
    pch            Standard pch value for the plot command.
    labs           Whether to include group labels or not. (TRUE/FALSE)
    xlab           xlabel for the plot.
    ylab           ylabel for the plot.
    jitter         Boolean to jitter the count data or not.
    ret            Boolean to return the observed data that would have been plotted.
    ...            Additional plot arguments.

Value

    NA

See Also

    cumNorm

Examples

    data(mouseData)
    classIndex = list(controls = which(pData(mouseData)$diet == "BK"))
    classIndex$cases = which(pData(mouseData)$diet == "Western")
    otuIndex = grep("Strep", fData(mouseData)$fdata)
    otuIndex = otuIndex[order(rowSums(MRcounts(mouseData)[otuIndex, ]), decreasing = TRUE)]
    plotGenus(mouseData, otuIndex, classIndex, no = 1:2, xaxt = "n",
              norm = FALSE, ylab = "Strep normalized log(cpt)")


plotMRheatmap    Basic heatmap plot function for normalized counts.

Description

    This function plots a heatmap of the "n" features with greatest variance across rows.

Usage

    plotMRheatmap(obj, n, log = TRUE, norm = TRUE, ...)

Arguments

    obj     A MRexperiment object with count data.
    n       The number of features to plot. This chooses the "n" features with greatest variance.
    log     Whether or not to log2 transform the counts - if MRexperiment object.
    norm    Whether or not to normalize the counts - if MRexperiment object.
    ...     Additional plot arguments.

Value

    NA

See Also

    cumNormMat

Examples

    data(mouseData)
    trials = pData(mouseData)$diet
    heatmapColColors = brewer.pal(12, "Set3")[as.integer(factor(trials))]
    heatmapCols = colorRampPalette(brewer.pal(9, "RdBu"))(50)
    plotMRheatmap(obj = mouseData, n = 200, cexRow = 0.4, cexCol = 0.4,
                  trace = "none", col = heatmapCols, ColSideColors = heatmapColColors)

plotOrd    Plot of either PCA or MDS coordinates for the distances of normalized or unnormalized counts.

Description

    This function plots the PCA / MDS coordinates for the "n" features of interest, potentially uncovering batch effects or feature relationships.

Usage

    plotOrd(obj, tran = TRUE, comp = 1:2, log = TRUE, norm = TRUE,
            usePCA = TRUE, useDist = FALSE, distfun = stats::dist,
            dist.method = "euclidian", ret = FALSE, n = NULL, ...)

Arguments

    obj          A MRexperiment object or count matrix.
    tran         Transpose the matrix.
    comp         Which components to display.
    usePCA       TRUE/FALSE whether to use PCA or MDS coordinates (TRUE is PCA).
    useDist      TRUE/FALSE whether to calculate distances.
    distfun      Distance function, default is stats::dist.
    dist.method  If useDist==TRUE, what method to calculate distances.
    log          Whether or not to log2 the counts - if MRexperiment object.
    norm         Whether or not to normalize the counts - if MRexperiment object.
    ret          Whether or not to output the coordinates.
    n            Number of features to make use of in calculating your distances.
    ...          Additional plot arguments.

Value

    NA

See Also

    cumNormMat

Examples

    data(mouseData)
    cl = pData(mouseData)[, 3]
    plotOrd(mouseData, tran = TRUE, useDist = TRUE, pch = 21,
            bg = factor(cl), usePCA = FALSE)
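The two coordinate systems that plotOrd chooses between (PCA when usePCA is TRUE, MDS otherwise) can be illustrated with base R alone. The sketch below is not the package's code: the simulated matrix `counts`, the pseudo-count of 1, and the use of `prcomp` and `cmdscale` are assumptions chosen to show the idea.

```r
# Simulated count matrix: 40 features (rows) x 8 samples (columns).
set.seed(2)
counts <- matrix(rpois(40 * 8, lambda = 10), nrow = 40,
                 dimnames = list(paste0("OTU", 1:40), paste0("S", 1:8)))
logCounts <- log2(counts + 1)

# Put samples in rows (assumed here to be the role of tran = TRUE).
x <- t(logCounts)

# PCA coordinates: first two principal components.
pcaCoords <- prcomp(x)$x[, 1:2]

# MDS coordinates: classical scaling of Euclidean distances.
d <- dist(x, method = "euclidean")
mdsCoords <- cmdscale(d, k = 2)

# Hypothetical two-group labelling of the eight samples.
cl <- factor(rep(c("A", "B"), each = 4))
par(mfrow = c(1, 2))
plot(pcaCoords, pch = 21, bg = as.integer(cl), main = "PCA")
plot(mdsCoords, pch = 21, bg = as.integer(cl), main = "MDS")
```

With Euclidean distances, classical MDS and PCA give the same configuration up to sign, so the two panels differ only when another distance method or distance function is used.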

plotOTU    Basic plot function of the raw or normalized data.

Description

    This function plots the abundance of a particular OTU by class. The function uses the estimated posterior probabilities to make technical zeros transparent.

Usage

    plotOTU(obj, otu, classIndex, log = TRUE, norm = TRUE, jitter.factor = 1,
            pch = 21, labs = TRUE, xlab = NULL, ylab = NULL, jitter = TRUE,
            ret = FALSE, ...)

Arguments

    obj            A MRexperiment object with count data.
    otu            The row number/OTU to plot.
    classIndex     A list of the samples in their respective groups.
    log            Whether or not to log2 transform the counts - if MRexperiment object.
    norm           Whether or not to normalize the counts - if MRexperiment object.
    jitter.factor  Factor value for jitter.
    pch            Standard pch value for the plot command.
    labs           Whether to include group labels or not. (TRUE/FALSE)
    xlab           xlabel for the plot.
    ylab           ylabel for the plot.
    jitter         Boolean to jitter the count data or not.
    ret            Boolean to return the observed data that would have been plotted.
    ...            Additional plot arguments.

Value

    NA

See Also

    cumNorm

Examples

    data(mouseData)
    classIndex = list(controls = which(pData(mouseData)$diet == "BK"))
    classIndex$cases = which(pData(mouseData)$diet == "Western")
    # you can specify whether or not to normalize, and to what level
    plotOTU(mouseData, otu = 9083, classIndex, norm = FALSE,
            main = "9083 feature abundances")

plotRare    Plot of rarefaction effect

Description

    This function plots the number of observed features vs. the depth of coverage.

Usage

    plotRare(obj, cl = NULL, ret = FALSE, ...)

Arguments

    obj     A MRexperiment object with count data or matrix.
    cl      Vector of classes for various samples.
    ret     True/False, return the number of features and the depth of coverage as a vector.
    ...     Additional plot arguments.

Value

    NA

See Also

    plotOrd, plotMRheatmap, plotCorr, plotOTU, plotGenus

Examples

    data(mouseData)
    cl = factor(pData(mouseData)[, 3])
    res = plotRare(mouseData, cl = cl, ret = TRUE, pch = 21, bg = cl)
    tmp = lapply(levels(cl), function(lv)
        lm(res[, "ident"] ~ res[, "libSize"] - 1, subset = cl == lv))
    for (i in 1:length(levels(cl))) {
        abline(tmp[[i]], col = i)
    }
    legend("topleft", c("Diet 1", "Diet 2"), text.col = c(1, 2), box.col = NA)
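The two quantities plotted by plotRare, the number of observed features and the depth of coverage per sample, can be computed directly from a count matrix. The sketch below uses simulated data and base R only. The column names `libSize` and `ident` follow the plotRare example, while the matrix `counts` and the group labels are made up for illustration.

```r
# Simulated sparse count matrix: 100 features (rows) x 10 samples (columns).
set.seed(3)
counts <- matrix(rpois(100 * 10, lambda = 0.5), nrow = 100,
                 dimnames = list(paste0("OTU", 1:100), paste0("S", 1:10)))

libSize <- colSums(counts)      # depth of coverage per sample
ident <- colSums(counts > 0)    # number of features observed per sample
res <- cbind(libSize = libSize, ident = ident)

# Hypothetical group labels, five samples per group.
cl <- factor(rep(c("Diet 1", "Diet 2"), each = 5))
plot(res[, "libSize"], res[, "ident"], pch = 21, bg = as.integer(cl),
     xlab = "Depth of coverage", ylab = "Number of detected features")
```

A clear upward trend in such a plot suggests that the number of detected features depends on sequencing depth.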

posterior.probs    Access the posterior probabilities that result from analysis

Description

    Accessing the posterior probabilities following a run through fitZig.

Usage

    posterior.probs(obj)

Arguments

    obj     A MRexperiment object.

Author(s)

    Joseph N. Paulson, jpaulson@umiacs.umd.edu

Examples

    # see vignette


uniqueFeatures    Table of features unique to a group

Description

    Creates a table of features, their index, number of positive samples in a group, and the number of reads in a group. Can threshold features by a minimum no. of reads or no. of samples.

Usage

    uniqueFeatures(obj, cl, nsamples = 0, nreads = 0)

Arguments

    obj       Either a MRexperiment object or matrix.
    cl        A vector assigning samples to a group.
    nsamples  The minimum number of positive samples.
    nreads    The minimum number of raw reads.

Value

    Table of features unique to a group

Examples

    data(mouseData)
    head(uniqueFeatures(mouseData[1:100, ], cl = pData(mouseData)[, 3]))


zigControl    Settings for the fitZig function

Description

    Settings for the fitZig function

Usage

    zigControl(tol = 1e-04, maxit = 10, verbose = TRUE)

Arguments

    tol      The tolerance for the difference in negative log likelihood estimates for a feature to remain active.
    maxit    The maximum number of iterations for the expectation-maximization algorithm.
    verbose  Whether to display iterative step summary statistics or not.

Value

    The value for the tolerance, maximum no. of iterations, and the verbose warning.

Note

    fitZig makes use of zigControl.

See Also

    fitZig cumNorm plotOTU

Examples

    control = zigControl(tol = 1e-10, maxit = 10, verbose = FALSE)
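How the tol and maxit settings interact can be pictured with a toy iteration in base R. This is only a sketch of the stopping rule described in the arguments above, not the fitZig algorithm: the list `control` is a hypothetical stand-in for the value of zigControl, and the geometrically shrinking negative log-likelihoods are made up.

```r
# Hypothetical stand-in for zigControl(): a plain list of settings.
control <- list(tol = 1e-4, maxit = 10, verbose = FALSE)

# Toy per-feature negative log-likelihoods that shrink geometrically
# at different rates (purely illustrative, not the ZIG model).
rates <- c(0.5, 0.1, 0.9)
nll <- c(10, 10, 10)
active <- rep(TRUE, length(nll))

iter <- 0
while (any(active) && iter < control$maxit) {
  iter <- iter + 1
  # Only active features are updated.
  newNll <- ifelse(active, nll * rates, nll)
  # A feature stays active only while its estimate moves by more than tol.
  active <- active & (abs(nll - newNll) > control$tol)
  nll <- newNll
  if (control$verbose) cat("iteration", iter, "active:", sum(active), "\n")
}
```

In this toy run the second feature drops out once its change falls below tol, while the other two are still active when maxit is reached. A smaller tol or a larger maxit trades run time for tighter convergence.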

Index

Topic package
    metagenomeSeq-package

[,MRexperiment-method (MRexperiment)
aggregateByTaxonomy
aggTax (aggregateByTaxonomy)
biom2MRexperiment
calculateEffectiveSamples
colMeans,MRexperiment-method (MRexperiment)
colSums,MRexperiment-method (MRexperiment)
correctIndices
correlationTest
corTest (correlationTest)
cumNorm
cumNormMat
cumNormStat
cumNormStatFast
doCountMStep
doEStep
doZeroMStep
exportMat
exportMatrix (exportMat)
exportStats
expSummary
expSummary,MRexperiment-method (expSummary)
filterData
fitDO
fitMeta
fitPA
fitZig
genusPlot (plotGenus)
getCountDensity
getEpsilon
getNegativeLogLikelihoods
getPi
getZ
isItStillActive
libSize
libSize,MRexperiment-method (libSize)
load_biom
load_meta
load_metaQ
load_phenoData
lungData
makeLabels
metagenomeSeq (metagenomeSeq-package)
metagenomeSeq-package
metagenomicLoader (load_meta)
mouseData
MRcoefs
MRcounts
MRcounts,MRexperiment-method (MRcounts)
MRexperiment
MRexperiment-class (MRexperiment)
MRexperiment2biom
MRfulltable
MRtable
newMRexperiment
normFactors
normFactors,MRexperiment-method (normFactors)
p.adjust
phenoData (load_phenoData)
plotCorr
plotFeature
plotGenus
plotMRheatmap
plotOrd
plotOTU
plotRare
posterior.probs
posterior.probs,MRexperiment-method (posterior.probs)
qiimeLoader (load_metaQ)
quantile
rowMeans,MRexperiment-method (MRexperiment)
rowSums,MRexperiment-method (MRexperiment)
settings2 (zigControl)
uniqueFeatures
zigControl
