PathoStat 1.4.1
Welcome to the PathoStat Advanced Usage vignette. PathoStat is a software package implemented in Shiny that lets you explore metagenomic data and perform statistical microbiome analysis with the confort of a graphical user interface (GUI).
While most of the PathoStat functions are interactive, i.e., through the GUI, you still can use some of the functions for very practical things, notwithstanding, reading data from PathoScope2 results, BIOM-formatted files, and phyloseq objects.
First, we choose the path where Pathoscope output files (.tsv) and sample information, i.e., metadata, are. In this case, we will be using the example data within the PathoStat package.
library(PathoStat)
example_data_dir <- system.file("example/data", package = "PathoStat")
Next, simply execute the createPathoStat()
function in order to combine all .tsv files within example_data_dir
. Remember we need a metadata file that is also tab-delimited (in this case sample_data.tsv
)
pstat <- createPathoStat(input_dir=example_data_dir,
sample_data_file="sample_data.tsv")
PathoStat objects (pstat) are an extension of the phyloseq-class and can be created from biom files using the import_biom
function from the phyloseq
package. For example:
library(phyloseq)
rich_dense_biom = system.file("extdata", "rich_dense_otu_table.biom",
package="phyloseq")
phyob <- import_biom(rich_dense_biom)
#and finally, we convert the phyloseq object into a pstat object
pstat_biom <- pathostat(phyob)
pstat objects can be saved or loaded from disk, e.g., to be shared with collaborators. For this, we have created the savePstat()
and loadPstat()
functions.
# Saving data
savePstat(pstat, outdir=".", outfileName="pstat_data.rda")
# Loading data
pstat <- loadPstat(indir=".", infileName="pstat_data.rda")
# Calling the runPathoStat() function to execute Pathostat interactively
runPathoStat(pstat)
If you are creating a Shiny app, you could easily implement the coreOTU tab using the example code below
#create a UI calling coreOTUModuleUI() function
shinyUI(mainPanel(
coreOTUModuleUI("coreOTUModule")
))
#and a server
shinyServer(function(input, output) {
callModule( coreOTUModule, "coreOTUModule", pstat )
})
Once your data are in, you can easily estimate taxon abundances by using the findRAfromCount()
function
#first, get the otu_table from pstat calling a phyloseq function
library(phyloseq)
otut<-otu_table(pstat)
ffc<-findRAfromCount(otut)
#lets see, for example, the abundances for sample 01 on the ffc object
head(ffc[,1], n = 15)
## OTU Table: [15 taxa and 1 samples]
## taxa are rows
## Sample_01_1
## ti|566546|org|Escherichia_coli_W 0.000000000
## ti|391904|org|Bifidobacterium_longum_subsp._infantis_ATCC_15697_=_JCM_1222 0.000000000
## ti|245018|org|butyrate-producing_bacterium_SSC/2 0.024756234
## ti|717962|org|Coprococcus_catus_GD/7 0.002544224
## ti|657321|org|Ruminococcus_bromii_L2-63 0.000000000
## others 0.002211211
## ti|347253|org|Streptococcus_salivarius_JIM8777 0.000000000
## ti|435590|org|Bacteroides_vulgatus_ATCC_8482 0.815630328
## ti|657313|org|Ruminococcus_torques_L2-14 0.000000000
## ti|1069533|org|Streptococcus_infantarius_subsp._infantarius_CJ18 0.000000000
## ti|595495|org|Escherichia_coli_KO11FL 0.000000000
## ti|657322|org|Faecalibacterium_prausnitzii_SL3/3 0.034786605
## ti|299768|org|Streptococcus_thermophilus_CNRZ1066 0.000000000
## ti|515620|org|Eubacterium_eligens_ATCC_27750 0.000000000
## ti|657318|org|Eubacterium_rectale_DSM_17629 0.048000586
We can fetch the lineages (and their Ids) by invoking the functions findTaxonomy()
and findTaxonMat()
dat <- ffc
ids <- rownames(dat)
tids <- unlist(lapply(ids, FUN = grepTid))
taxonLevels <- findTaxonomy(tids[1:4])
taxmat <- findTaxonMat(ids[1:4], taxonLevels)
taxmat
## superkingdom
## ti|566546|org|Escherichia_coli_W "Bacteria"
## ti|391904|org|Bifidobacterium_longum_subsp._infantis_ATCC_15697_=_JCM_1222 "Bacteria"
## ti|245018|org|butyrate-producing_bacterium_SSC/2 "Bacteria"
## ti|717962|org|Coprococcus_catus_GD/7 "Bacteria"
## kingdom
## ti|566546|org|Escherichia_coli_W "others"
## ti|391904|org|Bifidobacterium_longum_subsp._infantis_ATCC_15697_=_JCM_1222 "others"
## ti|245018|org|butyrate-producing_bacterium_SSC/2 "others"
## ti|717962|org|Coprococcus_catus_GD/7 "others"
## phylum
## ti|566546|org|Escherichia_coli_W "Proteobacteria"
## ti|391904|org|Bifidobacterium_longum_subsp._infantis_ATCC_15697_=_JCM_1222 "Actinobacteria"
## ti|245018|org|butyrate-producing_bacterium_SSC/2 "Firmicutes"
## ti|717962|org|Coprococcus_catus_GD/7 "Firmicutes"
## class
## ti|566546|org|Escherichia_coli_W "Gammaproteobacteria"
## ti|391904|org|Bifidobacterium_longum_subsp._infantis_ATCC_15697_=_JCM_1222 "Actinobacteria"
## ti|245018|org|butyrate-producing_bacterium_SSC/2 "Clostridia"
## ti|717962|org|Coprococcus_catus_GD/7 "Clostridia"
## order
## ti|566546|org|Escherichia_coli_W "Enterobacterales"
## ti|391904|org|Bifidobacterium_longum_subsp._infantis_ATCC_15697_=_JCM_1222 "Bifidobacteriales"
## ti|245018|org|butyrate-producing_bacterium_SSC/2 "Clostridiales"
## ti|717962|org|Coprococcus_catus_GD/7 "Clostridiales"
## family
## ti|566546|org|Escherichia_coli_W "Enterobacteriaceae"
## ti|391904|org|Bifidobacterium_longum_subsp._infantis_ATCC_15697_=_JCM_1222 "Bifidobacteriaceae"
## ti|245018|org|butyrate-producing_bacterium_SSC/2 "Lachnospiraceae"
## ti|717962|org|Coprococcus_catus_GD/7 "Lachnospiraceae"
## genus
## ti|566546|org|Escherichia_coli_W "Escherichia"
## ti|391904|org|Bifidobacterium_longum_subsp._infantis_ATCC_15697_=_JCM_1222 "Bifidobacterium"
## ti|245018|org|butyrate-producing_bacterium_SSC/2 "Anaerostipes"
## ti|717962|org|Coprococcus_catus_GD/7 "Coprococcus"
## species
## ti|566546|org|Escherichia_coli_W "Escherichia coli"
## ti|391904|org|Bifidobacterium_longum_subsp._infantis_ATCC_15697_=_JCM_1222 "Bifidobacterium longum"
## ti|245018|org|butyrate-producing_bacterium_SSC/2 "others"
## ti|717962|org|Coprococcus_catus_GD/7 "Coprococcus catus"
## no rank
## ti|566546|org|Escherichia_coli_W "cellular organisms"
## ti|391904|org|Bifidobacterium_longum_subsp._infantis_ATCC_15697_=_JCM_1222 "Terrabacteria group"
## ti|245018|org|butyrate-producing_bacterium_SSC/2 "Terrabacteria group"
## ti|717962|org|Coprococcus_catus_GD/7 "Terrabacteria group"
Many times researchers are interested in the accuracy of taxon abundance estimates. In this subtab, we provide a way to compare within-sample taxa in terms of their abundance estimate and 95% confidence interval. You can estimate the 95% confidence interval for a two-taxon within sample comparison by using the plotConfRegion
#select taxon 1 and 2, from your samples (randomly in this case)
n<-nrow(as.matrix(rownames(otut)))
m<-nrow(as.matrix(colnames(otut)))
p1 <- otut[rownames(otut)[ sample(1:n, 1)],
colnames(otut)[sample(1:m, 1)]]
if (p1 <= 0) p1 <- 1
#random taxon for p2 in this case again
n<-nrow(as.matrix(rownames(otut)))
m<-nrow(as.matrix(colnames(otut)))
p2 <- otut[rownames(otut)[ sample(1:n, 1)],
colnames(otut)[sample(1:m, 1)]]
if (p2 <= 0) p2 <- 1
size <- sum(otut[,colnames(otut)])
plotConfRegion(p1, p2, size, uselogit=FALSE)
log2cpm can compute log2(counts per million reads) and library size for each sample
# Only to sample one:
lcpm <- log2CPM(otut[,1])
lcpm
## $y
## Sample_01_1
## ti|566546|org|Escherichia_coli_W 1.735572
## ti|391904|org|Bifidobacterium_longum_subsp._infantis_ATCC_15697_=_JCM_1222 1.735572
## ti|245018|org|butyrate-producing_bacterium_SSC/2 14.595689
## ti|717962|org|Coprococcus_catus_GD/7 11.314888
## ti|657321|org|Ruminococcus_bromii_L2-63 1.735572
## others 11.112782
## ti|347253|org|Streptococcus_salivarius_JIM8777 1.735572
## ti|435590|org|Bacteroides_vulgatus_ATCC_8482 19.637552
## ti|657313|org|Ruminococcus_torques_L2-14 1.735572
## ti|1069533|org|Streptococcus_infantarius_subsp._infantarius_CJ18 1.735572
## ti|595495|org|Escherichia_coli_KO11FL 1.735572
## ti|657322|org|Faecalibacterium_prausnitzii_SL3/3 15.086373
## ti|299768|org|Streptococcus_thermophilus_CNRZ1066 1.735572
## ti|515620|org|Eubacterium_eligens_ATCC_27750 1.735572
## ti|657318|org|Eubacterium_rectale_DSM_17629 15.550855
## ti|717608|org|Clostridium_cf._saccharolyticum_K10 1.735572
## ti|349741|org|Akkermansia_muciniphila_ATCC_BAA-835 1.735572
## ti|862965|org|Haemophilus_parainfluenzae_T3T1 1.735572
## ti|536056|org|Escherichia_coli_DH1 1.735572
## ti|679935|org|Alistipes_finegoldii_DSM_17242 14.243119
## ti|479436|org|Veillonella_parvula_DSM_2008 1.735572
## ti|717959|org|Alistipes_shahii_WAL_8301 1.735572
## ti|515619|org|Eubacterium_rectale_ATCC_33656 1.735572
## ti|245012|org|butyrate-producing_bacterium_SM4/1 1.735572
## ti|420247|org|Methanobrevibacter_smithii_ATCC_35061 1.735572
## ti|367928|org|Bifidobacterium_adolescentis_ATCC_15703 1.735572
## ti|272559|org|Bacteroides_fragilis_NCTC_9343 15.132310
## ti|322159|org|Streptococcus_thermophilus_LMD-9 11.201138
## ti|657314|org|Ruminococcus_obeum_A2-162 10.397350
## ti|657323|org|Ruminococcus_sp._SR1/5 13.674784
## ti|435591|org|Parabacteroides_distasonis_ATCC_8503 1.735572
##
## $lib.size
## Sample_01_1
## 150144
Lets create a heatmap from our pstat
object. We siply use the phyloseq
function plot_heatmap
.
#select a tax level for the heatmap plot
taxonLevel<-"class"
physeq <- tax_glom(pstat,taxonLevel)
plot_heatmap(physeq, taxa.label=taxonLevel)