Package version: PathoStat 1.0.0

Contents

1 Introduction

Welcome to the PathoStat Advanced Usage vignette. PathoStat is a software package implemented in Shiny that lets you explore metagenomic data and perform statistical microbiome analysis with the confort of a graphical user interface (GUI).
While most of the PathoStat functions are interactive, i.e., through the GUI, you still can use some of the functions for very practical things, notwithstanding, reading data from PathoScope2 results, BIOM-formatted files, and phyloseq objects.

2 Create Pathostat Object (pstat) from PathoScope2 output

First, we choose the path where Pathoscope output files (.tsv) and sample information, i.e., metadata, are. In this case, we will be using the example data within the PathoStat package.

library(PathoStat)
example_data_dir <- system.file("example/data", package = "PathoStat")

Next, simply execute the createPathoStat() function in order to combine all .tsv files within example_data_dir. Remember we need a metadata file that is also tab-delimited (in this case sample_data.tsv)

pstat <- createPathoStat(input_dir=example_data_dir, 
    sample_data_file="sample_data.tsv")

3 Create Pathostat Object (pstat) from BIOM file

PathoStat objects (pstat) are an extension of the phyloseq-class and can be created from biom files using the import_biom function from the phyloseq package. For example:

library(phyloseq)
rich_dense_biom  = system.file("extdata", "rich_dense_otu_table.biom", 
    package="phyloseq")
phyob <- import_biom(rich_dense_biom)

#and finally, we convert the phyloseq object into a pstat object
pstat_biom <- pathostat(phyob)

4 Saving and loading data, and running PathoStat

pstat objects can be saved or loaded from disk, e.g., to be shared with collaborators. For this, we have created the savePstat() and loadPstat() functions.

# Saving data
savePstat(pstat, outdir=".", outfileName="pstat_data.rda")
# Loading data
pstat <- loadPstat(indir=".", infileName="pstat_data.rda")
# Calling the runPathoStat() function to execute Pathostat interactively
runPathoStat(pstat)

5 Example functions for Shiny developers: coreOTUModule and coreOTUModuleUI

If you are creating a Shiny app, you could easily implement the coreOTU tab using the example code below

#create a UI calling coreOTUModuleUI() function
shinyUI(mainPanel( 
    coreOTUModuleUI("coreOTUModule")
))

#and a server
shinyServer(function(input, output) {
    callModule( coreOTUModule, "coreOTUModule", pstat )
})

6 Estimating taxon abundances: findRAfromCount

Once your data are in, you can easily estimate taxon abundances by using the findRAfromCount() function

#first, get the otu_table from pstat calling a phyloseq function
library(phyloseq)
otut<-otu_table(pstat)
ffc<-findRAfromCount(otut)
#lets see, for example, the abundances for sample 01 on the ffc object
head(ffc[,1], n = 15)
## OTU Table:          [15 taxa and 1 samples]
##                      taxa are rows
##                                                                  Sample_01_1
## ti|566546|org|Escherichia_coli_W                                 0.000000000
## ti|515619|org|Eubacterium_rectale_ATCC_33656                     0.000000000
## ti|657314|org|Ruminococcus_obeum_A2-162                          0.001345375
## ti|657323|org|Ruminococcus_sp._SR1/5                             0.013074116
## ti|515620|org|Eubacterium_eligens_ATCC_27750                     0.000000000
## ti|1069533|org|Streptococcus_infantarius_subsp._infantarius_CJ18 0.000000000
## ti|717962|org|Coprococcus_catus_GD/7                             0.002544224
## ti|245012|org|butyrate-producing_bacterium_SM4/1                 0.000000000
## ti|657318|org|Eubacterium_rectale_DSM_17629                      0.048000586
## ti|536056|org|Escherichia_coli_DH1                               0.000000000
## ti|299768|org|Streptococcus_thermophilus_CNRZ1066                0.000000000
## ti|272559|org|Bacteroides_fragilis_NCTC_9343                     0.035912191
## ti|717959|org|Alistipes_shahii_WAL_8301                          0.000000000
## ti|657313|org|Ruminococcus_torques_L2-14                         0.000000000
## ti|679935|org|Alistipes_finegoldii_DSM_17242                     0.019388054

7 Assigning lineages to taxonomy IDs: findTaxonomy and findTaxonMat

We can fetch the lineages (and their Ids) by invoking the functions findTaxonomy() and findTaxonMat()

dat <- ffc
ids <- rownames(dat)
tids <- unlist(lapply(ids, FUN = grepTid))
taxonLevels <- findTaxonomy(tids[1:4])
taxmat <- findTaxonMat(ids[1:4], taxonLevels)
taxmat
##                                              superkingdom kingdom 
## ti|566546|org|Escherichia_coli_W             "Bacteria"   "others"
## ti|515619|org|Eubacterium_rectale_ATCC_33656 "Bacteria"   "others"
## ti|657314|org|Ruminococcus_obeum_A2-162      "Bacteria"   "others"
## ti|657323|org|Ruminococcus_sp._SR1/5         "Bacteria"   "others"
##                                              phylum          
## ti|566546|org|Escherichia_coli_W             "Proteobacteria"
## ti|515619|org|Eubacterium_rectale_ATCC_33656 "Firmicutes"    
## ti|657314|org|Ruminococcus_obeum_A2-162      "Firmicutes"    
## ti|657323|org|Ruminococcus_sp._SR1/5         "Firmicutes"    
##                                              class                
## ti|566546|org|Escherichia_coli_W             "Gammaproteobacteria"
## ti|515619|org|Eubacterium_rectale_ATCC_33656 "Clostridia"         
## ti|657314|org|Ruminococcus_obeum_A2-162      "Clostridia"         
## ti|657323|org|Ruminococcus_sp._SR1/5         "Clostridia"         
##                                              order             
## ti|566546|org|Escherichia_coli_W             "Enterobacterales"
## ti|515619|org|Eubacterium_rectale_ATCC_33656 "Clostridiales"   
## ti|657314|org|Ruminococcus_obeum_A2-162      "Clostridiales"   
## ti|657323|org|Ruminococcus_sp._SR1/5         "Clostridiales"   
##                                              family              
## ti|566546|org|Escherichia_coli_W             "Enterobacteriaceae"
## ti|515619|org|Eubacterium_rectale_ATCC_33656 "Lachnospiraceae"   
## ti|657314|org|Ruminococcus_obeum_A2-162      "Lachnospiraceae"   
## ti|657323|org|Ruminococcus_sp._SR1/5         "Ruminococcaceae"   
##                                              genus         
## ti|566546|org|Escherichia_coli_W             "Escherichia" 
## ti|515619|org|Eubacterium_rectale_ATCC_33656 "others"      
## ti|657314|org|Ruminococcus_obeum_A2-162      "Blautia"     
## ti|657323|org|Ruminococcus_sp._SR1/5         "Ruminococcus"
##                                              species                
## ti|566546|org|Escherichia_coli_W             "Escherichia coli"     
## ti|515619|org|Eubacterium_rectale_ATCC_33656 "[Eubacterium] rectale"
## ti|657314|org|Ruminococcus_obeum_A2-162      "Blautia obeum"        
## ti|657323|org|Ruminococcus_sp._SR1/5         "others"               
##                                              no rank                       
## ti|566546|org|Escherichia_coli_W             "cellular organisms"          
## ti|515619|org|Eubacterium_rectale_ATCC_33656 "unclassified Lachnospiraceae"
## ti|657314|org|Ruminococcus_obeum_A2-162      "Terrabacteria group"         
## ti|657323|org|Ruminococcus_sp._SR1/5         "Terrabacteria group"

8 Obtaining a 95% confidence interval: plotConfRegion

Many times researchers are interested in the accuracy of taxon abundance estimates. In this subtab, we provide a way to compare within-sample taxa in terms of their abundance estimate and 95% confidence interval. You can estimate the 95% confidence interval for a two-taxon within sample comparison by using the plotConfRegion

#select taxon 1 and 2, from your samples (randomly in this case)
n<-nrow(as.matrix(rownames(otut)))
m<-nrow(as.matrix(colnames(otut)))

p1 <- otut[rownames(otut)[ sample(1:n, 1)], 
    colnames(otut)[sample(1:m, 1)]]
if (p1 <= 0) p1 <- 1

#random taxon for p2 in this case again
n<-nrow(as.matrix(rownames(otut)))
m<-nrow(as.matrix(colnames(otut)))
p2 <- otut[rownames(otut)[ sample(1:n, 1)], 
    colnames(otut)[sample(1:m, 1)]]
if (p2 <= 0) p2 <- 1
size <- sum(otut[,colnames(otut)])
plotConfRegion(p1, p2, size, uselogit=FALSE)

9 Obtaining counts per million: log2CPM

log2cpm can compute log2(counts per million reads) and library size for each sample

# Only to sample one:
lcpm <- log2CPM(otut[,1])
lcpm
## $y
##                                                                            Sample_01_1
## ti|566546|org|Escherichia_coli_W                                              1.735572
## ti|515619|org|Eubacterium_rectale_ATCC_33656                                  1.735572
## ti|657314|org|Ruminococcus_obeum_A2-162                                      10.397350
## ti|657323|org|Ruminococcus_sp._SR1/5                                         13.674784
## ti|515620|org|Eubacterium_eligens_ATCC_27750                                  1.735572
## ti|1069533|org|Streptococcus_infantarius_subsp._infantarius_CJ18              1.735572
## ti|717962|org|Coprococcus_catus_GD/7                                         11.314888
## ti|245012|org|butyrate-producing_bacterium_SM4/1                              1.735572
## ti|657318|org|Eubacterium_rectale_DSM_17629                                  15.550855
## ti|536056|org|Escherichia_coli_DH1                                            1.735572
## ti|299768|org|Streptococcus_thermophilus_CNRZ1066                             1.735572
## ti|272559|org|Bacteroides_fragilis_NCTC_9343                                 15.132310
## ti|717959|org|Alistipes_shahii_WAL_8301                                       1.735572
## ti|657313|org|Ruminococcus_torques_L2-14                                      1.735572
## ti|679935|org|Alistipes_finegoldii_DSM_17242                                 14.243119
## ti|367928|org|Bifidobacterium_adolescentis_ATCC_15703                         1.735572
## ti|435591|org|Parabacteroides_distasonis_ATCC_8503                            1.735572
## ti|245018|org|butyrate-producing_bacterium_SSC/2                             14.595689
## ti|479436|org|Veillonella_parvula_DSM_2008                                    1.735572
## ti|435590|org|Bacteroides_vulgatus_ATCC_8482                                 19.637552
## ti|391904|org|Bifidobacterium_longum_subsp._infantis_ATCC_15697_=_JCM_1222    1.735572
## ti|657321|org|Ruminococcus_bromii_L2-63                                       1.735572
## ti|862965|org|Haemophilus_parainfluenzae_T3T1                                 1.735572
## ti|657322|org|Faecalibacterium_prausnitzii_SL3/3                             15.086373
## ti|420247|org|Methanobrevibacter_smithii_ATCC_35061                           1.735572
## ti|347253|org|Streptococcus_salivarius_JIM8777                                1.735572
## ti|595495|org|Escherichia_coli_KO11FL                                         1.735572
## others                                                                       11.112782
## ti|322159|org|Streptococcus_thermophilus_LMD-9                               11.201138
## ti|349741|org|Akkermansia_muciniphila_ATCC_BAA-835                            1.735572
## ti|717608|org|Clostridium_cf._saccharolyticum_K10                             1.735572
## 
## $lib.size
## Sample_01_1 
##      150144

10 Heatmap using pstat data

Lets create a heatmap from our pstat object. We siply use the phyloseq function plot_heatmap.

#select a tax level for the heatmap plot
taxonLevel<-"class"
physeq <- tax_glom(pstat,taxonLevel)
plot_heatmap(physeq, taxa.label=taxonLevel)