pcaExplorer provides functionality for interactive visualization of RNA-seq datasets based on Principal Components Analysis. Such methods allow for quick information extraction and effective data exploration. A Shiny application encapsulates the whole analysis, and is developed to become a practical companion for any RNA-seq dataset. This app supports reproducible research with state saving and automated report generation.
Package: pcaExplorer
Authors: Federico Marini [aut, cre]
Version: 2.2.1
Compiled date: 2017-08-30
License: MIT + file LICENSE
pcaExplorer is an R package distributed as part of the Bioconductor project. To install the package, start R and enter:
source("http://bioconductor.org/biocLite.R")
biocLite("pcaExplorer")
If you prefer, you can install and use the development version, which is available on GitHub (https://github.com/federicomarini/pcaExplorer). To do so, use:
library("devtools")
install_github("federicomarini/pcaExplorer")
Once pcaExplorer is installed, it can be loaded by the following command.
library("pcaExplorer")
pcaExplorer is a Bioconductor package containing a Shiny application for analyzing expression data in different conditions and experimental factors.
It is a general-purpose interactive companion tool for RNA-seq analysis, which guides the user in exploring the Principal Components of the data under inspection.
pcaExplorer provides tools to detect outlier samples and genes that show particular patterns, and offers a functional interpretation of the principal components for further quality assessment and hypothesis generation on the input data.
Moreover, a novel visualization approach is presented to simultaneously assess the effect of more than one experimental factor on the expression levels.
Thanks to its interactive and reactive design, it is intended to become a practical companion to any RNA-seq dataset analysis, making exploratory data analysis accessible to the bench biologist while also providing additional insight for the experienced data analyst.
Starting from development version 1.1.3, pcaExplorer supports reproducible research with state saving and automated report generation. Each generated plot and table can be exported by simple mouse clicks on the dedicated buttons.
If you use pcaExplorer for your analysis, please cite it as follows:
citation("pcaExplorer")
To cite package 'pcaExplorer' in publications use:
Federico Marini (2017). pcaExplorer: Interactive Visualization
of RNA-seq Data Using a Principal Components Approach. R package
version 2.2.1. https://github.com/federicomarini/pcaExplorer
A BibTeX entry for LaTeX users is
@Manual{,
title = {pcaExplorer: Interactive Visualization of RNA-seq Data Using a Principal Components Approach},
author = {Federico Marini},
year = {2017},
note = {R package version 2.2.1},
url = {https://github.com/federicomarini/pcaExplorer},
}
After loading the package, the pcaExplorer app can be launched in different modes:
pcaExplorer(dds = dds, rlt = rlt), where dds is a DESeqDataSet object and rlt is a DESeqTransform object, which were created during an existing session for the analysis of an RNA-seq dataset with the DESeq2 package.
pcaExplorer(dds = dds), where dds is a DESeqDataSet object. The rlt object is automatically computed upon launch.
pcaExplorer(countmatrix = countmatrix, coldata = coldata), where countmatrix is a count matrix, generated after assigning reads to features such as genes via tools such as HTSeq-count or featureCounts, and coldata is a data frame containing the experimental covariates of the experiments, such as condition, tissue, cell line, run batch and so on.
pcaExplorer(), and then subsequently uploading the count matrix and the covariates data frame through the user interface. These files need to be formatted as tab separated files, which is a common format for storing such count values.
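The launch modes above can be sketched with a small synthetic dataset. This is only an illustration under stated assumptions: the gene and sample names, the covariates, the file names, and the simulated counts are all made up, and the file layout (row names in the first column) is an assumption about what the upload controls accept. The part that needs DESeq2 and pcaExplorer is guarded so that it runs only in an interactive session where both packages are installed.

```r
# Toy inputs; all object names, file names, and values here are illustrative.
set.seed(42)
n_genes <- 200
samples <- paste0("sample", 1:6)

# Count matrix: genes in rows, samples in columns, integer counts.
countmatrix <- matrix(
  rnbinom(n_genes * length(samples), mu = 100, size = 2),
  nrow = n_genes,
  dimnames = list(sprintf("gene%03d", seq_len(n_genes)), samples)
)

# Experimental covariates: one row per sample, row names matching the columns.
coldata <- data.frame(
  condition = factor(rep(c("control", "treated"), each = 3)),
  batch = factor(rep(c("b1", "b2", "b3"), times = 2)),
  row.names = samples
)
stopifnot(identical(colnames(countmatrix), rownames(coldata)))

# Tab-separated copies, e.g. for uploading through the user interface.
# Assumption: row names go in the first column, under an empty header cell.
counts_file <- file.path(tempdir(), "countmatrix.tsv")
coldata_file <- file.path(tempdir(), "coldata.tsv")
write.table(countmatrix, counts_file, sep = "\t", quote = FALSE, col.names = NA)
write.table(coldata, coldata_file, sep = "\t", quote = FALSE, col.names = NA)

# Reading the count file back should recover the same table.
counts_back <- as.matrix(
  read.delim(counts_file, row.names = 1, check.names = FALSE)
)

# Package-dependent part: build dds and rlt, then launch the app.
if (interactive() &&
    requireNamespace("DESeq2", quietly = TRUE) &&
    requireNamespace("pcaExplorer", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = countmatrix,
                                        colData = coldata,
                                        design = ~ condition)
  rlt <- DESeq2::rlog(dds)
  pcaExplorer::pcaExplorer(dds = dds, rlt = rlt)
}
```

Passing countmatrix and coldata directly, or only dds, should work the same way with the corresponding calls listed above; the design formula used here is just one plausible choice for the toy covariates.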
Additional parameters and objects that can be provided to the main pcaExplorer function are:
pca2go, which is an object created by the pca2go function, which scans the genes with high loadings in each principal component and each direction, and looks for functions (such as GO Biological Processes) that are enriched above the background. The offline pca2go function is based on the routines and algorithms of the topGO package, but as an alternative, this object can be computed live during the execution of the app using the goana function, provided by the limma package. Although this likely provides more general (and probably less informative) functions, it is a good compromise for obtaining a further data interpretation.
annotation, a data frame object, with row.names as gene identifiers (e.g. ENSEMBL ids) identical to the row names of the count matrix or dds object, and an extra column gene_name, containing e.g. HGNC-based gene symbols. This makes information extraction easier, as ENSEMBL ids (a usual choice when assigning reads to features) do not provide an immediate readout for which gene they refer to. The annotation can be either passed as a parameter when launching the app, or uploaded as a tab separated text file. The package provides two functions, get_annotation and get_annotation_orgdb, as convenient wrappers to obtain the updated annotation information, respectively from biomaRt or via the org.XX.eg.db packages.
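As a rough illustration of the two objects described above, the following base R sketch builds an annotation data frame in the stated layout and extracts the genes with the most extreme loadings in each direction of a principal component. It is not the pca2go implementation (no enrichment testing is done here), and the identifiers, the placeholder symbols, the random expression values, and the helper top_loadings are all hypothetical.

```r
# Toy data; identifiers, symbols, and values are made up for illustration.
set.seed(1)
ids <- sprintf("ENSG%011d", 1:50)
expr <- matrix(rnorm(50 * 6), nrow = 50,
               dimnames = list(ids, paste0("sample", 1:6)))

# Annotation in the layout described above: gene identifiers as row names,
# plus a gene_name column (placeholder symbols here).
annotation <- data.frame(gene_name = paste0("SYMBOL", 1:50),
                         row.names = ids,
                         stringsAsFactors = FALSE)
stopifnot(identical(rownames(annotation), rownames(expr)))

# PCA with samples as observations; pca$rotation holds the gene loadings.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Hypothetical helper: genes with the most extreme loadings on one
# component, reported separately for the negative and positive direction.
top_loadings <- function(pca, pc = 1, n = 5) {
  load <- sort(pca$rotation[, pc])
  list(negative = names(head(load, n)),
       positive = names(rev(tail(load, n))))
}

top_pc1 <- top_loadings(pca, pc = 1, n = 5)

# Translate identifiers to readable names through the annotation table.
lapply(top_pc1, function(g) annotation[g, "gene_name"])
```

Gene sets selected this way are the kind of input that an enrichment tool would then compare against the background of all expressed genes.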
The pcaExplorer app is structured in different panels, each focused on a different aspect of the data exploration.
Most of the panels work extensively with click-based and brush-based interactions, to gain additional depth in the explorations, for example by zooming, subsetting, and selecting. This is possible thanks to the recent developments in the shiny package/framework.
The available panels are described in the following subsections.
These file input controls are available when no dds or countmatrix + coldata are provided. Additionally, it is possible to upload the annotation data frame.
When the objects are already passed as parameters, a brief overview/summary for them is displayed.
This is where you most likely are reading this text (otherwise in the package vignette).
Interactive tables for the raw, normalized or (r)log-transformed counts are shown in this tab. The user can also generate a sample-to-sample correlation scatter plot with the selected data.
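A sample-to-sample correlation of the kind mentioned above can be approximated outside the app in a few lines of base R. This is a minimal sketch on simulated counts: the sample names are invented, and log2(count + 1) is used as a simple stand-in for the transformation applied in the app.

```r
# Two simulated samples sharing a common expression profile plus noise.
set.seed(3)
base_expr <- rnbinom(500, mu = 200, size = 1)
counts <- cbind(sampleA = base_expr + rpois(500, 5),
                sampleB = base_expr + rpois(500, 5))

# Log-transform to reduce the influence of a few highly expressed genes.
log_counts <- log2(counts + 1)

# Pearson correlation between the two samples.
r <- cor(log_counts[, "sampleA"], log_counts[, "sampleB"])

# Scatter plot of one sample against the other (interactive sessions only).
if (interactive()) {
  plot(log_counts[, "sampleA"], log_counts[, "sampleB"],
       xlab = "sampleA, log2(count + 1)",
       ylab = "sampleB, log2(count + 1)",
       main = sprintf("Pearson r = %.3f", r))
}
```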
This panel displays information on the objects in use, either passed as parameters or generated from the count matrix provided. The displayed information comprises the design metadata, a sample-to-sample distance heatmap, the number of millions of reads per sample, and a basic summary of the counts.
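The quantities listed above can be reproduced approximately with base R, as in the sketch below. The counts are simulated, the object names are illustrative, and log2(count + 1) is again an assumed stand-in for the transformation the app uses before computing distances.

```r
# Simulated count matrix: 300 genes in rows, 4 samples in columns.
set.seed(7)
counts <- matrix(rnbinom(300 * 4, mu = 5000, size = 1), nrow = 300,
                 dimnames = list(paste0("gene", 1:300),
                                 paste0("sample", 1:4)))

# Millions of reads assigned per sample.
million_reads <- colSums(counts) / 1e6

# Sample-to-sample Euclidean distances on log2(count + 1) values.
log_counts <- log2(counts + 1)
sample_dist <- as.matrix(dist(t(log_counts)))

# Basic per-sample summary of the raw counts (one row per sample).
count_summary <- t(apply(counts, 2, summary))

# Heatmap of the distance matrix (interactive sessions only).
if (interactive()) {
  heatmap(sample_dist, symm = TRUE, main = "Sample-to-sample distances")
}
```

Samples that sit far from all others in the distance matrix, or that have a much lower read total, are natural candidates for a closer look as potential outliers.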