This tutorial describes Phantasus, a web application for visual and interactive gene expression analysis. Phantasus is based on Morpheus, a web-based software for heatmap visualisation and analysis, and integrates it with an R environment via the OpenCPU API.
The main object in Phantasus is a gene expression matrix. It can either be uploaded from a local text or Excel file or loaded from the Gene Expression Omnibus (GEO) database by its series identifier (both microarray and RNA-seq datasets are supported). In addition to the basic visualization and filtering methods implemented in Morpheus, Phantasus supports R-based methods such as k-means clustering, principal component analysis and differential expression analysis with the limma package.
In this vignette we show an example of using Phantasus to analyze public gene expression data from the GEO database. The workflow goes from loading the data, normalizing it and filtering outliers to differential gene expression analysis and downstream analysis.
To illustrate the usage of Phantasus, let us consider the public GEO dataset GSE53986. This dataset contains data from experiments in which bone marrow-derived macrophages were treated with three stimuli: LPS, IFNg and combined LPS+IFNg.
The simplest way to try Phantasus is to visit the website https://genome.ifmo.ru/phantasus or its mirror https://artyomovlab.wustl.edu/phantasus, where the latest version is deployed.
Alternatively, Phantasus can be started locally using the corresponding R package:
```r
library(phantasus)  # load the Phantasus package
servePhantasus()    # start the local Phantasus server with default parameters
```
This command runs the application with the default parameters and opens it in the default web browser (as given by R's browser option) at the address http://0.0.0.0:8000.
The starting screen should appear:
Opening the dataset
Let us open the dataset. To do this, select the GEO Datasets option in the Choose a file… dropdown menu. A text field will appear; enter GSE53986 there. Clicking the Load button (or pressing Enter) starts loading. After a few seconds, the corresponding heatmap should appear.
On the heatmap, rows correspond to genes (or microarray probes) and are annotated with the Gene symbol and Gene ID fields loaded from the GEO database. Columns correspond to samples and are annotated with titles, GEO sample accession identifiers and the treatment field. Annotations such as treatment come from the sample descriptions submitted to GEO by the dataset authors (see, for example, the Characteristics section at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1304836). Note that not all GEO datasets come with such well-formed annotations.
Adjusting expression values
Hovering over a heatmap cell shows its expression value. The large values indicate that the data are not log-scaled, whereas most types of gene expression analysis expect log-scaled values.
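What counts as "large values" can be made concrete with a rule of thumb. The minimal R sketch below is modeled on the check found in R scripts generated by GEO's GEO2R tool: a matrix is treated as not log-scaled if its 99th percentile exceeds 100, or if its range exceeds 50 while its first quartile is positive. This is not Phantasus code; the function name looks_unlogged and the toy values are hypothetical.

```r
# Hypothetical helper: heuristic check (modeled on GEO2R-generated scripts)
# for whether an expression matrix still holds raw, non-log values.
looks_unlogged <- function(m) {
  qx <- as.numeric(quantile(m, c(0, 0.25, 0.5, 0.75, 0.99, 1), na.rm = TRUE))
  (qx[5] > 100) || (qx[6] - qx[1] > 50 && qx[2] > 0)
}

# Toy raw intensities (illustrative only, not taken from GSE53986)
raw <- matrix(c(12, 250, 3400, 870, 15, 9800), nrow = 3)

print(looks_unlogged(raw))        # TRUE: 99th percentile is far above 100
print(looks_unlogged(log2(raw)))  # FALSE: log2 values stay within a narrow range
```

Such a threshold is only a heuristic; inspecting the values directly, as in the heatmap, remains the more reliable check.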
For proper downstream analysis, it is recommended to normalize the matrix. To do so, open the Tools/Adjust menu and check the Log 2 and Quantile normalize adjustments.
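The effect of the two adjustments can be sketched in base R. This is an illustration of the transformations, not Phantasus's own implementation, and it assumes a matrix of strictly positive raw intensities; the function quantile_normalize and the toy values are hypothetical. Quantile normalization gives every sample the same distribution: each value is replaced by the mean, across samples, of the values that have the same rank.

```r
# Toy matrix: 4 probes x 3 samples of raw intensities (illustrative values only)
raw <- matrix(c(120,  45,  800, 300,
                 90,  60, 1500, 210,
                200,  30,  950, 400),
              nrow = 4,
              dimnames = list(paste0("probe", 1:4), paste0("sample", 1:3)))

# Step 1: log2 transform (assumes strictly positive values; zeros would give -Inf)
logged <- log2(raw)

# Step 2: quantile normalization (simplified: ties are broken by position, no NA handling)
quantile_normalize <- function(m) {
  sorted <- apply(m, 2, sort)                        # sort each sample ascending
  reference <- rowMeans(sorted)                      # mean value at each rank
  ranks <- apply(m, 2, rank, ties.method = "first")  # rank of each value within its sample
  out <- apply(ranks, 2, function(r) reference[r])   # replace values by rank means
  dimnames(out) <- dimnames(m)
  out
}

normalized <- quantile_normalize(logged)
print(round(normalized, 2))
```

After normalization each column contains the same set of values, while the order of probes within each sample is preserved. For real data, limma's normalizeBetweenArrays(x, method = "quantile") is a tested alternative that also handles ties and missing values.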