fgsea
is an R-package for fast preranked gene set enrichment analysis (GSEA). The performance is achieved by using an algorithm for cumulative GSEA-statistic calculation. This allows to reuse samples between different gene set sizes. See the preprint for algorithmic details.
Loading example pathways and gene-level statistics:
Running fgsea:
fgseaRes <- fgsea(pathways = examplePathways,
stats = exampleRanks,
minSize=15,
maxSize=500,
nperm=10000)
The resulting table contains enrichment scores and p-values:
## pathway pval padj ES
## 1: 5990980_Cell_Cycle 0.0001233806 0.001761341 0.5388497
## 2: 5990979_Cell_Cycle,_Mitotic 0.0001271294 0.001761341 0.5594755
## 3: 5991210_Signaling_by_Rho_GTPases 0.0001325557 0.001761341 0.4238512
## 4: 5991454_M_Phase 0.0001372684 0.001761341 0.5576247
## 5: 5991209_RHO_GTPase_Effectors 0.0001392370 0.001761341 0.5248796
## 6: 5991023_Metabolism_of_carbohydrates 0.0001392564 0.001761341 0.4944766
## NES nMoreExtreme size leadingEdge
## 1: 2.675375 0 369 66336,66977,12442,107995,66442,19361,...
## 2: 2.732696 0 317 66336,66977,12442,107995,66442,12571,...
## 3: 2.008431 0 231 66336,66977,20430,104215,233406,107995,...
## 4: 2.552111 0 173 66336,66977,12442,107995,66442,52276,...
## 5: 2.368318 0 157 66336,66977,20430,104215,233406,107995,...
## 6: 2.234572 0 160 11676,21991,15366,58250,12505,20527,...
It takes about ten seconds to get results with significant hits after FDR correction:
## [1] 83
One can make an enrichment plot for a pathway:
plotEnrichment(examplePathways[["5991130_Programmed_Cell_Death"]],
exampleRanks) + labs(title="Programmed Cell Death")
Or make a table plot for a bunch of selected pathways:
topPathwaysUp <- fgseaRes[ES > 0][head(order(pval), n=10), pathway]
topPathwaysDown <- fgseaRes[ES < 0][head(order(pval), n=10), pathway]
topPathways <- c(topPathwaysUp, rev(topPathwaysDown))
plotGseaTable(examplePathways[topPathways], exampleRanks, fgseaRes,
gseaParam = 0.5)
From the plot above one can see that there are very similar pathways in the table (for example 5991502_Mitotic_Metaphase_and_Anaphase
and 5991600_Mitotic_Anaphase
). To select only independent pathways one can use collapsePathways
function:
collapsedPathways <- collapsePathways(fgseaRes[order(pval)][padj < 0.01],
examplePathways, exampleRanks)
mainPathways <- fgseaRes[pathway %in% collapsedPathways$mainPathways][
order(-NES), pathway]
plotGseaTable(examplePathways[mainPathways], exampleRanks, fgseaRes,
gseaParam = 0.5)
To save the results data:table::fwrite
function can be used:
To make leading edge more human-readable it can be converted using mapIds
function and a corresponding database (here org.Mm.eg.db
for mouse):
Please, be aware that fgsea
function takes about O(nk^{3/2}) time, where n is number of permutations and k is a maximal size of the pathways. That means that setting maxSize
parameter with a value of ~500 is strongly recommended.
Also, fgsea
is parallelized using BiocParallel
package. By default the first registered backend returned by bpparam()
is used. To tweak the parallelization one can either specify BPPARAM
parameter used for bclapply
of set nproc
parameter, which is a shorthand for setting BPPARAM=MulticoreParam(workers = nproc)
.
For convenience there is reactomePathways
function that obtains pathways from Reactome for given set of genes. Package reactome.db
is required to be installed.
pathways <- reactomePathways(names(exampleRanks))
fgseaRes <- fgsea(pathways, exampleRanks, nperm=1000, maxSize=500)
head(fgseaRes)
## pathway
## 1: 5-Phosphoribose 1-diphosphate biosynthesis
## 2: A tetrasaccharide linker sequence is required for GAG synthesis
## 3: ABC transporters in lipid homeostasis
## 4: ABC-family proteins mediated transport
## 5: ADP signalling through P2Y purinoceptor 1
## 6: ADP signalling through P2Y purinoceptor 12
## pval padj ES NES nMoreExtreme size
## 1: 0.87425150 0.9503676 0.4267378 0.6841911 437 2
## 2: 0.66123188 0.8655731 0.3218180 0.8362007 364 11
## 3: 0.32747253 0.6753315 -0.3850118 -1.1274080 148 13
## 4: 0.36407767 0.7020394 0.2709658 1.0697336 224 66
## 5: 0.01492537 0.1134767 0.6228710 1.7451397 7 16
## 6: 0.02925046 0.1724998 0.5959844 1.6123562 15 13
## leadingEdge
## 1: 328099,19139
## 2: 14733,20971,20970,12032,29873,218271,...
## 3: 19299,27403,11307,11806,217265,27409,...
## 4: 18669,56199,17463,26444,19179,228769,...
## 5: 14696,14702,14700,14682,14676,66066,...
## 6: 14696,14702,14700,66066,14697,14693,...
One can also start from .rnk
and .gmt
files as in original GSEA:
rnk.file <- system.file("extdata", "naive.vs.th1.rnk", package="fgsea")
gmt.file <- system.file("extdata", "mouse.reactome.gmt", package="fgsea")
Loading ranks:
ranks <- read.table(rnk.file,
header=TRUE, colClasses = c("character", "numeric"))
ranks <- setNames(ranks$t, ranks$ID)
str(ranks)
## Named num [1:12000] -63.3 -49.7 -43.6 -41.5 -33.3 ...
## - attr(*, "names")= chr [1:12000] "170942" "109711" "18124" "12775" ...
Loading pathways:
## List of 6
## $ 1221633_Meiotic_Synapsis : chr [1:64] "12189" "13006" "15077" "15078" ...
## $ 1368092_Rora_activates_gene_expression : chr [1:9] "11865" "12753" "12894" "18143" ...
## $ 1368110_Bmal1:Clock,Npas2_activates_circadian_gene_expression : chr [1:16] "11865" "11998" "12753" "12952" ...
## $ 1445146_Translocation_of_Glut4_to_the_Plasma_Membrane : chr [1:55] "11461" "11465" "11651" "11652" ...
## $ 186574_Endocrine-committed_Ngn3+_progenitor_cells : chr [1:4] "18012" "18088" "18506" "53626"
## $ 186589_Late_stage_branching_morphogenesis_pancreatic_bud_precursor_cells: chr [1:4] "11925" "15205" "21410" "246086"
And runnig fgsea:
## pathway
## 1: 1221633_Meiotic_Synapsis
## 2: 1445146_Translocation_of_Glut4_to_the_Plasma_Membrane
## 3: 442533_Transcriptional_Regulation_of_Adipocyte_Differentiation_in_3T3-L1_Pre-adipocytes
## 4: 508751_Circadian_Clock
## 5: 5334727_Mus_musculus_biological_processes
## 6: 573389_NoRC_negatively_regulates_rRNA_expression
## pval padj ES NES nMoreExtreme size
## 1: 0.5314092 0.6994718 0.2885754 0.9463425 312 27
## 2: 0.6796748 0.8229120 0.2387284 0.8566766 417 39
## 3: 0.1023018 0.2467031 -0.3640706 -1.3555837 39 31
## 4: 0.8069853 0.8900458 0.2516324 0.7329488 438 17
## 5: 0.3328592 0.5265049 0.2469065 1.0790587 233 106
## 6: 0.4191176 0.6064270 0.3607407 1.0507568 227 17
## leadingEdge
## 1: 15270,12189,71846,19357
## 2: 17918,19341,20336,22628,22627,20619,...
## 3: 76199,19014,26896,229003,17977,17978,...
## 4: 20893,59027,19883
## 5: 60406,19361,15270,20893,12189,68240,...
## 6: 60406,20018,245688,20017