xPierGSEA | R Documentation |
xPierGSEA prioritises pathways, given prioritised genes and the ontology in query, using gene set enrichment analysis (GSEA). It returns an object of class "eGSEA".
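At the core of GSEA is a running sum over the ranked gene list: it steps up at genes in the set and down at genes outside it, and the enrichment score is its maximum deviation from zero. The sketch below is a minimal base-R illustration of the standard weighted formulation. It assumes the `weight` argument acts as an exponent on absolute gene scores, so that 0 gives the unweighted Kolmogorov-Smirnov-like statistic. The helper `running_es()` and the toy scores are hypothetical, and this is not Pi's internal implementation.

```r
# Sketch of the weighted GSEA running-sum statistic (illustrative only).
running_es <- function(scores, geneset, weight = 1) {
  # scores: named numeric vector (names are gene symbols)
  # geneset: character vector of gene symbols in one set
  # weight: exponent on |score|; 0 weights every hit equally (KS-like)
  scores <- sort(scores, decreasing = TRUE)   # rank genes by priority score
  hits <- names(scores) %in% geneset          # TRUE where the gene is in the set
  n_miss <- sum(!hits)
  stopifnot(any(hits), n_miss > 0)
  w <- abs(scores)^weight                     # note: 0^0 is 1 in R
  p_hit <- cumsum(ifelse(hits, w, 0)) / sum(w[hits])
  p_miss <- cumsum(!hits) / n_miss
  res <- p_hit - p_miss                       # running enrichment score (RES)
  peak <- which.max(abs(res))                 # maximum deviation from zero
  idx <- seq_along(scores)
  # leading genes: hits up to the peak (positive ES) or from it (negative ES)
  lead <- if (res[peak] >= 0) hits & idx <= peak else hits & idx >= peak
  list(es = res[peak], res = res, leading = names(scores)[lead])
}

scores <- c(A = 5, B = 4, C = 3, D = 2, E = 1, F = 0.5)
out <- running_es(scores, c("A", "C"), weight = 1)
out$es       # 0.75
out$leading  # "A" "C"
```

With `weight = 0` every hit contributes equally, so only the ranks matter. Larger weights let high-scoring genes dominate the running sum.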
xPierGSEA(pNode, priority.top = NULL,
  ontology = c("GOBP", "GOMF", "GOCC", "PS", "PS2", "SF", "Pfam", "DO",
    "HPPA", "HPMI", "HPCM", "HPMA", "MP", "MsigdbH", "MsigdbC1",
    "MsigdbC2CGP", "MsigdbC2CPall", "MsigdbC2CP", "MsigdbC2KEGG",
    "MsigdbC2REACTOME", "MsigdbC2BIOCARTA", "MsigdbC3TFT", "MsigdbC3MIR",
    "MsigdbC4CGN", "MsigdbC4CM", "MsigdbC5BP", "MsigdbC5MF", "MsigdbC5CC",
    "MsigdbC6", "MsigdbC7", "DGIdb", "GTExV4", "GTExV6", "CreedsDisease",
    "CreedsDiseaseUP", "CreedsDiseaseDN", "CreedsDrug", "CreedsDrugUP",
    "CreedsDrugDN", "CreedsGene", "CreedsGeneUP", "CreedsGeneDN"),
  customised.genesets = NULL, size.range = c(10, 500),
  path.mode = c("all_paths", "shortest_paths", "all_shortest_paths"),
  weight = 1, nperm = 2000, fast = TRUE, verbose = TRUE,
  RData.location = "http://galahad.well.ox.ac.uk/bigdata")
pNode |
an object of class "pNode" (or "pTarget" or "dTarget") |
priority.top |
the number of top targets used for GSEA. By default, it is NULL, meaning all targets are used |
ontology |
the ontology supported currently. It can be "GOBP" for Gene Ontology Biological Process, "GOMF" for Gene Ontology Molecular Function, "GOCC" for Gene Ontology Cellular Component, "PS" for phylostratigraphic age information, "PS2" for the collapsed PS version (inferred ancestors collapsed into one with the known taxonomy information), "SF" for domain superfamily assignments, "DO" for Disease Ontology, "HPPA" for Human Phenotype Phenotypic Abnormality, "HPMI" for Human Phenotype Mode of Inheritance, "HPCM" for Human Phenotype Clinical Modifier, "HPMA" for Human Phenotype Mortality Aging, "MP" for Mammalian Phenotype, "DGIdb" for druggable categories from the Drug-Gene Interaction database, and the Molecular Signatures Database (Msigdb, including "MsigdbH", "MsigdbC1", "MsigdbC2CGP", "MsigdbC2CPall", "MsigdbC2CP", "MsigdbC2KEGG", "MsigdbC2REACTOME", "MsigdbC2BIOCARTA", "MsigdbC3TFT", "MsigdbC3MIR", "MsigdbC4CGN", "MsigdbC4CM", "MsigdbC5BP", "MsigdbC5MF", "MsigdbC5CC", "MsigdbC6", "MsigdbC7"). The usage lists all accepted values |
customised.genesets |
a list, each element of which contains gene symbols. By default, it is NULL. If a list is provided, it overrides the parameter "ontology" |
size.range |
the minimum and maximum number of members a term must have to be considered. By default, terms with at least 10 and at most 500 members are considered |
path.mode |
the mode of paths induced by vertices/nodes with input annotation data. It can be "all_paths" for all possible paths to the root, "shortest_paths" for only one path to the root (for each node in query), or "all_shortest_paths" for all shortest paths to the root (i.e. for each node, all shortest paths of equal length) |
weight |
an integer specifying the score weight. It can be "0" for unweighted (equivalent to the Kolmogorov-Smirnov statistic, considering only the rank), "1" for weighting by input gene scores (the default), "2" for over-weighting, and so on |
nperm |
the number of random permutations. For each permutation, gene-score associations are permuted, which in effect permutes gene-term associations |
fast |
logical to indicate whether to calculate GSEA results in fast mode. By default, it is set to TRUE, but fast calculation is only used if the package "fgsea" has been installed. It can be installed via: |
verbose |
logical to indicate whether messages will be displayed on the screen. By default, it is set to TRUE for display |
RData.location |
a character string specifying the location of the built-in RData files. See |
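The arguments "customised.genesets" and "size.range" work together: a customised list supplies the gene sets, and only sets whose size lies within the range are tested. The sketch below shows one plausible shape for such a list (a named list of gene-symbol vectors) and an inclusive size filter. The helper `filter_genesets()`, its optional `universe` argument, and the gene symbols are hypothetical. Whether Pi counts set size before or after intersecting with the scored genes is an assumption here, so both options are shown.

```r
# Hypothetical helper mirroring the documented meaning of `size.range`:
# keep gene sets with at least size.range[1] and at most size.range[2] members.
filter_genesets <- function(genesets, size.range = c(10, 500), universe = NULL) {
  stopifnot(is.list(genesets), length(size.range) == 2)
  if (!is.null(universe)) {
    # optionally count only members that also appear among the scored genes
    genesets <- lapply(genesets, function(g) intersect(g, universe))
  }
  sizes <- lengths(genesets)
  genesets[sizes >= size.range[1] & sizes <= size.range[2]]
}

# A customised gene set list: each element is a vector of gene symbols.
genesets <- list(
  setA = paste0("G", 1:12),   # 12 members: within the default range
  setB = paste0("G", 1:5),    # 5 members: below the minimum of 10
  setC = paste0("G", 1:600)   # 600 members: above the maximum of 500
)
names(filter_genesets(genesets))  # "setA"
```

Passing a `universe` of scored genes can shrink sets below the minimum. With `universe = paste0("G", 1:8)`, every set above is dropped.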
an object of class "eGSEA", a list with the following components:
df_summary
: a data frame of nTerm x 9 containing the gene set enrichment analysis results, where nTerm is the number of terms/gene sets, and the 9 columns are "setID" (i.e. "Term ID"), "name" (i.e. "Term Name"), "nAnno" (i.e. number of members annotated by a term), "nLead" (i.e. number of members that are leading genes), "es" (i.e. enrichment score), "nes" (i.e. normalised enrichment score; the enrichment score after normalisation by gene set size), "pvalue" (i.e. nominal p value), "adjp" (i.e. adjusted p value; the p value after adjustment for multiple comparisons), "distance" (i.e. term distance or metadata)
leading
: a list of gene sets, each storing its leading genes as a named vector (names are gene symbols; elements are priority ranks). Gene sets are always identified by "setID"
full
: a list of gene sets, each storing the full gene set enrichment analysis result as a data frame of nGene x 5, where nGene is the number of genes, and the 5 columns are "GeneID", "Rank" for priority rank, "Score" for priority score, "RES" for running enrichment score, and "Hits" for gene set hit information (1 for a gene hit, 2 for a leading gene hit, 3 for the point defining leading genes, 0 for no hit). Gene sets are always identified by "setID"
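The "es", "nes", "pvalue" and "adjp" columns come from comparing each observed enrichment score with a permutation null. The sketch below assumes the standard GSEA recipe:

- scores are shuffled across genes `nperm` times;
- the nominal p value is the smoothed share of same-sign null scores at least as extreme as the observed one;
- the NES divides the ES by the mean absolute same-sign null score, which accounts for gene set size through the size-matched null;
- p values are adjusted with Benjamini-Hochberg via `p.adjust()`.

These choices, and the helpers `es_stat()` and `perm_test()`, are illustrative assumptions, not Pi's documented internals.

```r
# Compact weighted enrichment score: maximum deviation of the running sum.
es_stat <- function(scores, geneset, weight = 1) {
  s <- sort(scores, decreasing = TRUE)
  hits <- names(s) %in% geneset
  w <- abs(s)^weight
  res <- cumsum(ifelse(hits, w, 0)) / sum(w[hits]) - cumsum(!hits) / sum(!hits)
  res[which.max(abs(res))]
}

# Permutation test: shuffle gene-score associations `nperm` times.
perm_test <- function(scores, geneset, weight = 1, nperm = 2000) {
  es <- es_stat(scores, geneset, weight)
  null_es <- replicate(nperm, {
    perm <- setNames(sample(unname(scores)), names(scores))
    es_stat(perm, geneset, weight)
  })
  # compare only with null scores of the same sign as the observed ES
  same <- if (es >= 0) null_es[null_es >= 0] else null_es[null_es < 0]
  # nominal p value with +1 smoothing so it is never exactly zero
  pvalue <- (sum(abs(same) >= abs(es)) + 1) / (length(same) + 1)
  # NES: ES divided by the mean absolute same-sign null ES (NaN if none)
  nes <- es / mean(abs(same))
  c(es = es, nes = nes, pvalue = pvalue)
}

set.seed(1)
genes <- paste0("G", 1:200)                         # hypothetical symbols
scores <- setNames(rev(seq_len(200)) / 200, genes)  # G1 has the top score
sets <- list(top = genes[1:20], random = sample(genes, 20))
res <- t(sapply(sets, function(g) perm_test(scores, g, nperm = 500)))
res <- cbind(res, adjp = p.adjust(res[, "pvalue"], method = "BH"))
res
```

The "top" set occupies the first 20 ranks, so its ES is 1 and its nominal p value is close to the smallest value the smoothing allows.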
xGSEAbarplot, xGSEAdotplot
## Not run: 
# Load the library
library(Pi)
## End(Not run)

RData.location <- "http://galahad.well.ox.ac.uk/bigdata_dev"

## Not run: 
# a) provide the seed nodes/genes with the weight info
## load ImmunoBase
ImmunoBase <- xRDataLoader(RData.customised='ImmunoBase', RData.location=RData.location)
## get genes within 500kb of AS GWAS lead SNPs
seeds.genes <- ImmunoBase$AS$genes_variants
## seeds weighted according to distance from lead SNPs
data <- 1 - seeds.genes/500000

# b) perform priority analysis
pNode <- xPierGenes(data=data, network="PCommonsDN_medium", restart=0.7, RData.location=RData.location)

# c) do pathway-level priority using GSEA
eGSEA <- xPierGSEA(pNode=pNode, ontology="DGIdb", nperm=2000, RData.location=RData.location)
bp <- xGSEAbarplot(eGSEA, top_num="auto", displayBy="nes")
gp <- xGSEAdotplot(eGSEA, top=1)
## End(Not run)