xSNP2cGenes | R Documentation |
xSNP2cGenes
defines HiC genes given a list of SNPs. The HiC weight is calculated
as the Cumulative Distribution Function (CDF) of HiC interaction
scores.
xSNP2cGenes(data,
  entity = c("SNP", "chr:start-end", "data.frame", "bed", "GRanges"),
  include.HiC = c(NA, "Monocytes", "Macrophages_M0", "Macrophages_M1",
    "Macrophages_M2", "Neutrophils", "Megakaryocytes",
    "Endothelial_precursors", "Erythroblasts", "Fetal_thymus",
    "Naive_CD4_T_cells", "Total_CD4_T_cells",
    "Activated_total_CD4_T_cells", "Nonactivated_total_CD4_T_cells",
    "Naive_CD8_T_cells", "Total_CD8_T_cells", "Naive_B_cells",
    "Total_B_cells", "PE.Monocytes", "PE.Macrophages_M0",
    "PE.Macrophages_M1", "PE.Macrophages_M2", "PE.Neutrophils",
    "PE.Megakaryocytes", "PE.Erythroblasts", "PE.Naive_CD4_T_cells",
    "PE.Naive_CD8_T_cells"),
  GR.SNP = c("dbSNP_GWAS", "dbSNP_Common"),
  cdf.function = c("empirical", "exponential"),
  plot = FALSE, verbose = TRUE,
  RData.location = "http://galahad.well.ox.ac.uk/bigdata")
data |
NULL or an input vector containing SNPs. If NULL, all SNPs will be considered. If an input vector is given, SNPs should be provided as dbSNP IDs (i.e. starting with rs) or in the format 'chrN:xxx', where N is either 1-22 or X and xxx is a number; for example, 'chr16:28525386'. Alternatively, the input can be in other formats/entities (see the next parameter 'entity') |
entity |
the data entity. By default, it is "SNP". For general use, it can also be one of "chr:start-end", "data.frame", "bed" or "GRanges" |
include.HiC |
genes linked to input SNPs are also included. By default, it is 'NA' to disable this option. Otherwise, those genes linked to SNPs will be included according to Promoter Capture HiC (PCHiC) datasets. Pre-built HiC datasets are detailed in the section 'Note' |
GR.SNP |
the genomic regions of SNPs. By default, it is 'dbSNP_GWAS', that is, SNPs from dbSNP (version 146) restricted to GWAS SNPs and their LD SNPs (hg19). It can be 'dbSNP_Common', that is, Common SNPs from dbSNP (version 146) plus GWAS SNPs and their LD SNPs (hg19). Alternatively, the user can specify a customised input. To do so, first save your RData file (containing a GR object) to your local computer, and make sure the names of the GR object refer to dbSNP IDs. Then set "GR.SNP" to your RData file name (with or without extension), and specify the path to your RData file in "RData.location". Note: you can also load your customised GR object directly |
cdf.function |
a character specifying a Cumulative Distribution Function (cdf). It can be either 'empirical' for the empirical cdf or 'exponential' for the exponential cdf |
plot |
logical to indicate whether the histogram plot (plus density or CDF plot) should be drawn. By default, it is set to FALSE for no plotting |
verbose |
logical to indicate whether messages will be displayed on the screen. By default, it is set to TRUE for display |
RData.location |
the characters to tell the location of built-in
RData files. See |
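The SNP formats accepted by 'data' (dbSNP IDs starting with rs, or 'chrN:xxx' with N in 1-22 or X) can be checked before calling the function. The following is a minimal base-R sketch; classify_snp is a hypothetical helper written for illustration and is not part of the Pi package, and the regular expressions are one reading of the format description above.

```r
# Hypothetical helper (not part of Pi): classify SNP identifiers by format.
classify_snp <- function(x) {
  # dbSNP ID: 'rs' followed by digits
  is_rsid <- grepl("^rs[0-9]+$", x)
  # Position: 'chrN:xxx' with N in 1-22 or X, and xxx a number
  is_pos <- grepl("^chr([1-9]|1[0-9]|2[0-2]|X):[0-9]+$", x)
  ifelse(is_rsid, "dbSNP", ifelse(is_pos, "position", "invalid"))
}

# Example identifiers (chosen for illustration only)
snps <- c("rs6871626", "chr16:28525386", "chrX:1000", "chr23:5", "16:28525386")
fmt <- classify_snp(snps)

# Keep only the identifiers that match one of the two documented formats
valid_snps <- snps[fmt != "invalid"]
```

Identifiers flagged as "invalid" here may still be acceptable when 'entity' is set to something other than "SNP".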
a data frame with following columns:
Gene
: SNP-interacting genes captured by HiC
SNP
: SNPs
Sig
: the interaction score (the higher, the stronger)
Weight
: the HiC weight
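To illustrate how a weight can be derived from interaction scores via a CDF, here is a minimal base-R sketch. The scores are made up, and estimating the exponential rate as 1/mean is an assumption made for this example; the fitting actually used inside xSNP2cGenes may differ.

```r
# Toy interaction scores (invented for illustration)
sig <- c(5.2, 6.1, 7.8, 9.4, 12.0, 20.5)

# Empirical CDF: fraction of scores less than or equal to each score
w_empirical <- stats::ecdf(sig)(sig)

# Exponential CDF: assumes rate = 1 / mean(sig) (an illustrative estimate)
w_exponential <- stats::pexp(sig, rate = 1 / mean(sig))

# Side-by-side view: higher scores map to weights closer to 1
weights <- data.frame(Sig = sig,
                      empirical = w_empirical,
                      exponential = w_exponential)
```

In both cases the weight lies between 0 and 1 and never decreases as the score increases, which is what makes a CDF a convenient way to rescale scores.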
Pre-built HiC datasets are described below according to the data
sources.
1. Promoter Capture HiC datasets in 17 primary blood cell types.
Sourced from Cell 2016, 167(5):1369-1384.e19
Monocytes
: physical interactions (CHiCAGO score >=5) of
promoters (baits) with the other end (preys) in Monocytes.
Macrophages_M0
: promoter interactomes in Macrophages M0.
Macrophages_M1
: promoter interactomes in Macrophages M1.
Macrophages_M2
: promoter interactomes in Macrophages M2.
Neutrophils
: promoter interactomes in Neutrophils.
Megakaryocytes
: promoter interactomes in Megakaryocytes.
Endothelial_precursors
: promoter interactomes in
Endothelial precursors.
Erythroblasts
: promoter interactomes in Erythroblasts.
Fetal_thymus
: promoter interactomes in Fetal thymus.
Naive_CD4_T_cells
: promoter interactomes in Naive CD4+ T
cells.
Total_CD4_T_cells
: promoter interactomes in Total CD4+ T
cells.
Activated_total_CD4_T_cells
: promoter interactomes in
Activated total CD4+ T cells.
Nonactivated_total_CD4_T_cells
: promoter interactomes in
Nonactivated total CD4+ T cells.
Naive_CD8_T_cells
: promoter interactomes in Naive CD8+ T
cells.
Total_CD8_T_cells
: promoter interactomes in Total CD8+ T
cells.
Naive_B_cells
: promoter interactomes in Naive B cells.
Total_B_cells
: promoter interactomes in Total B cells.
2. Promoter Capture HiC datasets (involving active promoters and enhancers) in 9 primary blood cell types. Sourced from Cell 2016, 167(5):1369-1384.e19
PE.Monocytes
: physical interactions (CHiCAGO score >=5) of
promoters (baits) with the other end (enhancers as preys) in
Monocytes.
PE.Macrophages_M0
: promoter-enhancer interactomes in
Macrophages M0.
PE.Macrophages_M1
: promoter-enhancer interactomes in
Macrophages M1.
PE.Macrophages_M2
: promoter-enhancer interactomes in
Macrophages M2.
PE.Neutrophils
: promoter-enhancer interactomes in
Neutrophils.
PE.Megakaryocytes
: promoter-enhancer interactomes in
Megakaryocytes.
PE.Erythroblasts
: promoter-enhancer interactomes in
Erythroblasts.
PE.Naive_CD4_T_cells
: promoter-enhancer interactomes in
Naive CD4+ T cells.
PE.Naive_CD8_T_cells
: promoter-enhancer interactomes in
Naive CD8+ T cells.
xSNPhic
## Not run: 
# Load the library
library(Pi)

## End(Not run)

RData.location <- "http://galahad.well.ox.ac.uk/bigdata_dev"

# a) provide the SNPs with the significance info
## get lead SNPs reported in AS GWAS and their significance info (p-values)
#data.file <- "http://galahad.well.ox.ac.uk/bigdata/AS.txt"
#AS <- read.delim(data.file, header=TRUE, stringsAsFactors=FALSE)
ImmunoBase <- xRDataLoader(RData.customised='ImmunoBase', RData.location=RData.location)
data <- names(ImmunoBase$AS$variants)

## Not run: 
# b) define HiC genes
df_cGenes <- xSNP2cGenes(data, include.HiC="Monocytes", RData.location=RData.location)

## End(Not run)
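Because the returned data frame has one row per gene-SNP pair, a common follow-up is to summarise it per gene. The sketch below uses a mock data frame with the documented columns (Gene, SNP, Sig, Weight); all values and names in it are invented, and taking the maximum weight per gene is only one possible summary, not something prescribed by the package.

```r
# Mock output with the documented columns (all values are invented)
df_mock <- data.frame(
  Gene   = c("GeneA", "GeneA", "GeneB", "GeneC"),
  SNP    = c("rs1", "rs2", "rs3", "rs4"),
  Sig    = c(6.3, 11.2, 5.4, 8.9),
  Weight = c(0.50, 0.95, 0.20, 0.80),
  stringsAsFactors = FALSE
)

# Keep the strongest weight per gene
gene_max <- aggregate(Weight ~ Gene, data = df_mock, FUN = max)

# Rank genes from the strongest to the weakest weight
gene_max <- gene_max[order(-gene_max$Weight), ]
```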