cliProfiler 1.6.0
Cross-linking immunoprecipitation (CLIP) is a technique that combines UV cross-linking and immunoprecipitation to analyse protein-RNA interactions or to pinpoint RNA modifications (e.g. m6A). CLIP-based methods, such as iCLIP and eCLIP, allow precise mapping of RNA modification sites or RNA-binding protein (RBP) binding sites on a genome-wide scale. These techniques help us to unravel post-transcriptional regulatory networks. To make the visualization of CLIP data easier, we developed the cliProfiler package. cliProfiler includes seven functions that allow users to easily make different profile plots.
The cliProfiler package is available at https://bioconductor.org and can be installed via BiocManager::install:
if (!require("BiocManager"))
    install.packages("BiocManager")
BiocManager::install("cliProfiler")
A package only needs to be installed once. Load the package into an R session with
library(cliProfiler)
The input data for all the functions in cliProfiler should be the peak calling result or a similar object that represents the RBP binding sites or RNA modification positions. Moreover, these peaks/signals must be stored in a GRanges object. GRanges is an S4 class defined by the GenomicRanges package. The GRanges class is a container for genomic locations and their associated annotations. For more information about GRanges objects, please check the GenomicRanges package. An example of a GRanges object is shown below:
testpath <- system.file("extdata", package = "cliProfiler")
## loading the test GRanges object
test <- readRDS(file.path(testpath, "test.rds"))
## Show an example of GRanges object
test
## GRanges object with 100 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr17 28748198-28748218 +
## [2] chr10 118860137-118860157 -
## [3] chr2 148684461-148684481 +
## [4] chr2 84602546-84602566 -
## [5] chr18 6111874-6111894 -
## ... ... ... ...
## [96] chr7 127254692-127254712 +
## [97] chr2 28833830-28833850 -
## [98] chr9 44607255-44607275 +
## [99] chr1 133621331-133621351 -
## [100] chr4 130316598-130316618 -
## -------
## seqinfo: 22 sequences from an unspecified genome; no seqlengths
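If your own peaks are stored in a plain table rather than a GRanges object, they need to be converted first. The following is a minimal sketch using makeGRangesFromDataFrame from GenomicRanges. The table peaks_df and all of its coordinates are made up for illustration, and the column names chr, start, end and strand are an assumption about how such a table might be laid out.

```r
library(GenomicRanges)

## Hypothetical peak table; the coordinates are invented for illustration
peaks_df <- data.frame(
    chr    = c("chr1", "chr2", "chr2"),
    start  = c(1000, 5000, 7500),
    end    = c(1020, 5020, 7520),
    strand = c("+", "-", "+")
)

## Convert the table into a GRanges object; the chr, start, end and strand
## columns are recognised automatically by makeGRangesFromDataFrame
my_peaks <- makeGRangesFromDataFrame(peaks_df)
my_peaks
```

An object built this way has the same class as the test object above and could be passed to the profiling functions in the same way.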
The annotation file required by the functions exonProfile, geneTypeProfile, intronProfile, spliceSiteProfile and metaGeneProfile should be in gff3 format and downloaded from https://www.gencodegenes.org/. The cliProfiler package includes a test gff3 file.
## the path for the test gff3 file
test_gff3 <- file.path(testpath, "annotation_test.gff3")
## the gff3 file can be loaded by import.gff3 function in rtracklayer package
shown_gff3 <- rtracklayer::import.gff3(test_gff3)
## show the test gff3 file
shown_gff3
## GRanges object with 3068 ranges and 23 metadata columns:
## seqnames ranges strand | source type
## <Rle> <IRanges> <Rle> | <factor> <factor>
## [1] chr1 72159442-72212307 - | HAVANA transcript
## [2] chr1 72212017-72212307 - | HAVANA exon
## [3] chr1 72212017-72212111 - | HAVANA CDS
## [4] chr1 72212109-72212111 - | HAVANA start_codon
## [5] chr1 72192043-72192202 - | HAVANA exon
## ... ... ... ... . ... ...
## [3064] chrX 153392866-153392868 + | HAVANA stop_codon
## [3065] chrX 153237748-153238092 + | HAVANA five_prime_UTR
## [3066] chrX 153308852-153308924 + | HAVANA five_prime_UTR
## [3067] chrX 153370845-153370846 + | HAVANA five_prime_UTR
## [3068] chrX 153392869-153396132 + | HAVANA three_prime_UTR
## score phase ID gene_id
## <numeric> <integer> <character> <character>
## [1] NA <NA> ENSMUST00000048860.8 ENSMUSG00000039395.8
## [2] NA <NA> exon:ENSMUST00000048.. ENSMUSG00000039395.8
## [3] NA 0 CDS:ENSMUST000000488.. ENSMUSG00000039395.8
## [4] NA 0 start_codon:ENSMUST0.. ENSMUSG00000039395.8
## [5] NA <NA> exon:ENSMUST00000048.. ENSMUSG00000039395.8
## ... ... ... ... ...
## [3064] NA 0 stop_codon:ENSMUST00.. ENSMUSG00000041649.13
## [3065] NA <NA> UTR5:ENSMUST00000112.. ENSMUSG00000041649.13
## [3066] NA <NA> UTR5:ENSMUST00000112.. ENSMUSG00000041649.13
## [3067] NA <NA> UTR5:ENSMUST00000112.. ENSMUSG00000041649.13
## [3068] NA <NA> UTR3:ENSMUST00000112.. ENSMUSG00000041649.13
## gene_type gene_name level mgi_id
## <character> <character> <character> <character>
## [1] protein_coding Mreg 2 MGI:2151839
## [2] protein_coding Mreg 2 MGI:2151839
## [3] protein_coding Mreg 2 MGI:2151839
## [4] protein_coding Mreg 2 MGI:2151839
## [5] protein_coding Mreg 2 MGI:2151839
## ... ... ... ... ...
## [3064] protein_coding Klf8 2 MGI:2442430
## [3065] protein_coding Klf8 2 MGI:2442430
## [3066] protein_coding Klf8 2 MGI:2442430
## [3067] protein_coding Klf8 2 MGI:2442430
## [3068] protein_coding Klf8 2 MGI:2442430
## havana_gene Parent transcript_id
## <character> <CharacterList> <character>
## [1] OTTMUSG00000049069.1 ENSMUSG00000039395.8 ENSMUST00000048860.8
## [2] OTTMUSG00000049069.1 ENSMUST00000048860.8 ENSMUST00000048860.8
## [3] OTTMUSG00000049069.1 ENSMUST00000048860.8 ENSMUST00000048860.8
## [4] OTTMUSG00000049069.1 ENSMUST00000048860.8 ENSMUST00000048860.8
## [5] OTTMUSG00000049069.1 ENSMUST00000048860.8 ENSMUST00000048860.8
## ... ... ... ...
## [3064] OTTMUSG00000019377.5 ENSMUST00000112574.8 ENSMUST00000112574.8
## [3065] OTTMUSG00000019377.5 ENSMUST00000112574.8 ENSMUST00000112574.8
## [3066] OTTMUSG00000019377.5 ENSMUST00000112574.8 ENSMUST00000112574.8
## [3067] OTTMUSG00000019377.5 ENSMUST00000112574.8 ENSMUST00000112574.8
## [3068] OTTMUSG00000019377.5 ENSMUST00000112574.8 ENSMUST00000112574.8
## transcript_type transcript_name transcript_support_level
## <character> <character> <character>
## [1] protein_coding Mreg-201 1
## [2] protein_coding Mreg-201 1
## [3] protein_coding Mreg-201 1
## [4] protein_coding Mreg-201 1
## [5] protein_coding Mreg-201 1
## ... ... ... ...
## [3064] protein_coding Klf8-202 1
## [3065] protein_coding Klf8-202 1
## [3066] protein_coding Klf8-202 1
## [3067] protein_coding Klf8-202 1
## [3068] protein_coding Klf8-202 1
## tag havana_transcript
## <CharacterList> <character>
## [1] basic,appris_principal_1,CCDS OTTMUST00000125321.1
## [2] basic,appris_principal_1,CCDS OTTMUST00000125321.1
## [3] basic,appris_principal_1,CCDS OTTMUST00000125321.1
## [4] basic,appris_principal_1,CCDS OTTMUST00000125321.1
## [5] basic,appris_principal_1,CCDS OTTMUST00000125321.1
## ... ... ...
## [3064] alternative_5_UTR,basic,appris_principal_1,... OTTMUST00000046245.1
## [3065] alternative_5_UTR,basic,appris_principal_1,... OTTMUST00000046245.1
## [3066] alternative_5_UTR,basic,appris_principal_1,... OTTMUST00000046245.1
## [3067] alternative_5_UTR,basic,appris_principal_1,... OTTMUST00000046245.1
## [3068] alternative_5_UTR,basic,appris_principal_1,... OTTMUST00000046245.1
## protein_id ccdsid trans_len exon_number
## <character> <character> <character> <character>
## [1] ENSMUSP00000041878.7 CCDS15032.1 2284 <NA>
## [2] ENSMUSP00000041878.7 CCDS15032.1 2284 1
## [3] ENSMUSP00000041878.7 CCDS15032.1 2284 1
## [4] ENSMUSP00000041878.7 CCDS15032.1 2284 1
## [5] ENSMUSP00000041878.7 CCDS15032.1 2284 2
## ... ... ... ... ...
## [3064] ENSMUSP00000108193.2 CCDS30481.1 4752 7
## [3065] ENSMUSP00000108193.2 CCDS30481.1 4752 1
## [3066] ENSMUSP00000108193.2 CCDS30481.1 4752 2
## [3067] ENSMUSP00000108193.2 CCDS30481.1 4752 3
## [3068] ENSMUSP00000108193.2 CCDS30481.1 4752 7
## exon_id
## <character>
## [1] <NA>
## [2] ENSMUSE00000600755.2
## [3] ENSMUSE00000600755.2
## [4] ENSMUSE00000600755.2
## [5] ENSMUSE00000262166.1
## ... ...
## [3064] ENSMUSE00000692289.2
## [3065] ENSMUSE00000745002.1
## [3066] ENSMUSE00000692290.1
## [3067] ENSMUSE00000253395.2
## [3068] ENSMUSE00000692289.2
## -------
## seqinfo: 19 sequences from an unspecified genome; no seqlengths
The function windowProfile allows users to find out the enrichment of peaks against a customized annotation. This customized annotation should be stored in a GRanges object.
metaGeneProfile() outputs a meta profile, which shows the location of binding sites or modification sites (peaks/signals) along transcript regions (5’UTR, CDS and 3’UTR). The input of this function should be a GRanges object.
Besides the GRanges object, metaGeneProfile requires a path to the gff3 annotation file downloaded from GENCODE.
The output of metaGeneProfile is a List object. The first element of the List contains the GRanges object with the calculation results, which can be used in different ways later.
meta <- metaGeneProfile(object = test, annotation = test_gff3)
meta[[1]]
## GRanges object with 100 ranges and 5 metadata columns:
## seqnames ranges strand | center location
## <Rle> <IRanges> <Rle> | <integer> <character>
## [1] chr10 118860137-118860157 - | 118860147 CDS
## [2] chr2 84602546-84602566 - | 84602556 UTR3
## [3] chr18 6111874-6111894 - | 6111884 CDS
## [4] chr11 33213145-33213165 - | 33213155 UTR3
## [5] chr11 96819422-96819442 - | 96819432 CDS
## ... ... ... ... . ... ...
## [96] chr8 72222842-72222862 + | 72222852 NO
## [97] chr18 36648184-36648204 + | 36648194 CDS
## [98] chr8 105216021-105216041 + | 105216031 UTR3
## [99] chr7 127254692-127254712 + | 127254702 UTR3
## [100] chr9 44607255-44607275 + | 44607265 UTR5
## Gene_ID Transcript_ID Position
## <character> <character> <numeric>
## [1] ENSMUSG00000028630.9 ENSMUST00000004281.9 0.674444
## [2] ENSMUSG00000034101.14 ENSMUST00000067232.9 0.122384
## [3] ENSMUSG00000041225.16 ENSMUST00000077128.12 0.199836
## [4] ENSMUSG00000040594.19 ENSMUST00000102815.9 0.159303
## [5] ENSMUSG00000038615.17 ENSMUST00000107658.7 0.889039
## ... ... ... ...
## [96] Nan <NA> 5.0000000
## [97] ENSMUSG00000117942.1 ENSMUST00000140061.7 0.1694561
## [98] ENSMUSG00000031885.14 ENSMUST00000109392.8 0.0457421
## [99] ENSMUSG00000054716.4 ENSMUST00000052509.5 0.3978495
## [100] ENSMUSG00000032097.10 ENSMUST00000217034.1 0.5779817
## -------
## seqinfo: 22 sequences from an unspecified genome; no seqlengths
Here is an explanation of the metadata columns of the output GRanges object:
center: The center position of each peak/signal.
location: The genomic region that this peak/signal belongs to.
Gene_ID: The gene that this peak/signal belongs to.
Transcript_ID: The transcript that this peak/signal is assigned to.
Position: The relative position of each peak/signal within the genomic region. A value close to 0 means the peak is located close to the 5’ end of the genomic feature. A value close to 1 means the peak is located close to the 3’ end of the genomic feature. A value of 5 means the peak could not be mapped to any annotation.
The second element of the List is the meta plot, which is an object of the ggplot class. The user can apply all the functions from ggplot2 to change the details of this plot.
library(ggplot2)
## For example, to give the plot a new title
meta[[2]] + ggtitle("Meta Profile 2")
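The first List element can also be processed directly. As a small sketch of how the Position and location columns described above might be used, the example below drops peaks that could not be mapped (Position equal to 5) and counts the remaining peaks per region. The object res is a toy stand-in for meta[[1]] with invented values, so the example runs without the annotation file.

```r
library(GenomicRanges)

## Toy stand-in for meta[[1]]; all values are invented for illustration
res <- GRanges(
    seqnames = c("chr1", "chr1", "chr2", "chr3"),
    ranges   = IRanges(start = c(100, 500, 900, 50), width = 21),
    strand   = c("+", "-", "+", "+"),
    location = c("CDS", "UTR3", "NO", "UTR5"),
    Position = c(0.67, 0.12, 5, 0.58)
)

## Peaks with Position == 5 could not be assigned to any annotation
mapped <- res[res$Position != 5]

## Count the mapped peaks per transcript region
region_counts <- table(mapped$location)
region_counts
```

With real data, replacing res by meta[[1]] should give the number of peaks in each transcript region.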
For advanced usage, metaGeneProfile provides two methods to calculate the relative position. The first method returns the relative position of the peaks/signals in the genomic feature without the introns. The second method returns the relative position of the peaks/signals in the genomic feature with the introns. The parameter include_intron lets us switch easily between these two methods. If the data are polyA plus data, we recommend setting include_intron = FALSE.
meta <- metaGeneProfile(object = test, annotation = test_gff3,
                        include_intron = TRUE)
meta[[2]]
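To illustrate why the two methods can give different values, here is a toy calculation in base R. It is only a sketch of the general idea, not the code used inside metaGeneProfile, and the package's exact arithmetic may differ. The feature, its two exons and the peak center are all hypothetical.

```r
## Toy feature on the + strand: two exons separated by one intron
exon_start  <- c(1, 201)
exon_end    <- c(100, 300)
peak_center <- 250

## With introns: offset of the peak along the full genomic span
span_length     <- max(exon_end) - min(exon_start) + 1
pos_with_intron <- (peak_center - min(exon_start)) / span_length
pos_with_intron
## 0.83

## Without introns: offset of the peak along the concatenated exons only
exon_width <- exon_end - exon_start + 1
idx    <- which(peak_center >= exon_start & peak_center <= exon_end)
offset <- sum(exon_width[seq_len(idx - 1)]) + (peak_center - exon_start[idx])
pos_without_intron <- offset / sum(exon_width)
pos_without_intron
## 0.745
```

The same peak sits at 0.83 when the intron is counted and at 0.745 when it is skipped, which is why the choice of include_intron can shift the profile.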
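The group option allows users to make a meta plot with multiple conditions. It takes the name of a metadata column of the input GRanges object that holds the condition labels. Here is an example: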
test$Treat <- c(rep("Treatment 1",50), rep("Treatment 2", 50))
meta <- metaGeneProfile(object = test, annotation = test_gff3,
                        group = "Treat")
meta[[2]]
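In practice the conditions often come from separate peak sets rather than from one object. A minimal sketch of how such sets might be labelled and combined before using the group option is shown below. The objects peaks_a and peaks_b and their coordinates are hypothetical, and the column name Treat simply follows the example above.

```r
library(GenomicRanges)

## Two hypothetical peak sets, one per condition
peaks_a <- GRanges(seqnames = c("chr1", "chr2"),
                   ranges   = IRanges(start = c(100, 400), width = 21),
                   strand   = c("+", "-"))
peaks_b <- GRanges(seqnames = c("chr1", "chr3"),
                   ranges   = IRanges(start = c(250, 900), width = 21),
                   strand   = c("-", "+"))

## Label each set with its condition in a metadata column
peaks_a$Treat <- "Treatment 1"
peaks_b$Treat <- "Treatment 2"

## Combine both sets into a single GRanges object
combined <- c(peaks_a, peaks_b)
table(combined$Treat)
```

The combined object could then be passed to metaGeneProfile with group = "Treat" in the same way as the test object.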