RNAmodR 1.2.3
Users interested in the general aspects of any RNAmodR-based package should have a look at the main vignette of the package.
This vignette is aimed at developers and researchers who want to use the functionality of the RNAmodR package to develop a new modification detection strategy based on high-throughput sequencing data.
library(RNAmodR)
Two classes have to be considered to establish a new analysis pipeline using RNAmodR. These are the SequenceData and the Modifier class.
SequenceData class

First, the SequenceData class has to be considered. Several classes are already implemented:
End5SequenceData
End3SequenceData
EndSequenceData
ProtectedEndSequenceData
CoverageSequenceData
PileupSequenceData
NormEnd5SequenceData
NormEnd3SequenceData
If none of these can be reused, a new class can be implemented quite easily. First, the DataFrame class, the Data class and a constructor have to be defined. The only values that have to be provided are a default minQuality integer value and some basic information.
setClass(Class = "ExampleSequenceDataFrame",
         contains = "SequenceDFrame")
ExampleSequenceDataFrame <- function(df, ranges, sequence, replicate,
                                     condition, bamfiles, seqinfo){
  RNAmodR:::.SequenceDataFrame("Example", df, ranges, sequence, replicate,
                               condition, bamfiles, seqinfo)
}
setClass(Class = "ExampleSequenceData",
         contains = "SequenceData",
         slots = c(unlistData = "ExampleSequenceDataFrame"),
         prototype = list(unlistData = ExampleSequenceDataFrame(),
                          unlistType = "ExampleSequenceDataFrame",
                          minQuality = 5L,
                          dataDescription = "Example data"))
ExampleSequenceData <- function(bamfiles, annotation, sequences, seqinfo, ...){
  RNAmodR:::SequenceData("Example", bamfiles = bamfiles,
                         annotation = annotation, sequences = sequences,
                         seqinfo = seqinfo, ...)
}
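The prototype argument is what supplies the default minQuality value. As a minimal sketch of that mechanism, using only the methods package and toy classes (ToyData and ToyExampleData are hypothetical stand-ins, not RNAmodR classes), a subclass can override an inherited slot default like this:

```r
library(methods)

# Toy parent class standing in for a generic data container (hypothetical).
setClass("ToyData",
         slots = c(minQuality = "integer",
                   dataDescription = "character"),
         prototype = list(minQuality = 0L,
                          dataDescription = NA_character_))

# The subclass only overrides the defaults through its prototype.
setClass("ToyExampleData",
         contains = "ToyData",
         prototype = list(minQuality = 5L,
                          dataDescription = "Example data"))

# new() without arguments picks up the prototype values of the subclass.
toy <- new("ToyExampleData")
toy@minQuality
toy@dataDescription
```

The same pattern is what lets the class definition above carry its own default without any extra constructor logic.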
Second, the getData function has to be implemented. It is used to load the data from a bam file and must return a named list of IntegerList, NumericList or CompressedSplitDataFrameList objects, one per file.
setMethod("getData",
          signature = c(x = "ExampleSequenceData",
                        bamfiles = "BamFileList",
                        grl = "GRangesList",
                        sequences = "XStringSet",
                        param = "ScanBamParam"),
          definition = function(x, bamfiles, grl, sequences, param, args){
            ###
          }
)
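The expected shape of the return value can be sketched without any bam files. The following toy example uses base R only: count_5p_ends is a hypothetical helper, the read start positions are made-up input, and plain integer vectors stand in for the IntegerList a real method would return. It assumes positions are 1-based and relative to the transcript start.

```r
# Toy stand-in for the loading step; not RNAmodR code.
# read_starts: named list with the read start positions per transcript
# tx_lengths:  named integer vector with the length of each transcript
count_5p_ends <- function(read_starts, tx_lengths) {
  out <- lapply(names(tx_lengths), function(tx) {
    # one count per position, including positions without any read start
    tabulate(read_starts[[tx]], nbins = tx_lengths[[tx]])
  })
  names(out) <- names(tx_lengths)
  out
}

# Made-up read starts for two files (names mimic the bam file names)
files <- list(
  treated = list(tx1 = c(1L, 1L, 3L), tx2 = c(2L)),
  control = list(tx1 = c(2L), tx2 = integer(0))
)
tx_lengths <- c(tx1 = 4L, tx2 = 3L)

# Named list with one element per file, each holding one vector per transcript
data <- lapply(files, count_5p_ends, tx_lengths = tx_lengths)
str(data)
```

The point of the sketch is the nesting: the outer names identify the files, the inner names the transcripts, and every inner element has one value per position.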
Third, the aggregateData function has to be implemented. This function is used to aggregate data over replicates for all or one of the conditions. The resulting data is passed on to the Modifier class.
setMethod("aggregateData",
          signature = c(x = "ExampleSequenceData"),
          function(x, condition = c("Both","Treated","Control")){
            ###
          }
)
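What aggregating over replicates can look like is sketched below in base R. This is an illustration under assumptions, not the package's implementation: aggregate_replicates is a hypothetical helper, the counts are made up, and mean and standard deviation are chosen here as example summary statistics.

```r
# Toy stand-in for aggregation over replicates; not RNAmodR code.
# counts:     numeric matrix, one row per position, one column per replicate
# conditions: condition label ("Treated" or "Control") for each column
aggregate_replicates <- function(counts, conditions,
                                 condition = c("Both", "Treated", "Control")) {
  condition <- match.arg(condition)
  stopifnot(ncol(counts) == length(conditions))
  keep <- if (condition == "Both") unique(conditions) else condition
  res <- lapply(keep, function(cond) {
    sub <- counts[, conditions == cond, drop = FALSE]
    out <- data.frame(rowMeans(sub), apply(sub, 1L, stats::sd))
    # column names chosen for this sketch only
    names(out) <- paste(c("means", "sds"), tolower(cond), sep = ".")
    out
  })
  do.call(cbind, res)
}

counts <- cbind(rep1 = c(10, 0, 4), rep2 = c(14, 2, 4), rep3 = c(3, 1, 5))
conditions <- c("Treated", "Treated", "Control")
agg <- aggregate_replicates(counts, conditions, condition = "Treated")
agg
```

With a single replicate in a condition the standard deviation is NA, which is one reason a real implementation may want to check the number of replicates first.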
Modifier class

A new Modifier class is probably the main class that needs to be implemented. Three variables have to be set. mod must be a single element from Modstrings::shortName(Modstrings::ModRNAString()). score is the default score, which is used by several functions. A column with this name should be returned from the aggregateData function. dataType defines the SequenceData class to be used. dataType can contain multiple names of SequenceData classes, which are then combined to form a SequenceDataSet.
setClass("ModExample",
         contains = c("RNAModifier"),
         prototype = list(mod = "X",
                          score = "score",
                          dataType = "ExampleSequenceData"))
ModExample <- function(x, annotation, sequences, seqinfo, ...){
  RNAmodR:::Modifier("ModExample", x = x, annotation = annotation,
                     sequences = sequences, seqinfo = seqinfo, ...)
}
dataType can also be a list of character vectors, which then leads to the creation of a SequenceDataList. However, for now this is a hypothetical case and should only be used if the detection of one modification requires bam files from two or more different methods.
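The two forms of dataType differ only in their R type. The fragment below shows both side by side, using class names from the list of implemented classes above. The particular combinations are chosen for illustration only, and the comment on the list form is one reading of the description above, not confirmed behaviour.

```r
# Character vector: several SequenceData classes, which are combined
# into one SequenceDataSet.
dataType_set <- c("PileupSequenceData", "CoverageSequenceData")

# List of character vectors: leads to a SequenceDataList. Presumably each
# list element describes one entry of that list (assumption).
dataType_list <- list("End5SequenceData",
                      c("PileupSequenceData", "CoverageSequenceData"))

is.list(dataType_set)
is.list(dataType_list)
```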
The settings<- function can be amended to save specific settings (.norm_example_args must be defined separately to normalize input arguments in any way one sees fit).
setReplaceMethod(f = "settings",
                 signature = signature(x = "ModExample"),
                 definition = function(x, value){
                   x <- callNextMethod()
                   # validate special setting here
                   x@settings[names(value)] <- unname(.norm_example_args(value))
                   x
                 })
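One possible shape for such a normalizing helper is sketched below. The setting name minScore and its validation rule are invented for illustration and are not RNAmodR settings. The helper returns its input with names and order unchanged, so that the result lines up with names(value) in the replacement method.

```r
# Hypothetical normalizer: "minScore" is an invented setting name.
.norm_example_args <- function(input) {
  if (!is.null(input[["minScore"]])) {
    minScore <- input[["minScore"]]
    if (!is.numeric(minScore) || length(minScore) != 1L ||
        is.na(minScore) || minScore < 0) {
      stop("'minScore' must be a single non-negative number.", call. = FALSE)
    }
    # store as double so that later comparisons behave consistently
    input[["minScore"]] <- as.numeric(minScore)
  }
  # settings not handled here are passed through untouched
  input
}

args <- .norm_example_args(list(minScore = 5L, other = "kept as is"))
str(args)
```

Failing early with a clear message in the helper keeps invalid values out of the settings slot, instead of surfacing later during score calculation.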
The aggregateData function is used to take the aggregated data from the SequenceData object and to calculate the specific scores, which are then stored in the aggregate slot.
setMethod(f = "aggregateData",
          signature = signature(x = "ModExample"),
          definition =
            function(x, force = FALSE){
              # Some data with element per transcript
            }
)
The findMod function takes the aggregated data and searches for modifications, which are then returned as a GRanges object and stored in the modifications slot.
setMethod("findMod",
          signature = c(x = "ModExample"),
          function(x){
            # an element per modification found.
          }
)
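The search step often comes down to applying a threshold to the per-position scores. The base R sketch below illustrates that idea under assumptions: find_candidates, the score values and the threshold are hypothetical, and a data.frame stands in for the GRanges object a real findMod method returns.

```r
# Toy stand-in for the search step; not RNAmodR code.
# scores:    named list with one numeric score vector per transcript
# min_score: positions with a score at or above this value are reported
find_candidates <- function(scores, min_score, mod = "X") {
  hits <- lapply(names(scores), function(tx) {
    pos <- which(scores[[tx]] >= min_score)
    data.frame(seqnames = rep(tx, length(pos)),
               start = pos,
               end = pos,
               mod = rep(mod, length(pos)),
               score = scores[[tx]][pos],
               stringsAsFactors = FALSE)
  })
  # one row per candidate position
  do.call(rbind, hits)
}

scores <- list(tx1 = c(0.1, 0.9, 0.2), tx2 = c(0.8, 0.05))
found <- find_candidates(scores, min_score = 0.5)
found
```

Each row corresponds to one element of the result, matching the "an element per modification found" comment in the skeleton above.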
ModifierSet class

The ModifierSet class is implemented very easily by defining the class and the constructor. The functionality is defined by the Modifier class.
setClass("ModSetExample",
         contains = "ModifierSet",
         prototype = list(elementType = "ModExample"))
ModSetExample <- function(x, annotation, sequences, seqinfo, ...){
  RNAmodR:::ModifierSet("ModExample", x = x, annotation = annotation,
                        sequences = sequences, seqinfo = seqinfo, ...)
}
Additional functions that need to be implemented are getDataTrack for the new SequenceData and new Modifier classes, and plotData/plotDataByCoord for the new Modifier and ModifierSet classes. name defines a transcript name found in names(ranges(x)) and type is the data type, typically found as a column in the aggregate slot.
setMethod(
  f = "getDataTrack",
  signature = signature(x = "ExampleSequenceData"),
  definition = function(x, name, ...) {
    ###
  }
)
setMethod(
  f = "getDataTrack",
  signature = signature(x = "ModExample"),
  definition = function(x, name, type, ...) {
  }
)
setMethod(
  f = "plotDataByCoord",
  signature = signature(x = "ModExample", coord = "GRanges"),
  definition = function(x, coord, type = "score", window.size = 15L, ...) {
  }
)
setMethod(
  f = "plotData",
  signature = signature(x = "ModExample"),
  definition = function(x, name, from, to, type = "score", ...) {
  }
)
setMethod(
  f = "plotDataByCoord",
  signature = signature(x = "ModSetExample", coord = "GRanges"),
  definition = function(x, coord, type = "score", window.size = 15L, ...) {
  }
)
setMethod(
  f = "plotData",
  signature = signature(x = "ModSetExample"),
  definition = function(x, name, from, to, type = "score", ...) {
  }
)
If unsure how to modify these functions, have a look at the code in the Modifier-Inosine-viz.R file of this package.
As suggested directly above, for a more detailed example have a look at the ModInosine class source code found in the Modifier-Inosine-class.R and Modifier-Inosine-viz.R files of this package.
sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.11-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.11-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] RNAmodR_1.2.3 Modstrings_1.4.1 RNAmodR.Data_1.2.0
## [4] ExperimentHubData_1.14.1 AnnotationHubData_1.18.1 futile.logger_1.4.3
## [7] ExperimentHub_1.14.2 AnnotationHub_2.20.2 BiocFileCache_1.12.1
## [10] dbplyr_1.4.4 GenomicFeatures_1.40.1 AnnotationDbi_1.50.3
## [13] Biobase_2.48.0 Rsamtools_2.4.0 Biostrings_2.56.0
## [16] XVector_0.28.0 rtracklayer_1.48.0 GenomicRanges_1.40.0
## [19] GenomeInfoDb_1.24.2 IRanges_2.22.2 S4Vectors_0.26.1
## [22] BiocGenerics_0.34.0 BiocStyle_2.16.0
##
## loaded via a namespace (and not attached):
## [1] backports_1.1.9 Hmisc_4.4-1
## [3] plyr_1.8.6 lazyeval_0.2.2
## [5] splines_4.0.2 BiocParallel_1.22.0
## [7] ggplot2_3.3.2 digest_0.6.25
## [9] ensembldb_2.12.1 htmltools_0.5.0
## [11] magick_2.4.0 magrittr_1.5
## [13] checkmate_2.0.0 memoise_1.1.0
## [15] BSgenome_1.56.0 cluster_2.1.0
## [17] ROCR_1.0-11 matrixStats_0.56.0
## [19] askpass_1.1 prettyunits_1.1.1
## [21] jpeg_0.1-8.1 colorspace_1.4-1
## [23] blob_1.2.1 rappdirs_0.3.1
## [25] xfun_0.16 dplyr_1.0.2
## [27] crayon_1.3.4 RCurl_1.98-1.2
## [29] jsonlite_1.7.0 graph_1.66.0
## [31] VariantAnnotation_1.34.0 survival_3.2-3
## [33] glue_1.4.2 gtable_0.3.0
## [35] zlibbioc_1.34.0 DelayedArray_0.14.1
## [37] scales_1.1.1 futile.options_1.0.1
## [39] DBI_1.1.0 Rcpp_1.0.5
## [41] xtable_1.8-4 progress_1.2.2
## [43] htmlTable_2.0.1 foreign_0.8-80
## [45] bit_4.0.4 OrganismDbi_1.30.0
## [47] Formula_1.2-3 AnnotationForge_1.30.1
## [49] htmlwidgets_1.5.1 httr_1.4.2
## [51] getopt_1.20.3 RColorBrewer_1.1-2
## [53] ellipsis_0.3.1 farver_2.0.3
## [55] pkgconfig_2.0.3 XML_3.99-0.5
## [57] Gviz_1.32.0 nnet_7.3-14
## [59] labeling_0.3 reshape2_1.4.4
## [61] tidyselect_1.1.0 rlang_0.4.7
## [63] later_1.1.0.1 munsell_0.5.0
## [65] biocViews_1.56.2 BiocVersion_3.11.1
## [67] tools_4.0.2 generics_0.0.2
## [69] RSQLite_2.2.0 evaluate_0.14
## [71] stringr_1.4.0 fastmap_1.0.1
## [73] yaml_2.2.1 knitr_1.29
## [75] bit64_4.0.5 purrr_0.3.4
## [77] AnnotationFilter_1.12.0 RBGL_1.64.0
## [79] mime_0.9 formatR_1.7
## [81] biomaRt_2.44.1 compiler_4.0.2
## [83] rstudioapi_0.11 curl_4.3
## [85] png_0.1-7 interactiveDisplayBase_1.26.3
## [87] tibble_3.0.3 stringi_1.4.6
## [89] highr_0.8 lattice_0.20-41
## [91] ProtGenerics_1.20.0 Matrix_1.2-18
## [93] vctrs_0.3.4 stringdist_0.9.6
## [95] pillar_1.4.6 BiocCheck_1.24.0
## [97] lifecycle_0.2.0 RUnit_0.4.32
## [99] optparse_1.6.6 BiocManager_1.30.10
## [101] data.table_1.13.0 bitops_1.0-6
## [103] colorRamps_2.3 httpuv_1.5.4
## [105] R6_2.4.1 latticeExtra_0.6-29
## [107] bookdown_0.20 promises_1.1.1
## [109] gridExtra_2.3 codetools_0.2-16
## [111] lambda.r_1.2.4 dichromat_2.0-0
## [113] assertthat_0.2.1 SummarizedExperiment_1.18.2
## [115] openssl_1.4.2 rBiopaxParser_2.28.0
## [117] GenomicAlignments_1.24.0 GenomeInfoDbData_1.2.3
## [119] hms_0.5.3 grid_4.0.2
## [121] rpart_4.1-15 rmarkdown_2.3
## [123] biovizBase_1.36.0 shiny_1.5.0
## [125] base64enc_0.1-3