1 Introduction

Development of XNAString package aims at enabling efficient manipulation of modified oligonucleotide sequences. The package inherits some of the functionalities from Biostrings package. In contrary to Biostrings sequences, XNAString classes allow for description of base sequence, sugar and backbone in a single object. XNAString is able to capture single stranded oligonucleotides, siRNAs, PNAs, shRNAs, gRNAs and synthetic mRNAs, and enable users to apply sequence-manipulating Bioconductor packages to their analysis. XNAString can read and write a HELM notation, compute alphabet frequency, align and match targets.

2 Methods and functions overview

All exported methods are listed in this section. They are divided into four tables:

  • XNAString methods

  • XNAStringSet methods

  • Both XNAString and XNAStringSet methods

  • Other functions

2.1 XNAString methods

XNAString class methods
function. description
XNAString Create XNAString object by passing at least base (sugar, name, backbone, target, conjugates and dictionary are optional).
XNAReverseComplement Take XNAString object and return reverse complement of base slot.
predictMfeStructure Take XNAString object and apply RNAfold_MFE function from ViennaRNA package on base slot (if single stranded molecule).
predictDuplexStructure Take XNAString object and apply RNAcofold_MFE function from ViennaRNA package on base slot (if double stranded molecule) or duplicate base and apply RNAcofold_MFE function from ViennaRNA (if single stranded molecule)
XNAStringToHelm Change XNAString/XNAStringSet object to helm notation.

2.2 XNAStringSet methods

XNAStringSet class methods
function. description
XNAStringSet Create XNAStringSet object by passing XNAString objects list.
set2List Change XNAStringSet object to list of XNAString objects.
set2Dt Change XNAStringSet object to data.table.
dt2Set Change data.table (or data.frame) to XNAStringSet object
[ Extract single/multiple XNAString objects (XNAStringSet object returned) by passing index/indexes number or name/names.
[[ Extract single XNAString object (XNAString object returned) by passing index number or name.

2.3 Both XNAString and XNAStringSet methods

XNAString and XNAStringSet class methods
function. description
name / name<- Extract / overwrite name slot.
base / base<- Extract / overwrite base slot.
sugar / sugar<- Extract / overwrite sugar slot.
backbone / backbone<- Extract / overwrite backbone slot.
target / target<- Extract / overwrite target slot.
conjugate5 / conjugate5<- Extract / overwrite conjugate5 slot.
conjugate3 / conjugate3<- Extract / overwrite conjugate3 slot.
secondary_structure / secondary_structure<- Extract / overwrite secondary_structure slot.
duplex_structure / duplex_structure<- Extract / overwrite duplex_structure slot.
dictionary / dictionary<- Extract / overwrite dictionary slot.
compl_dictionary / compl_dictionary<- Extract / overwrite compl_dictionary slot.
XNAStringFromHelm Change helm notation to XNAString/XNAStringSet object.
XNAPairwiseAlignment Inherited from Biostrings package. Solve global/local/ends-free alignment problems.
XNAMatchPattern Inherited from Biostrings package. Find/count all the occurrences of a given pattern (typically short) in a reference sequence (typically long). Support mismatches and indels.
XNAVmatchPattern Inherited from Biostrings package. Find/count all the occurrences of a given pattern (typically short) in a set of reference sequences. Support mismatches and indels.
XNAMatchPDict Inherited from Biostrings package. Find/count all the occurrences of a set of patterns in a reference sequence. Support a small number of mismatches.
XNAAlphabetFrequency Tabulate the letters and count frequency for nucleotides.
XNADinucleotideFrequency Tabulate the letters and count frequency for dinucleotides.
mimir2XnaDict Reformat mimir table to XNA dictionary standards

2.4 Other functions

XNAString class methods
function. description
concatDict Concatanate custom HELM-symbol dictionary with built-in HELM-symbol xna_dictionary.
mimir2XnaDict Rewrite dictionary table to standard format.

3 XNAString class

XNAString class is subclass of Biostrings::BString and has 13 slots:

  • name (character)

  • base (character, RNAString, RNAStringSet, DNAString or DNAStringSet)

  • sugar (character)

  • backbone (character, if missing and base is character default string is created by replicating character ‘X’, if missing and base is DNAString/RNAString, backbone all ‘O’)

  • target (DNAStringSet, DNAString or character)

  • conjugate5 (character)

  • conjugate3 (character)

  • secondary_structure (list - structure is character and mfe numeric)

  • duplex_structure (list - structure is character and mfe numeric)

  • dictionary (xna_dictionary default, data.table type)

  • compl_dictionary (complementary_bases default, data.table type)

  • default_sugar (character)

  • default_backbone (character)

Target, name and conjugate slots can be NA. If backbone or dictionaries missing, default values in use.

Validation procedure requirements for XNAString objects:

  • slots type must reflect above requirements

  • length of sugar equals base

  • length of backbone is one element shorter than sugar and base

  • condition on available letters in base / sugar / backbone dictionary must be satisfied

  • length of base, sugar and backbone vectors is the same and is equal 1 or 2, length of target vector >0, length of name and conjugates vectors equal 1

  • length of default_sugar and default_backbone is 1, nchar is also 1 if not NA

Object can be created only when all validation procedure requirements are met.

3.1 Object creation

3.1.1 Example 1 - basic XNAString object

If base slot is passed as character and sugar slot passed as well, default backbone is a replication of character ‘X’. Target / secondary_structure / duplex_structure slot default is created by XNAReverseComplement / predictMfeStructure / predictDuplexStructure method applied on base slot.

obj <- XNAString(base = 'ATCG', sugar = 'OOOO')
obj
## XNAString object
## name:       NA
## base:       ATCG
## sugar:      OOOO
## backbone:   XXX
## target:     CGAT
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0

3.1.2 Example 2 - XNAString object with optional slots

obj <- XNAString(base = 'ATCG', 
                 sugar = 'OOOO', 
                 backbone = 'SOS', 
                 target = Biostrings::DNAStringSet('TAGC'), 
                 conjugate3 = "[5gn2c6]", 
                 name = "oligo1")
obj
## XNAString object
## name:       oligo1
## base:       ATCG
## sugar:      OOOO
## backbone:   SOS
## target:     TAGC
## conjugate5: NA
## conjugate3: [5gn2c6]
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0

3.1.3 Example 3 - XNAString object with multiple targets

obj <- XNAString(base = 'ATCG', 
                 sugar = 'OOOO', 
                 backbone = 'SOS', 
                 target = Biostrings::DNAStringSet(c('TAGC', 'TATC')))
obj
## XNAString object
## name:       NA
## base:       ATCG
## sugar:      OOOO
## backbone:   SOS
## target:     TAGC, TATC
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0

3.1.4 Example 4 - XNAString for siRNA

obj <- XNAString(base = c('ATCG', 'TAGC'),
                 sugar = c('OOOO', 'OOOO'),
                 backbone = c('SOS','SOS'),
                 target = Biostrings::DNAStringSet(c('TAGC', 'TATC')))
obj
## XNAString object
## name:       NA
## base:       ATCG, TAGC
## sugar:      OOOO, OOOO
## backbone:   SOS, SOS
## target:     TAGC, TATC
## conjugate5: NA
## conjugate3: NA
## secondary_structure: 
## duplex_structure: ....&...., 0

3.1.5 Example 5 - DNAString/ RNAString as base slot

XNAString with DNAString base should create default sugar all D, and backbone all O.
XNAString with RNAString base should create default sugar all R, and backbone all O.
d1 <- Biostrings::DNAString(x = "ACGATCG")

obj <- XNAString(base = d1)
obj
## XNAString object
## name:       NA
## base:       ACGATCG
## sugar:      DDDDDDD
## backbone:   OOOOOO
## target:     CGATCGT
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ......., 0
## duplex_structure: .((((((&.)))))), -7.6

3.1.6 Example 6 - XNAString with optional default_backbone and default_sugar slots

XNAString(base = 'ATCG')
## XNAString object
## name:       NA
## base:       ATCG
## sugar:      DDDD
## backbone:   OOO
## target:     CGAT
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0
XNAString(base = 'ATCG', default_sugar = 'F', default_backbone = 'X')
## XNAString object
## name:       NA
## base:       ATCG
## sugar:      FFFF
## backbone:   XXX
## target:     CGAT
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0
XNAString(base = Biostrings::DNAString('ATCG'))
## XNAString object
## name:       NA
## base:       ATCG
## sugar:      DDDD
## backbone:   OOO
## target:     CGAT
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0
XNAString(base = Biostrings::DNAString('ATCG'),  default_backbone = 'O', default_sugar = 'R')
## XNAString object
## name:       NA
## base:       ATCG
## sugar:      RRRR
## backbone:   OOO
## target:     CGAT
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0

3.2 Methods

3.2.1 Example 1 - public slots getter and setter

Public slots setter - once an object is created, the setter enables users to modify all slots.
New input has to satisfy validation procedures (e.g. the sugar must be of the same length as base)
obj <- XNAString(base = 'ATCG', sugar = 'FODL', backbone = 'SOO')
base(obj)
## [1] "ATCG"
base(obj) <- 'CTGA'
base(obj)
## [1] "CTGA"

3.2.2 Example 2 - XNAReverseComplement (GU wobbles)

XNAReverseComplement method takes XNAString object as an input and finds reverse complement for base slot.
This method is used to create default target slot. Each object has compl_dictionary slot (if not passed while creating an object - complementary_bases table used as default). All bases from dictionary must be present also in compl_dictionary. If that is not the case, target slot default is empty. It is also possible to return multiple targets with XNAReverseComlement method. It can be done by passing custom complementary_bases dictionary and using Iupac symbols. E.g. by adding symbol ‘R’ to custom complementary_bases dictionary, there are two possibile bases: ‘G’ and ‘A’ (see the example below). More symbols and corresponding bases (Iupac standards): * symbol “W” - bases “A”, “T”

  • symbol “S” - bases “G”, “C”

  • symbol “M” - bases “A”, “C”

  • symbol “K” - bases “G”, “T”

  • symbol “R” - bases “A”, “G”

  • symbol “Y” - bases “C”, “T”

  • symbol “B” - bases “C”, “G”, “T”

  • symbol “D” - bases “A”, “G”, “T”

  • symbol “H” - bases “A”, “C”, “T”

  • symbol “V” - bases “A”, “C”, “G”

  • symbol “N” - bases “A”, “C”, “G”, “T”

XNAString::complementary_bases
##    base target compl_target
## 1:    A      T            A
## 2:    C      G            C
## 3:    G      C            G
## 4:    T      A            T
## 5:    E      G            C
## 6:    U      A            T
obj1 <- XNAString(base = "ACEGTTGGT",
                    sugar = 'FODDDDDDD',
                    conjugate3 = 'TAG')
XNAString::XNAReverseComplement(obj1)
## [1] "ACCAACGGT"
custom_compl_dict <-
  rbind(
    XNAString::complementary_bases[seq(1, 5),],
    data.table::data.table(
      base = 'U',
      target = 'R',
      compl_target = 'T'
    )
  )
obj2 <- XNAString(base = "ACGCUUA",
                  sugar = 'DDDDDDD',
                  compl_dictionary = custom_compl_dict)
obj2
## XNAString object
## name:       NA
## base:       ACGCUUA
## sugar:      DDDDDDD
## backbone:   XXXXXX
## target:     TAAGCGT, TGAGCGT, TAGGCGT, TGGGCGT
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ......., 0
## duplex_structure: ..((...&..))..., -2.1
#create custom complementary dictonary with complementary bases coded as IUPAC 
compl_dict <- XNAString::complementary_bases
compl_dict[base == "G"]$target <- "Y"
compl_dict[base == "T"]$target <- "R" # if you have T in your base sequence 
compl_dict[base == "U"]$target <- "R" # if you have U in your base sequence 

compl_dict
##    base target compl_target
## 1:    A      T            A
## 2:    C      G            C
## 3:    G      Y            G
## 4:    T      R            T
## 5:    E      G            C
## 6:    U      R            T
xna <- XNAString::XNAString(base = "ACGTACGT", sugar = "DDDDDDDD",compl_dictionary = compl_dict)
xna
## XNAString object
## name:       NA
## base:       ACGTACGT
## sugar:      DDDDDDDD
## backbone:   XXXXXXX
## target:     ACGTACGT, GCGTACGT, ATGTACGT, GTGTACGT, ACGTGCGT, GCGTGCGT, ATGTGCGT, GTGTGCGT, ACGTATGT, GCGTATGT, ATGTATGT, GTGTATGT, ACGTGTGT, GCGTGTGT, ATGTGTGT, GTGTGTGT
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ........, 0
## duplex_structure: ((((((((&)))))))), -9.8

3.2.3 Example 3 - predictMfeStructure

predictMfeStructure - this method uses RNAfold_MFE function from C library ViennaRNA, this function can handle only standard bases. Takes XNAString object and applies RNAfold_MFE function on base slot if base is DNAString or RNAString. If base is character (not-standard bases allowed), then RNAfold_mfe works for A, C, G, T, E, U letters with built-in complementary_bases dictionary (e.g. E letter is changed to C before applying predictMfeStructure function). If base slot includes other letters, their translation to compl_target must be included in compl_dict slot. Otherwise predictMfeStructure returns empty string. predictMfeStructure method is used to create default secondary_structure slot.

obj <- XNAString(base = 'GAGAGGGAACCAGGCAGGGACCGCAGACAACA', 
                   sugar = 'FODLMFODLMFODLMFODLMFODLMFFFFFFF')
XNAString::predictMfeStructure(obj)
## $structure
## [1] ".....((..((.....))..)).........."
## 
## $mfe
## [1] -4.5

3.2.4 Example 4 - predictDuplexStructure - double stranded molecule

predictDuplexStructure - this method uses RNAcofold_MFE function from C library ViennaRNA, this function can handle only standard bases. predictDuplexStructure takes XNAString object and applies RNAcofold_MFE function on base slot if base is DNAStringSet or RNAStringSet. If base is character vector (not-standard bases allowed), then RNAcofold_mfe works for A, C, G, T, E, U letters with built-in complementary_bases dictionary (e.g. E letter is changed to C before applying predictDuplexStructure function). If base slot includes other letters, their translation to compl_target must be included in compl_dict slot passed manually. Otherwise predictDuplexStructure returns empty string. predictDuplexStructure method is used to create default duplex_structure slot.

obj <- XNAString(base = Biostrings::DNAStringSet(c('GAGAGGGAACCAGGCAGGGACCGCAGACAACA', 'GAGAGGGAACCAGGCAGGGACCGCAGACAACA')))

XNAString::predictDuplexStructure(obj)
## $structure
## [1] ".....((..((..((.((..((..........&.....))..))..)).))..)).........."
## 
## $mfe
## [1] -14.5

3.2.5 Example 5 - predictDuplexStructure - single stranded molecule

RNAcofold_MFE function needs two sequences, so if molecule is sinlge stranded, base sequence is duplicated within the predictDuplexStructure method.

obj <- XNAString(base = Biostrings::DNAString('GAGAGGGAACCAGGCAGGGACCGCAGACAACA'), 
                 sugar = 'FODLMFODLMFODLMFODLMFODLMFFFFFFF')
  
XNAString::predictDuplexStructure(obj)
## $structure
## [1] ".....((..((..((.((..((..........&.....))..))..)).))..)).........."
## 
## $mfe
## [1] -14.5

4 XNAStringSet class

XNAStringSet class consists of XNAString objects given as a list. Validation procedure checks if all objects are of XNAString class.

4.1 Object creation

4.1.1 Example 1 - basic XNAStringSet object as a list of XNAString objects

XNAString_obj1 <- XNAString(base = 'ATCG', sugar = 'FODD')
XNAString_obj2 <- XNAString(base = 'TTCT', sugar = 'FOLL')

XNAStringSet_obj <- XNAStringSet(objects = list(XNAString_obj1, XNAString_obj2))

XNAStringSet_obj
## XNAStringSet object
##    name base sugar backbone target conjugate5 conjugate3 secondary_structure
## 1:   NA ATCG  FODD      XXX   CGAT         NA         NA              ....,0
## 2:   NA TTCT  FOLL      XXX   AGAA         NA         NA              ....,0

4.1.2 Example 2 - XNAStringSet consists of XNAString objects for siRNA

XNAString_obj1 <- XNAString(base = c('ATCG', 'TAGC'), 
                 sugar = c('OOOO', 'OOOO'), 
                 backbone = c('SOS','SOS'), 
                 target = Biostrings::DNAStringSet(c('TAGC', 'TACC')))

XNAString_obj2 <- XNAString(base = c('GGCG', 'TATC'), 
                 sugar = c('OOOO', 'OOOO'), 
                 target = Biostrings::DNAStringSet(c('CCGC', 'TATG')))

XNAStringSet_siRNA <- XNAStringSet(objects = list(XNAString_obj1, XNAString_obj2))

XNAStringSet_siRNA
## XNAStringSet object
##    name      base     sugar backbone    target conjugate5 conjugate3
## 1:   NA ATCG,TAGC OOOO,OOOO  SOS,SOS TAGC,TACC         NA         NA
## 2:   NA GGCG,TATC OOOO,OOOO  XXX,XXX CCGC,TATG         NA         NA
##    secondary_structure
## 1:                    
## 2:

4.1.3 Example 3 - XNAStringSet object created by passing vectors

XNAStringSet(base = c("ATGCT","TGCAT","ATATG"))
## XNAStringSet object
##    name  base sugar backbone target conjugate5 conjugate3 secondary_structure
## 1:   NA ATGCT DDDDD     OOOO  AGCAT         NA         NA             .....,0
## 2:   NA TGCAT DDDDD     OOOO  ATGCA         NA         NA             .....,0
## 3:   NA ATATG DDDDD     OOOO  CATAT         NA         NA             .....,0
XNAStringSet(base = c("ATGCT","TGCAT","ATATG"), default_sugar = 'R', default_backbone = 'X')
## XNAStringSet object
##    name  base sugar backbone target conjugate5 conjugate3 secondary_structure
## 1:   NA ATGCT RRRRR     XXXX  AGCAT         NA         NA             .....,0
## 2:   NA TGCAT RRRRR     XXXX  ATGCA         NA         NA             .....,0
## 3:   NA ATATG RRRRR     XXXX  CATAT         NA         NA             .....,0
XNAStringSet(base = c("ATGCT","TGCAT","ATATG"),sugar = c("DDDDD","DDDDD","DDDDD"))
## XNAStringSet object
##    name  base sugar backbone target conjugate5 conjugate3 secondary_structure
## 1:   NA ATGCT DDDDD     XXXX  AGCAT         NA         NA             .....,0
## 2:   NA TGCAT DDDDD     XXXX  ATGCA         NA         NA             .....,0
## 3:   NA ATATG DDDDD     XXXX  CATAT         NA         NA             .....,0
XNAStringSet(base= list(c('TT', 'GG'), 
                        c('TG', 'GT'), 
                        c('TG')), 
             sugar = list(c('FF', 'FO'), 
                          c('OO', 'OF'), 
                          c('OO')), 
             backbone =list(c('X', 'X'), 
                            c('X', 'X'), 
                            c('X')))
## XNAStringSet object
##    name  base sugar backbone target conjugate5 conjugate3 secondary_structure
## 1:   NA TT,GG FF,FO      X,X     AA         NA         NA                    
## 2:   NA TG,GT OO,OF      X,X     CA         NA         NA                    
## 3:   NA    TG    OO        X     CA         NA         NA                ..,0

4.1.4 Example 4 - XNAStringSet object created by passing vectors and optional coml_dict (GU wobbles)

XNAStringSet(base= c('TT', 'GG'), sugar = c('FF', 'FF'))
## XNAStringSet object
##    name base sugar backbone target conjugate5 conjugate3 secondary_structure
## 1:   NA   TT    FF        X     AA         NA         NA                ..,0
## 2:   NA   GG    FF        X     CC         NA         NA                ..,0
compl_dict <- XNAString::complementary_bases
compl_dict[base == "G"]$target <- "Y"
compl_dict[base == "T"]$target <- "R" # if you have T in your base sequence
compl_dict[base == "U"]$target <- "R" # if you have U in your base sequence

XNAStringSet(base= c('TT', 'GG'), sugar = c('FF', 'FF'), compl_dict = compl_dict) 
## XNAStringSet object
##    name base sugar backbone      target conjugate5 conjugate3
## 1:   NA   TT    FF        X AA,GA,AG,GG         NA         NA
## 2:   NA   GG    FF        X CC,TC,CT,TT         NA         NA
##    secondary_structure
## 1:                ..,0
## 2:                ..,0
# compl_dict in use only if target empty
XNAStringSet(base= c('TT', 'GG'), 
             sugar = c('FF', 'FF'), 
             target = c('AA', 'CC'), 
             compl_dict = compl_dict)
## XNAStringSet object
##    name base sugar backbone target conjugate5 conjugate3 secondary_structure
## 1:   NA   TT    FF        X     AA         NA         NA                ..,0
## 2:   NA   GG    FF        X     CC         NA         NA                ..,0

4.1.5 Example 5 - XNAStringSet created from data.table

dt <- data.table::data.table(base = c('TT', 'GG'))
out1 <- dt2Set(dt)
out1
## XNAStringSet object
##    name base sugar backbone target conjugate5 conjugate3 secondary_structure
## 1:   NA   TT    DD        O     AA         NA         NA                ..,0
## 2:   NA   GG    DD        O     CC         NA         NA                ..,0
dt <- data.table::data.table(base = c('TT', 'GG'), default_sugar = 'R', default_backbone = 'X')
out2 <- dt2Set(dt)
out2
## XNAStringSet object
##    name base sugar backbone target conjugate5 conjugate3 secondary_structure
## 1:   NA   TT    DD        O     AA         NA         NA                ..,0
## 2:   NA   GG    DD        O     CC         NA         NA                ..,0
dt <- data.table::data.table(base= list(c('TT', 'GG'), 
                                        c('TG', 'GT'), 
                                        c('TG')), 
                            sugar = list(c('FF', 'FO'), 
                                         c('OO', 'OF'), 
                                         c('OO')), 
                            backbone =list(c('X', 'X'), 
                                           c('X', 'X'), 
                                           c('X')))
out3 <- dt2Set(dt)
out3
## XNAStringSet object
##    name  base sugar backbone target conjugate5 conjugate3 secondary_structure
## 1:   NA TT,GG FF,FO      X,X     AA         NA         NA                    
## 2:   NA TG,GT OO,OF      X,X     CA         NA         NA                    
## 3:   NA    TG    OO        X     CA         NA         NA                ..,0

4.2 Methods

Methods which enable XNAStringSet extraction / modification:

  • set2List

  • extraction methods (“[” and “[[”)

  • set2Dt method

  • public slots setter/getter

  • XNAStringSet from data.table/data.frame

4.2.1 Example 1 - change XNAStringSet object to a list of XNAString objects

XNAString_obj1 <- XNAString(name = 'oligo1', base = 'ATCG', sugar = 'FODD')
XNAString_obj2 <- XNAString(name = 'oligo2',base = 'TTCT', sugar = 'FOLL')

XNAStringSet_obj <- XNAStringSet(objects = list(XNAString_obj1, XNAString_obj2))
set2List(XNAStringSet_obj)
## [[1]]
## XNAStringSet object
##      name base sugar backbone target conjugate5 conjugate3 secondary_structure
## 1: oligo1 ATCG  FODD      XXX   CGAT         NA         NA              ....,0
## 
## [[2]]
## XNAStringSet object
##      name base sugar backbone target conjugate5 conjugate3 secondary_structure
## 1: oligo2 TTCT  FOLL      XXX   AGAA         NA         NA              ....,0

4.2.2 Example 2 - extract single XNAString object or part of XNAStringSet object

XNAStringSet_obj[2]
## XNAStringSet object
##      name base sugar backbone target conjugate5 conjugate3 secondary_structure
## 1: oligo2 TTCT  FOLL      XXX   AGAA         NA         NA              ....,0
XNAStringSet_obj['oligo2']
## XNAStringSet object
##      name base sugar backbone target conjugate5 conjugate3 secondary_structure
## 1: oligo2 TTCT  FOLL      XXX   AGAA         NA         NA              ....,0
XNAStringSet_obj[c(1,2)]
## XNAStringSet object
##      name base sugar backbone target conjugate5 conjugate3 secondary_structure
## 1: oligo1 ATCG  FODD      XXX   CGAT         NA         NA              ....,0
## 2: oligo2 TTCT  FOLL      XXX   AGAA         NA         NA              ....,0
XNAStringSet_obj[c('oligo1', 'oligo2')]
## XNAStringSet object
##      name base sugar backbone target conjugate5 conjugate3 secondary_structure
## 1: oligo1 ATCG  FODD      XXX   CGAT         NA         NA              ....,0
## 2: oligo2 TTCT  FOLL      XXX   AGAA         NA         NA              ....,0
XNAStringSet_obj[[2]]
## XNAString object
## name:       oligo2
## base:       TTCT
## sugar:      FOLL
## backbone:   XXX
## target:     AGAA
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0
XNAStringSet_obj[['oligo2']]
## XNAString object
## name:       oligo2
## base:       TTCT
## sugar:      FOLL
## backbone:   XXX
## target:     AGAA
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0

4.2.3 Example 3 - change XNAStringSet object to data.table

set2Dt(XNAStringSet_obj, slots =c('name', 'base', 'sugar', 'backbone', 'target', 'conjugate5', 'conjugate3'))
##      name base sugar backbone target conjugate5 conjugate3
## 1: oligo1 ATCG  FODD      XXX   CGAT         NA         NA
## 2: oligo2 TTCT  FOLL      XXX   AGAA         NA         NA

4.2.4 Example 4 - public slots getter and setter

base(XNAStringSet_siRNA, 1)
## [1] "ATCG" "GGCG"
base(XNAStringSet_siRNA, 2)
## [1] "TAGC" "TATC"
base(XNAStringSet_siRNA, 1) <- c('CTGA', 'CCCC')
base(XNAStringSet_siRNA, 2) <- c('TTTT', 'TTTT')
XNAStringSet_siRNA
## XNAStringSet object
##    name      base     sugar backbone    target conjugate5 conjugate3
## 1:   NA CTGA,TTTT OOOO,OOOO  SOS,SOS TAGC,TACC         NA         NA
## 2:   NA CCCC,TTTT OOOO,OOOO  XXX,XXX CCGC,TATG         NA         NA
##    secondary_structure
## 1:                    
## 2:

4.2.5 Example 5 - XNAStringSet from data.table

dt <- data.table::data.table(base= c('AEACACACACACAEAE', 'AAETGCTETGTATTTTTE'), 
                  sugar = c('LLLDDDDDDDDDDLLL', 'LLLDDDDDDDDDDDLLLL'), 
                  backbone =c('SSSSSSSSSSSSSSS', 'SSSSSSSSSSSSSSSSS'))
dt
##                  base              sugar          backbone
## 1:   AEACACACACACAEAE   LLLDDDDDDDDDDLLL   SSSSSSSSSSSSSSS
## 2: AAETGCTETGTATTTTTE LLLDDDDDDDDDDDLLLL SSSSSSSSSSSSSSSSS
dt2Set(dt)
## XNAStringSet object
##    name               base              sugar          backbone
## 1:   NA   AEACACACACACAEAE   LLLDDDDDDDDDDLLL   SSSSSSSSSSSSSSS
## 2:   NA AAETGCTETGTATTTTTE LLLDDDDDDDDDDDLLLL SSSSSSSSSSSSSSSSS
##                target conjugate5 conjugate3  secondary_structure
## 1:   GTGTGTGTGTGTGTGT         NA         NA   ................,0
## 2: GAAAAATACAGAGCAGTT         NA         NA ..................,0

5 HELM notation to XNAString object

Methods written in this section translate HELM notation to multistring notation and create XNAString/XNAStringSet object.

5.0.1 Example 1 - ssRNA

helm <-
    "CHEM1{[5gn2c6]}|RNA1{P.[dR](C)P.[dR](A)P.[LR](G)[sP].[LR](A)[sP].[LR](G)[sP].[LR](A)[sP].[dR](A)[sP].[dR](G)[sP].[dR](G)[sP].[dR](C)[sP].[dR](A)[sP].[dR](C)[sP].[dR](A)[sP].[dR](G)[sP].[dR](A)[sP].[LR]([5meC])[sP].[LR](G)[sP].[LR](G)}$CHEM1,RNA1,1:R2-1:R1$$$V2.0"

XNAStringFromHelm(helm, name ='oligo1')
## XNAString object
## name:       oligo1
## base:       GAGAAGGCACAGAEGG
## sugar:      LLLLDDDDDDDDDLLL
## backbone:   XXXXXXXXXXXXXXX
## target:     CCGTCTGTGCCTTCTC
## conjugate5: [5gn2c6]
## conjugate3: 
## secondary_structure: ................, 0
## duplex_structure: ......((........&......))........, -2.1

5.0.2 Example 2 - siRNA

helm <-
"RNA1{[mR](C)P.[mR](C)P.[fR](C)P.[mR](C)P.[fR](C)P.[mR](G)P.[fR](C)P.[mR](C)P.[fR](G)P.[mR](T)P.[fR](G)P.[mR](G)P.[fR](T)P.[mR](T)P.[fR](C)P.[mR](A)P.[fR](T)P.[mR](A)P.[fR](A)}|RNA2{[fR](T)P.[fR](T)P.[mR](A)P.[fR](T)P.[mR](G)P.[fR](A)P.[mR](A)P.[fR](C)P.[mR](C)P.[fR](A)P.[mR](C)P.[fR](G)P.[mR](G)P.[fR](C)P.[mR](A)P.[fR](G)P.[mR](G)P.[fR](G)P.[mR](G)P.[fR](C)P.[mR](G)}$RNA1,RNA2,2:pair-56:pair|RNA1,RNA2,5:pair-53:pair|RNA1,RNA2,8:pair-50:pair|RNA1,RNA2,11:pair-47:pair|RNA1,RNA2,14:pair-44:pair|RNA1,RNA2,17:pair-41:pair|RNA1,RNA2,20:pair-38:pair|RNA1,RNA2,23:pair-35:pair|RNA1,RNA2,26:pair-32:pair|RNA1,RNA2,29:pair-29:pair|RNA1,RNA2,32:pair-26:pair|RNA1,RNA2,35:pair-23:pair|RNA1,RNA2,38:pair-20:pair|RNA1,RNA2,41:pair-17:pair|RNA1,RNA2,44:pair-14:pair|RNA1,RNA2,47:pair-11:pair|RNA1,RNA2,50:pair-8:pair|RNA1,RNA2,53:pair-5:pair|RNA1,RNA2,56:pair-2:pair$$$V2.0"


XNAStringFromHelm(helm)
## XNAString object
## name:       NA
## base:       CCCCCGCCGTGGTTCATAA, TTATGAACCACGGCAGGGGCG
## sugar:      OOFOFOFOFOFOFOFOFOF, FFOFOFOFOFOFOFOFOFOFO
## backbone:   OOOOOOOOOOOOOOOOOO, OOOOOOOOOOOOOOOOOOOO
## target:     TTATGAACCACGGCGGGGG
## conjugate5: 
## conjugate3: 
## secondary_structure: 
## duplex_structure: ((((.(((((((((((((.&.))))))))))))).)))).., -32.900002

5.0.3 Example 3 - removed linker

helm <-
"CHEM1{[5gn2c6]}|RNA1{P.[dR](C)P.[dR](A)P.[LR](T)[sP].[LR](T)[sP].[LR](G)[sP].[dR](A)[sP].[dR](A)[sP].[dR](T)[sP].[dR](A)[sP].[dR](A)[sP].[dR](G)[sP].[dR](T)[sP].[dR](G)[sP].[dR](G)[sP].[dR](A)[sP].[LR](T)[sP].[LR](G)[sP].[LR](T)}$CHEM1,RNA1,1:R2-1:R1$$$V2.0"

XNAStringFromHelm(helm)
## XNAString object
## name:       NA
## base:       TTGAATAAGTGGATGT
## sugar:      LLLDDDDDDDDDDLLL
## backbone:   XXXXXXXXXXXXXXX
## target:     ACATCCACTTATTCAA
## conjugate5: [5gn2c6]
## conjugate3: 
## secondary_structure: ................, 0
## duplex_structure: ................&................, 0

5.0.4 Example 4 - create XNAStringSet from list of HELM strings

helm <- c("CHEM1{[5gn2c6]}|RNA1{P.[dR](C)P.[dR](A)P.[LR](G)[sP].[LR](A)[sP].[LR](G)[sP].[LR](A)[sP].[dR](A)[sP].[dR](G)[sP].[dR](G)[sP].[dR](C)[sP].[dR](A)[sP].[dR](C)[sP].[dR](A)[sP].[dR](G)[sP].[dR](A)[sP].[LR]([5meC])[sP].[LR](G)[sP].[LR](G)}$CHEM1,RNA1,1:R2-1:R1$$$V2.0",     
          "RNA1{[LR](T)[sP].[LR](G)[sP].[dR](T)[sP].[dR](G)[sP].[LR](T)[sP].[LR](G)[sP].[dR](T)[sP].[dR](G)[sP].[LR](T)[sP].[LR](G)[sP].[dR](T)[sP].[dR](G)[sP].[LR](T)[sP].[LR](G)[sP].[LR](T)}$$$$V2.0")

XNAStringFromHelm(helm, name =c('oligo1', 'oligo2'))
## XNAStringSet object
##      name             base            sugar        backbone           target
## 1: oligo1 GAGAAGGCACAGAEGG LLLLDDDDDDDDDLLL XXXXXXXXXXXXXXX CCGTCTGTGCCTTCTC
## 2: oligo2  TGTGTGTGTGTGTGT  LLDDLLDDLLDDLLL  XXXXXXXXXXXXXX  ACACACACACACACA
##    conjugate5 conjugate3 secondary_structure
## 1:   [5gn2c6]             ................,0
## 2:                         ...............,0

6 XNAString object to HELM notation

Methods written in this section translate multistring notation from XNAString/XNAStringSet object to HELM notation. If molecule is double stranded also pairing information is added.

6.0.1 Example 1 - single stranded molecule

obj <- XNAString(base = 'GAGTTACTTGCCAAET',
                 sugar = 'LLLDMDDDDDDDDLLL',
                 backbone = 'XXXXXXXXXXXXXX2')
  
XNAStringToHelm(obj)
## [1] "RNA1{[LR](G)[sP].[LR](A)[sP].[LR](G)[sP].[dR](T)[sP].[MOE](T)[sP].[dR](A)[sP].[dR](C)[sP].[dR](T)[sP].[dR](T)[sP].[dR](G)[sP].[dR](C)[sP].[dR](C)[sP].[dR](A)[sP].[LR](A)[sP].[LR]([5meC])[PS2].[LR](T)}$$$$V2.0"

6.0.2 Example 2 - double stranded molecule, base is DNAString

obj <- XNAString(
    base = Biostrings::DNAStringSet(c("CCCC", "GGGG")),
    sugar = c("OOFO", "FFOF"),
    backbone = c("OOO", "OOO"),
    target = '',
    conjugate3 = "",
    conjugate5 = "")
  
XNAStringToHelm(obj)
## [1] "RNA1{[mR](C)P.[mR](C)P.[fR](C)P.[mR](C)}|RNA2{[fR](G)P.[fR](G)P.[mR](G)P.[fR](G)}$RNA1,RNA2,2:pair-11:pair|RNA1,RNA2,5:pair-8:pair|RNA1,RNA2,8:pair-5:pair|RNA1,RNA2,11:pair-2:pair$$$$V2.0"

6.0.3 Example 3 - double stranded molecule - pairing information added

obj <- XNAString(
    base = c("CCCCEGC", "UUAUGAT"),
    sugar = c("OOFOFOF", "FFOFOFO"),
    backbone = c("OOOOOO", "OOOOOO"),
    target = '',
    conjugate3 = "[5gn2c6]",
    conjugate5 = "")
  
XNAStringToHelm(obj)
## [1] "RNA1{[mR](C)P.[mR](C)P.[fR](C)P.[mR](C)P.[fR]([5meC])P.[mR](G)P.[fR](C)}|CHEM1{[5gn2c6]}|RNA2{[fR](U)P.[fR](U)P.[mR](A)P.[fR](U)P.[mR](G)P.[fR](A)P.[mR](T)}$RNA1,RNA2,2:pair-20:pair|RNA1,RNA2,5:pair-17:pair|RNA1,RNA2,8:pair-14:pair|RNA1,RNA2,11:pair-11:pair|RNA1,RNA2,14:pair-8:pair|RNA1,RNA2,17:pair-5:pair|RNA1,RNA2,20:pair-2:pair$$$$V2.0"

7 Alignment and matching functions

Methods in this section are inherited from Biostrings package. All options in Biostrings methods:

  • pairwiseAlignment

  • matchPattern

  • vmatchPattern

  • matchPDict

are available in XNAStrng package as well. XNAString package renamed these mathods by adding XNA prefix in the beginning.

7.1 XNAPairwiseAlignment for target sequence

Target sequence is used as pattern.

7.1.1 Example: global alignment

obj <- XNAString(base = 'ATCGATATATATACACATGTATGATG',
                 sugar = 'OOOODDDDDDDDDDDDDDDDDDDDDD',
                 target = DNAStringSet(c('TAGCTATATATATGTGTACATACTAC', 'TAGCTAGATATATGTGTACATACTAC')))

subject <- 'ATCGATATATATACACATGTATGATGTAGCTATATATATGTGTACATACTACATCGATATATATACACATGTATGATG'
substitutionMatrix <- Biostrings::nucleotideSubstitutionMatrix()

XNAString::XNAPairwiseAlignment(pattern = obj, subject = subject, substitutionMatrix = substitutionMatrix)
## Global PairwiseAlignmentsSingleSubject (1 of 2)
## pattern: --------------------------TAGCTATA...CATACTAC--------------------------
## subject: ATCGATATATATACACATGTATGATGTAGCTATA...CATACTACATCGATATATATACACATGTATGATG
## score: -202

7.1.2 Example: local alignment

substitutionMatrix <- Biostrings::nucleotideSubstitutionMatrix()

XNAString::XNAPairwiseAlignment(pattern = obj, subject = subject, type = "local", substitutionMatrix = substitutionMatrix)
## Local PairwiseAlignmentsSingleSubject (1 of 2)
## pattern:  [1] TAGCTATATATATGTGTACATACTAC
## subject: [27] TAGCTATATATATGTGTACATACTAC
## score: 26

7.2 XNAMatchPattern for target sequence

Target sequence is used as pattern. If more then one target is present in XNAString object, first is used as default. User can specify which target sequence should be taken as pattern (target.number parameter).

7.2.1 Example: default matching of first target

XNAString::XNAMatchPattern(pattern = obj, subject = subject)
## Views on a 78-letter BString subject
## subject: ATCGATATATATACACATGTATGATGTAGCTATA...CATACTACATCGATATATATACACATGTATGATG
## views:
##       start end width
##   [1]    27  52    26 [TAGCTATATATATGTGTACATACTAC]

7.2.2 Example: match target selected by user

XNAString::XNAMatchPattern(pattern = obj, subject = subject, target.number = 2)
## Views on a 78-letter BString subject
## subject: ATCGATATATATACACATGTATGATGTAGCTATA...CATACTACATCGATATATATACACATGTATGATG
## views: NONE

7.2.3 Example: match pattern selected by user with 1 mismatch

XNAString::XNAMatchPattern(pattern = obj, subject = subject, target.number = 2, max.mismatch = 1)
## Views on a 78-letter BString subject
## subject: ATCGATATATATACACATGTATGATGTAGCTATA...CATACTACATCGATATATATACACATGTATGATG
## views:
##       start end width
##   [1]    27  52    26 [TAGCTATATATATGTGTACATACTAC]

7.3 XNAVmatchPattern for multiple subjects

subject <- c('ATCGATATATATACACATGTATGATGTAGCTATATATATGTGTACATACTACATCGATATATATACACATGTATGATG', 'ATCGATATATATACACATGTATGATGTAGCTATATATATGTGTGCGACTACATCGATATATATACACATGTATGATG')

XNAString::XNAVmatchPattern(pattern = obj, subject)
## MIndex object of length 2
## [[1]]
## IRanges object with 1 range and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]        27        52        26
## 
## [[2]]
## IRanges object with 0 ranges and 0 metadata columns:
##        start       end     width
##    <integer> <integer> <integer>

7.4 XNAMatchPDict for multiple targets

Only one subject is allowed. Results created for all targets.

subject <- 'ATCGATATATATACACATGTATGATGTAGCTATATATATGTGTACATACTACATCGATATATATACACATGTATGATG'

XNAString::XNAMatchPDict(pdict = obj, subject = subject)
## MIndex object of length 2
## [[1]]
## IRanges object with 1 range and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]        27        52        26
## 
## [[2]]
## IRanges object with 0 ranges and 0 metadata columns:
##        start       end     width
##    <integer> <integer> <integer>

8 AlphabetFrequency for nucleotides

Letters are tabulated and occurence frequency calculated for nucleotides. There are 6 arguments: * obj (either XNAString and XNAStringSet object) * slot (‘base’, ‘sugar’ or ‘backbone’) * letters (frequency checked just for these letters. If empty, letters from object’s dictionary taken as the default. ) * matrix_nbr (1 is the default. If 1 - first slot’s element is use, if 2 - 2nd element in slot) * as.prob (default FALSE) * base_only (default FALSE. If TRUE, ‘A’, ‘C’, ‘G’, ‘T’, ‘other’ are tabulated)

8.0.1 Example 1: XNAString object

xnastring_obj <- XNAString(name = 'oligo1',
                           base = c('AEEE'),
                           sugar = c('FFOO'),
                           target = DNAStringSet('TTT'))

XNAString::XNAAlphabetFrequency(obj = xnastring_obj, slot = 'base')
##      A    C    E    G    T    U   
## [1,] 1.00 0.00 3.00 0.00 0.00 0.00
XNAString::XNAAlphabetFrequency(obj = xnastring_obj, slot = 'base', as.prob = TRUE)
##      A    C    E    G    T    U   
## [1,] 0.25 0.00 0.75 0.00 0.00 0.00
XNAString::XNAAlphabetFrequency(obj = xnastring_obj, slot = 'base', base_only = TRUE)
##     A     C     G     T other 
##  1.00  0.00  0.00  0.00  3.00
XNAString::XNAAlphabetFrequency(obj = xnastring_obj, slot = 'base', letters = c('A', 'C'))
##      A    C   
## [1,] 1.00 0.00

8.0.2 Example 2: XNAString object

xnastring_obj <- XNAString(name = 'oligo1',
                           base = c('AAEC', 'ECTA'),
                           sugar = c('FFOO', 'DDLM'))

XNAString::XNAAlphabetFrequency(obj = xnastring_obj, slot = 'sugar', matrix_nbr =  2)
##      D    F    L    M    O    R   
## [1,] 2.00 0.00 1.00 1.00 0.00 0.00

8.0.3 Example 3: XNAStringSet object, single base, sugar and backbone

XNAString_obj1 <- XNAString(base = 'ATCG', sugar = 'FODD')
XNAString_obj2 <- XNAString(base = 'TTCT', sugar = 'FOLL')

XNAStringSet_obj <- XNAStringSet(objects = list(XNAString_obj1, XNAString_obj2))

XNAString::XNAAlphabetFrequency(obj = XNAStringSet_obj, slot = 'base')
##      A    C    E    G    T    U   
## [1,] 1.00 1.00 0.00 1.00 1.00 0.00
## [2,] 0.00 1.00 0.00 0.00 3.00 0.00

8.0.4 Example 4: XNAStringSet object, double base, sugar and backbone

XNAString_obj1 <- XNAString(base = c('ATCG', 'TAGC'),
                 sugar = c('OFOO', 'ODDF'),
                 backbone = c('SOS','SOS'))

XNAString_obj2 <- XNAString(base = c('GGCG', 'TATC'),
                 sugar = c('OOOO', 'OOFO'))

XNAStringSet_obj <- XNAStringSet(objects = list(XNAString_obj1, XNAString_obj2))

XNAString::XNAAlphabetFrequency(obj = XNAStringSet_obj, slot = 'sugar', matrix_nbr = 2)
##      D    F    L    M    O    R   
## [1,] 2.00 1.00 0.00 0.00 1.00 0.00
## [2,] 0.00 1.00 0.00 0.00 3.00 0.00

9 AlphabetFrequency for dinucleotides

Letters are tabulated and occurence frequency calculated for dinucleotides. There are 6 arguments:

  • obj (either XNAString and XNAStringSet object)

  • slot (‘base’, ‘sugar’ or ‘backbone’)

  • double_letters (frequency checked just for these double letters. If empty, all possible double letters from object’s dictionary taken as the default.)

  • matrix_nbr (1 is the default. If 1 - first slot’s element is use, if 2 - 2nd element in slot)

  • as.prob (default FALSE)

  • base_only (default FALSE. If TRUE, all possible double letters composed of: ‘A’, ‘C’, ‘G’, ‘T’ are tabulated)

9.0.1 Example 1: XNAString object

xnastring_obj <- XNAString(name = 'oligo1',
                           base = c('GCGC'),
                           sugar = c('FODL'),
                           target = DNAStringSet('TTTT'))

XNAString::XNADinucleotideFrequency(obj = xnastring_obj, slot = 'base')
##      AA   CA   EA   GA   TA   UA   AC   CC   EC   GC   TC   UC   AE   CE   EE  
## [1,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00
##      GE   TE   UE   AG   CG   EG   GG   TG   UG   AT   CT   ET   GT   TT   UT  
## [1,] 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
##      AU   CU   EU   GU   TU   UU  
## [1,] 0.00 0.00 0.00 0.00 0.00 0.00
XNAString::XNADinucleotideFrequency(obj = xnastring_obj, slot = 'base', as.prob = TRUE)
##      AA   CA   EA   GA   TA   UA   AC   CC   EC   GC   TC   UC   AE   CE   EE  
## [1,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.67 0.00 0.00 0.00 0.00 0.00
##      GE   TE   UE   AG   CG   EG   GG   TG   UG   AT   CT   ET   GT   TT   UT  
## [1,] 0.00 0.00 0.00 0.00 0.33 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
##      AU   CU   EU   GU   TU   UU  
## [1,] 0.00 0.00 0.00 0.00 0.00 0.00
XNAString::XNADinucleotideFrequency(obj = xnastring_obj, slot = 'base', base_only = TRUE)
##    AA    CA    GA    TA    AC    CC    GC    TC    AG    CG    GG    TG    AT 
##  0.00  0.00  0.00  0.00  0.00  0.00  2.00  0.00  0.00  1.00  0.00  0.00  0.00 
##    CT    GT    TT other 
##  0.00  0.00  0.00  0.00
XNAString::XNADinucleotideFrequency(obj = xnastring_obj, slot = 'base', double_letters = c('GC', 'CU'))
##      GC   CU  
## [1,] 2.00 0.00

9.0.2 Example 2: XNAString object with custom dictionary

my_dict <- data.table::data.table(type = c(rep('base', 3), rep('sugar', 2), rep('backbone', 2)),
                                  symbol = c('G', 'E', 'A', 'F', 'O', 'S', 'B'))

xnastring_obj <-      XNAString_obj1 <- XNAString(
  base = c('AGGE', 'EEEA'),
  sugar = c('FFFO', 'OOOO'),
  backbone = c('SBS', 'SBS'),
  dictionary = my_dict
)



XNAString::XNADinucleotideFrequency(obj = xnastring_obj,
                                    slot = 'sugar',
                                    matrix_nbr =  2)
##      FF   OF   FO   OO  
## [1,] 0.00 0.00 0.00 3.00

9.0.3 Example 3: XNAStringSet object, single base, sugar and backbone

XNAString_obj1 <- XNAString(base = 'ATCG', sugar = 'FODD')
XNAString_obj2 <- XNAString(base = 'TTCT', sugar = 'FOLL')

XNAStringSet_obj <- XNAStringSet(objects = list(XNAString_obj1, XNAString_obj2))

XNAString::XNADinucleotideFrequency(obj = XNAStringSet_obj, slot = 'base')
##      AA   CA   EA   GA   TA   UA   AC   CC   EC   GC   TC   UC   AE   CE   EE  
## [1,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
## [2,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
##      GE   TE   UE   AG   CG   EG   GG   TG   UG   AT   CT   ET   GT   TT   UT  
## [1,] 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00
## [2,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00
##      AU   CU   EU   GU   TU   UU  
## [1,] 0.00 0.00 0.00 0.00 0.00 0.00
## [2,] 0.00 0.00 0.00 0.00 0.00 0.00

9.0.4 Example 4: XNAStringSet object, double base, sugar and backbone

XNAString_obj1 <- XNAString(base = c('ATCG', 'TAGC'),
                 sugar = c('OFOO', 'ODDF'),
                 backbone = c('SOS','SOS'))

XNAString_obj2 <- XNAString(base = c('GGCG', 'TATC'),
                 sugar = c('OOOO', 'OOFO'))

XNAStringSet_obj <- XNAStringSet(objects = list(XNAString_obj1, XNAString_obj2))

XNAString::XNADinucleotideFrequency(obj = XNAStringSet_obj, slot = 'sugar', matrix_nbr = 2)
##      DD   FD   LD   MD   OD   RD   DF   FF   LF   MF   OF   RF   DL   FL   LL  
## [1,] 1.00 0.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## [2,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
##      ML   OL   RL   DM   FM   LM   MM   OM   RM   DO   FO   LO   MO   OO   RO  
## [1,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## [2,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00
##      DR   FR   LR   MR   OR   RR  
## [1,] 0.00 0.00 0.00 0.00 0.00 0.00
## [2,] 0.00 0.00 0.00 0.00 0.00 0.00

10 Other functions

10.0.1 Example 1: mimir2XnaDict

dt <- data.table::data.table(HELM = c("([PPG])", "[fR]", "[srP]"),
                  TS_BASE_SEQ = c("F", NA, NA),
                  TS_SUGAR_SEQ = c(NA, NA, 'F'),
                  TS_BACKBONE_SEQ = c(NA, 'S', NA))
dt
##       HELM TS_BASE_SEQ TS_SUGAR_SEQ TS_BACKBONE_SEQ
## 1: ([PPG])           F         <NA>            <NA>
## 2:    [fR]        <NA>         <NA>               S
## 3:   [srP]        <NA>            F            <NA>
mimir2XnaDict(dt, 'TS_BASE_SEQ', 'TS_SUGAR_SEQ', 'TS_BACKBONE_SEQ')
##       HELM     type symbol
## 1: ([PPG])     base      F
## 2:   [srP]    sugar      F
## 3:    [fR] backbone      S

10.0.2 Example 2: concatDict

my_dict <- data.table::data.table(HELM = c('[[B]]'),
                                   type = c('base'),
                                   symbol = c('B'))
concatDict(my_dict)
##         HELM     type symbol
##  1:    [[B]]     base      B
##  2:        P backbone      O
##  3:    [PS2] backbone      2
##  4:  [Ppace] backbone      A
##  5:     [sP] backbone      X
##  6: [sPpace] backbone      B
##  7:    [srP] backbone      R
##  8:    [ssP] backbone      S
##  9:      (A)     base      A
## 10:      (C)     base      C
## 11:      (G)     base      G
## 12:      (T)     base      T
## 13: ([5meC])     base      E
## 14:      (U)     base      U
## 15:     [LR]    sugar      L
## 16:    [MOE]    sugar      M
## 17:      [R]    sugar      R
## 18:     [dR]    sugar      D
## 19:     [mR]    sugar      O
## 20:     [fR]    sugar      F

10.0.3 Example 3: concatDict with duplicates

my_dict <- data.table::data.table(HELM = c('[[U]]'),
                                   type = c('base'),
                                   symbol = c('U'))
concatDict(my_dict)
## Error in concatDict(my_dict): There is at least one duplicated symbol for the same type.

11 Exemplary workflows

11.1 Transform data saved in HELM notation to XNAString object. Then for a given subject find global alignment and match target slot (used as a pattern) to the subject.


helm <-
    "CHEM1{[5gn2c6]}|RNA1{P.[dR](C)P.[dR](A)P.[LR](G)[sP].[LR](A)[sP].[LR](G)[sP].[LR](A)[sP].[dR](A)[sP].[dR](G)[sP].[dR](G)[sP].[dR](C)[sP].[dR](A)[sP].[dR](C)[sP].[dR](A)[sP].[dR](G)[sP].[dR](A)[sP].[LR]([5meC])[sP].[LR](G)[sP].[LR](G)}$CHEM1,RNA1,1:R2-1:R1$$$V2.0"

xna_obj <- XNAStringFromHelm(helm, name ='oligo1')
xna_obj
#> XNAString object
#> name:       oligo1
#> base:       GAGAAGGCACAGAEGG
#> sugar:      LLLLDDDDDDDDDLLL
#> backbone:   XXXXXXXXXXXXXXX
#> target:     CCGTCTGTGCCTTCTC
#> conjugate5: [5gn2c6]
#> conjugate3: 
#> secondary_structure: ................, 0
#> duplex_structure: ......((........&......))........, -2.1

subject <- 'ATCGATATATATACACCGTCTGTGCCTTCTCACTACATCGAG'
substitutionMatrix <- Biostrings::nucleotideSubstitutionMatrix()
XNAString::XNAPairwiseAlignment(pattern = obj, subject = subject, substitutionMatrix = substitutionMatrix)
#> Global PairwiseAlignmentsSingleSubject (1 of 2)
#> pattern: TAG----------------CTATATATATGTGTACATACTAC
#> subject: ATCGATATATATACACCGTCTGTGCCTTCTCACTACATCGAG
#> score: -68

XNAString::XNAMatchPattern(pattern = xna_obj, subject = subject)
#> Views on a 42-letter BString subject
#> subject: ATCGATATATATACACCGTCTGTGCCTTCTCACTACATCGAG
#> views:
#>       start end width
#>   [1]    16  31    16 [CCGTCTGTGCCTTCTC]

11.2 Create XNAStringSet object by passing base list. Check alphabet frequency and dinucleotide frequency for this object.



base <- list(c('ATCGATAT', 'ATCGATAT'),  c('TGGGGGTGC', 'ATCGGGAT'), c('CCCTAGTA'))
                            
set_obj <- XNAStringSet(base = base)
set_obj
#> XNAStringSet object
#>    name               base              sugar         backbone    target
#> 1:   NA  ATCGATAT,ATCGATAT  DDDDDDDD,DDDDDDDD  OOOOOOO,OOOOOOO  ATATCGAT
#> 2:   NA TGGGGGTGC,ATCGGGAT DDDDDDDDD,DDDDDDDD OOOOOOOO,OOOOOOO GCACCCCCA
#> 3:   NA           CCCTAGTA           DDDDDDDD          OOOOOOO  TACTAGGG
#>    conjugate5 conjugate3 secondary_structure
#> 1:         NA         NA                    
#> 2:         NA         NA                    
#> 3:         NA         NA          ........,0

XNAAlphabetFrequency(obj = set_obj, slot = 'base')
#>      A    C    E    G    T    U   
#> [1,] 3.00 1.00 0.00 1.00 3.00 0.00
#> [2,] 0.00 1.00 0.00 6.00 2.00 0.00
#> [3,] 2.00 3.00 0.00 1.00 2.00 0.00
XNAAlphabetFrequency(obj = set_obj, slot = 'base', matrix_nbr = 2)
#>      A    C    E    G    T    U   
#> [1,] 3.00 1.00 0.00 1.00 3.00 0.00
#> [2,] 2.00 1.00 0.00 3.00 2.00 0.00
#> [3,] 0.00 0.00 0.00 0.00 0.00 0.00
XNAAlphabetFrequency(obj = set_obj, slot = 'base', as.prob = TRUE)
#>      A    C    E    G    T    U   
#> [1,] 0.38 0.12 0.00 0.12 0.38 0.00
#> [2,] 0.00 0.11 0.00 0.67 0.22 0.00
#> [3,] 0.25 0.38 0.00 0.12 0.25 0.00

XNADinucleotideFrequency(obj = set_obj, slot = 'base', double_letters = c('AT', 'GA', 'GT'))
#>      AT   GA   GT  
#> [1,] 3.00 1.00 0.00
#> [2,] 0.00 0.00 1.00
#> [3,] 0.00 0.00 1.00
XNADinucleotideFrequency(obj = set_obj, slot = 'base', base_only = TRUE)
#>      AA   CA   GA   TA   AC   CC   GC   TC   AG   CG   GG   TG   AT   CT   GT  
#> [1,] 0.00 0.00 1.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 3.00 0.00 0.00
#> [2,] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 4.00 2.00 0.00 0.00 1.00
#> [3,] 0.00 0.00 0.00 2.00 0.00 2.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 1.00
#>      TT   other
#> [1,] 0.00 0.00 
#> [2,] 0.00 0.00 
#> [3,] 0.00 0.00

11.3 Create XNAStringSet object with allowed GU wobbles in target

#create custom complementary dictonary with complementary bases coded as IUPAC 
compl_dict <- XNAString::complementary_bases
compl_dict[base == "G"]$target <- "Y"
compl_dict[base == "T"]$target <- "R" # if you have T in your base sequence 
compl_dict[base == "U"]$target <- "R" # if you have U in your base sequence 

compl_dict
#>    base target compl_target
#> 1:    A      T            A
#> 2:    C      G            C
#> 3:    G      Y            G
#> 4:    T      R            T
#> 5:    E      G            C
#> 6:    U      R            T

xna <- XNAString::XNAStringSet(base = c("ACGTACG", "CCCGTAC", "AATACTT"), compl_dict = compl_dict)
xna
#> XNAStringSet object
#>    name    base   sugar backbone
#> 1:   NA ACGTACG DDDDDDD   OOOOOO
#> 2:   NA CCCGTAC DDDDDDD   OOOOOO
#> 3:   NA AATACTT DDDDDDD   OOOOOO
#>                                                 target conjugate5 conjugate3
#> 1: CGTACGT,TGTACGT,CGTGCGT,TGTGCGT,CGTATGT,TGTATGT,...         NA         NA
#> 2:                     GTACGGG,GTGCGGG,GTATGGG,GTGTGGG         NA         NA
#> 3: AAGTATT,GAGTATT,AGGTATT,GGGTATT,AAGTGTT,GAGTGTT,...         NA         NA
#>    secondary_structure
#> 1:           .......,0
#> 2:           .......,0
#> 3:           .......,0

Session info

Here is the output of sessionInfo() on the system on which this document was compiled running pandoc 2.7.3:

#> R version 4.3.0 RC (2023-04-13 r84269)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] Biostrings_2.68.0   GenomeInfoDb_1.36.0 XVector_0.40.0     
#>  [4] IRanges_2.34.0      S4Vectors_0.38.0    BiocGenerics_0.46.0
#>  [7] XNAString_1.8.0     pander_0.6.5        knitr_1.42         
#> [10] BiocStyle_2.28.0   
#> 
#> loaded via a namespace (and not attached):
#>  [1] future_1.32.0               sass_0.4.5                 
#>  [3] bitops_1.0-7                stringi_1.7.12             
#>  [5] lattice_0.21-8              magrittr_2.0.3             
#>  [7] listenv_0.9.0               digest_0.6.31              
#>  [9] evaluate_0.20               grid_4.3.0                 
#> [11] bookdown_0.33               fastmap_1.1.1              
#> [13] jsonlite_1.8.4              Matrix_1.5-4               
#> [15] formattable_0.2.1           restfulr_0.0.15            
#> [17] BiocManager_1.30.20         BSgenome_1.68.0            
#> [19] XML_3.99-0.14               codetools_0.2-19           
#> [21] jquerylib_0.1.4             cli_3.6.1                  
#> [23] rlang_1.1.0                 crayon_1.5.2               
#> [25] Biobase_2.60.0              parallelly_1.35.0          
#> [27] future.apply_1.10.0         cachem_1.0.7               
#> [29] DelayedArray_0.26.0         yaml_2.3.7                 
#> [31] tools_4.3.0                 parallel_4.3.0             
#> [33] BiocParallel_1.34.0         GenomeInfoDbData_1.2.10    
#> [35] Rsamtools_2.16.0            SummarizedExperiment_1.30.0
#> [37] globals_0.16.2              vctrs_0.6.2                
#> [39] R6_2.5.1                    lifecycle_1.0.3            
#> [41] matrixStats_0.63.0          BiocIO_1.10.0              
#> [43] stringr_1.5.0               rtracklayer_1.60.0         
#> [45] zlibbioc_1.46.0             htmlwidgets_1.6.2          
#> [47] bslib_0.4.2                 glue_1.6.2                 
#> [49] data.table_1.14.8           Rcpp_1.0.10                
#> [51] xfun_0.39                   GenomicRanges_1.52.0       
#> [53] GenomicAlignments_1.36.0    MatrixGenerics_1.12.0      
#> [55] rjson_0.2.21                htmltools_0.5.5            
#> [57] rmarkdown_2.21              compiler_4.3.0             
#> [59] RCurl_1.98-1.12