Mouse MM285 Array

Preprocessing

To begin, we need to retrieve mouse annotation data from ExperimentHub. This only needs to be done once per sesame installation.

## [1] TRUE

SeSAMe provides extensive native support for the Illumina mouse array (referred to as the MM285 array). The MM285 array contains ~285,000 probes covering over 20 design categories, including gene promoters, enhancers, CpGs syntenic to those on the human EPIC array, and other biology. This document describes the procedure for processing MM285 array data.

Let’s download an example pair of mouse array IDAT files:

res_grn = sesameDataDownload("204637490002_R05C01_Grn.idat")
res_red = sesameDataDownload("204637490002_R05C01_Red.idat")
pfx = sprintf("%s/204637490002_R05C01", res_red$dest_dir)

To load the IDAT pair into a SigDF object, use the readIDATpair function:

sdf = readIDATpair(pfx)

The default openSesame pipeline also works for the mouse array (here idat_dir stands for the directory that contains your IDAT files):

openSesame(idat_dir)

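A pre-built SigDF object can also be loaded from sesameData with sesameDataGet, as shown in the Species, Strain Inference section below.

Preprocess the SigDF to produce beta values. The standard noob and dyeBiasCorrTypeINorm steps work as expected: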

sdf_normalized = sdf %>%
                 qualityMask %>%
                 pOOBAH %>%
                 noob %>%
                 dyeBiasCorrTypeINorm

Retrieve the beta values using the following command:

betas = getBetas(sdf_normalized)
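As a reminder of what a beta value represents, the sketch below computes the fraction of methylated signal for three made-up probes. The probe names and the M/U intensities are hypothetical, and the formula is the conceptual definition only; getBetas may handle edge cases such as very low intensities differently.

```r
# Hypothetical methylated (M) and unmethylated (U) signal intensities.
M <- c(probe1 = 9000, probe2 = 500, probe3 = 4000)
U <- c(probe1 = 1000, probe2 = 9500, probe3 = 4000)

# Conceptual beta value: the share of total signal that is methylated.
toy_betas <- M / (M + U)
print(toy_betas)
```

Values near 1 indicate a mostly methylated site, values near 0 a mostly unmethylated one.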

By default, repeat probes and suboptimally designed probes are masked with NA. Starting with the mouse array, the suboptimally designed probes carry a new probe ID prefix (“uk”) instead of the “cg”/“ch”/“rs” prefixes typically seen on the human arrays.

sum(is.na(betas))
## [1] 18386
head(betas[grep('uk', names(betas))])
##      uk3449_TC11     uk23380_TC11      uk9667_TC11     uk10501_BC11 
##               NA               NA               NA               NA 
##     uk20801_BC11 uk597441685_BC11 
##               NA               NA
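To see how the masked probes are distributed over the probe ID prefixes, one could tabulate NA status by prefix. The following is a minimal base R sketch on a toy named vector; the probe IDs and values are invented for illustration, and in practice toy_betas would be replaced by the vector returned by getBetas.

```r
# Toy beta vector; probe names and values are made up for illustration.
toy_betas <- c(cg0001_TC11 = 0.82, cg0002_BC11 = NA,
               uk0003_TC11 = NA,   uk0004_BC11 = NA,
               rs0005_TC11 = 0.49, ch0006_BC11 = 0.03)

# The first two characters of the probe ID give the design prefix.
prefix <- substr(names(toy_betas), 1, 2)

# Cross-tabulate prefix against masked (NA) status.
mask_table <- table(prefix, masked = is.na(toy_betas))
print(mask_table)
```

In this toy input both “uk” probes are masked, mirroring the default behavior shown above.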

To use these probes, one skips qualityMask and explicitly masks based on detection p-values only:

betas = sdf_normalized %>%
        setMask(pOOBAH(qualityMask(sdf), return.pval=TRUE)>0.05) %>%
        getBetas
sum(is.na(betas))
## [1] 7294
head(betas[grep('uk', names(betas))])
##      uk3449_TC11     uk23380_TC11      uk9667_TC11     uk10501_BC11 
##               NA        0.4579850               NA        0.1877392 
##     uk20801_BC11 uk597441685_BC11 
##        0.7557017               NA
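The logic of masking by detection p-value alone can be illustrated in base R. This is only a sketch of the thresholding idea, not of how setMask is implemented; the probe names, beta values, and p-values are hypothetical.

```r
# Hypothetical beta values and detection p-values for five probes.
toy_betas <- c(p1 = 0.91, p2 = 0.46, p3 = 0.12, p4 = 0.76, p5 = 0.33)
toy_pvals <- c(p1 = 0.001, p2 = 0.20, p3 = 0.049, p4 = 0.05, p5 = 0.74)

# Mask probes whose detection p-value exceeds the 0.05 threshold.
masked_betas <- toy_betas
masked_betas[toy_pvals > 0.05] <- NA

print(masked_betas)
print(sum(is.na(masked_betas)))
```

Because the comparison is strict, a probe with a p-value of exactly 0.05 (p4 here) is kept.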

Note that we still use qualityMask when calculating the p-values. In this example, probes are masked only because of an insignificant detection p-value. One can completely turn off all masking by setting the mask option to FALSE in getBetas:

betas = getBetas(sdf_normalized, mask = FALSE)
sum(is.na(betas))
## [1] 1

or reset the mask using the resetMask function:

betas = getBetas(resetMask(sdf_normalized))
sum(is.na(betas))
## [1] 1

Track View

betas = sesameDataGet("MM285.10.tissue")$betas
visualizeGene("Igf2", betas = betas, platform="MM285", refversion = "mm10")

Species, Strain Inference

Let’s load a pre-built SigDF object from sesameData:

sdf <- sesameDataGet("MM285.1.SigDF")

Infer the species. This sample is obviously a mouse, but the function is also intended to work on rat, human, and other species. Currently this feature supports both the mouse array and the mammal array.

inferSpecies(sdf)$species
## [1] "Mus musculus"

Calculate beta values using the following commands.

betas <- sdf %>%
         noob %>%
         dyeBiasCorrTypeINorm %>%
         getBetas

Infer strain information for the mouse array. This returns a list containing the best guess, the p-value of the best guess, and the probabilities of all strains. Internally, the function converts the beta values to variant allele frequencies. Because the variant allele is not always measured as the M-allele, the beta values of some probes must be flipped to obtain the variant allele frequency.

strain <- inferStrain(betas)
strain$pval
##   NOD_ShiLtJ 
## 5.123234e-09
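The flipping step mentioned above can be sketched in base R as follows. This is an illustration of the idea only, not the internal code of inferStrain; the probe names, beta values, and the flip flags are hypothetical.

```r
# Hypothetical beta values at four SNP probes.
toy_betas <- c(snp1 = 0.95, snp2 = 0.10, snp3 = 0.50, snp4 = 0.02)

# Hypothetical flags: TRUE where the variant allele is NOT read out as the
# M-allele, so the beta value has to be flipped.
flip <- c(snp1 = FALSE, snp2 = TRUE, snp3 = FALSE, snp4 = TRUE)

# Variant allele frequency: beta where the variant is the M-allele,
# 1 - beta otherwise.
vaf <- ifelse(flip, 1 - toy_betas, toy_betas)
print(vaf)
```

After flipping, all values are on the same scale, with values near 1 meaning the variant allele is present.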

Let’s visualize the probabilities across strains:

library(ggplot2)
df <- data.frame(strain=names(strain$probs), probs=strain$probs)
ggplot(data = df,  aes(x = strain, y = log(probs))) +
  geom_bar(stat = "identity", color="gray") +
  ggtitle("strain probabilities") +
  scale_x_discrete(position = "top") +
  theme(axis.text.x = element_text(angle = 90), legend.position = "none")
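A numeric ranking can complement the bar plot. The sketch below sorts a named probability vector and reports the best candidate together with its log-probabilities, which is the quantity plotted on the y-axis above. The strain names and probabilities here are invented; with real data one would presumably pass strain$probs instead, assuming it is a named numeric vector as the data.frame call above suggests.

```r
# Hypothetical strain probabilities (names and values are made up).
toy_probs <- c(strain_A = 0.001, strain_B = 0.9, strain_C = 0.099)

# Rank strains from most to least probable.
ranked <- sort(toy_probs, decreasing = TRUE)
best <- names(ranked)[1]

# Log-probabilities, matching the scale used in the plot.
log_probs <- log(ranked)

print(best)
print(log_probs)
```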

Tissue Type Inference

Let’s load beta values from sesameData:

betas <- sesameDataGet("MM285.10.tissue")$betas[,1:2]

Compare mouse array data with the mouse tissue references. This returns a grid object that contrasts the target sample with the pre-built mouse tissue references.

options(sesameData_use_alternative=TRUE)
compareMouseTissueReference(betas)

Sex Inference

Let’s load a SigDF object from sesameData:

sdf = sesameDataGet("MM285.1.SigDF")

Sex inference can take either the raw signal in a SigDF or a beta value vector:

inferSex(sdf)
## Loading required package: e1071
## [1] Male
## Levels: Female Male

Age Inference

Let’s load beta values from sesameData:

betas <- sesameDataGet('MM285.10.tissue')$betas

The age of the mouse can be predicted using the predictMouseAgeInMonth function. This looks for overlapping probes and estimates age using an aging model built from 347 MM285 probes. The function returns a numeric output of age in months. The model is most accurate with SeSAMe preprocessing. Here’s an example.

predictMouseAgeInMonth(betas[,1])
## [1] 1.413134

This indicates that this mouse is approximately 1.41 months old.

Human-Mouse Mixture

UNDER CONSTRUCTION

Horvath Mammal40 Array

SeSAMe supports the Mammal40 array natively.

## [1] TRUE
res_grn = sesameDataDownload("GSM4411982_Grn.idat.gz")
res_red = sesameDataDownload("GSM4411982_Red.idat.gz")
sdf = readIDATpair(sprintf("%s/GSM4411982", res_red$dest_dir))

Preprocess the SigDF to produce beta values. The standard noob and dyeBiasCorrTypeINorm steps work as expected:

sdf_normalized = dyeBiasCorrTypeINorm(noob(pOOBAH(qualityMask(sdf))))

Retrieve the beta values using the following commands:

betas = getBetas(sdf_normalized)
head(betas)
## cg00000165 cg00001209 cg00001364 cg00001582 cg00002920 cg00003994 
##         NA 0.51280570 0.91460519 0.06570304 0.50046331 0.04526205

Session Info

sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] e1071_1.7-9                 tidyr_1.2.0                
##  [3] dplyr_1.0.7                 knitr_1.37                 
##  [5] SummarizedExperiment_1.24.0 Biobase_2.54.0             
##  [7] MatrixGenerics_1.6.0        matrixStats_0.61.0         
##  [9] scales_1.1.1                DNAcopy_1.68.0             
## [11] GenomicRanges_1.46.1        GenomeInfoDb_1.30.1        
## [13] IRanges_2.28.0              S4Vectors_0.32.3           
## [15] wheatmap_0.1.0              ggplot2_3.3.5              
## [17] sesame_1.12.8               sesameData_1.12.0          
## [19] rmarkdown_2.11              ExperimentHub_2.2.1        
## [21] AnnotationHub_3.2.1         BiocFileCache_2.2.1        
## [23] dbplyr_2.1.1                BiocGenerics_0.40.0        
## 
## loaded via a namespace (and not attached):
##  [1] fgsea_1.20.0                  colorspace_2.0-2             
##  [3] ellipsis_0.3.2                class_7.3-20                 
##  [5] XVector_0.34.0                base64_2.0                   
##  [7] proxy_0.4-26                  farver_2.1.0                 
##  [9] ggrepel_0.9.1                 bit64_4.0.5                  
## [11] interactiveDisplayBase_1.32.0 AnnotationDbi_1.56.2         
## [13] fansi_1.0.2                   splines_4.1.2                
## [15] cachem_1.0.6                  jsonlite_1.7.3               
## [17] png_0.1-7                     shiny_1.7.1                  
## [19] BiocManager_1.30.16           compiler_4.1.2               
## [21] httr_1.4.2                    assertthat_0.2.1             
## [23] Matrix_1.4-0                  fastmap_1.1.0                
## [25] cli_3.1.1                     later_1.3.0                  
## [27] htmltools_0.5.2               tools_4.1.2                  
## [29] gtable_0.3.0                  glue_1.6.1                   
## [31] GenomeInfoDbData_1.2.7        reshape2_1.4.4               
## [33] rappdirs_0.3.3                fastmatch_1.1-3              
## [35] Rcpp_1.0.8                    jquerylib_0.1.4              
## [37] vctrs_0.3.8                   Biostrings_2.62.0            
## [39] nlme_3.1-155                  preprocessCore_1.56.0        
## [41] xfun_0.29                     stringr_1.4.0                
## [43] mime_0.12                     lifecycle_1.0.1              
## [45] zlibbioc_1.40.0               MASS_7.3-55                  
## [47] BiocStyle_2.22.0              promises_1.2.0.1             
## [49] parallel_4.1.2                RColorBrewer_1.1-2           
## [51] yaml_2.2.2                    curl_4.3.2                   
## [53] memoise_2.0.1                 gridExtra_2.3                
## [55] sass_0.4.0                    stringi_1.7.6                
## [57] RSQLite_2.2.9                 BiocVersion_3.14.0           
## [59] highr_0.9                     randomForest_4.7-1           
## [61] filelock_1.0.2                BiocParallel_1.28.3          
## [63] rlang_1.0.1                   pkgconfig_2.0.3              
## [65] bitops_1.0-7                  evaluate_0.14                
## [67] lattice_0.20-45               purrr_0.3.4                  
## [69] labeling_0.4.2                bit_4.0.4                    
## [71] tidyselect_1.1.1              plyr_1.8.6                   
## [73] magrittr_2.0.2                R6_2.5.1                     
## [75] generics_0.1.2                DelayedArray_0.20.0          
## [77] DBI_1.1.2                     mgcv_1.8-38                  
## [79] pillar_1.7.0                  withr_2.4.3                  
## [81] KEGGREST_1.34.0               RCurl_1.98-1.5               
## [83] tibble_3.1.6                  crayon_1.4.2                 
## [85] KernSmooth_2.23-20            utf8_1.2.2                   
## [87] grid_4.1.2                    data.table_1.14.2            
## [89] blob_1.2.2                    digest_0.6.29                
## [91] xtable_1.8-4                  httpuv_1.6.5                 
## [93] illuminaio_0.36.0             openssl_1.4.6                
## [95] munsell_0.5.0                 bslib_0.3.1                  
## [97] askpass_1.1