SeSAMe implements inference of sex, age, and ethnicity. These inferences are valuable for checking the integrity of an experiment and for detecting sample swaps.

Sex, XCI

Sex is inferred from our curated X-linked probes and Y-chromosome probes, excluding pseudo-autosomal regions and probes that escape X-chromosome inactivation (XCI).

Human:

## [1] "MALE"
## [1] "XaY"

Mouse:

## [1] Male
## Levels: Female Male
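To make the intuition concrete, here is a toy sketch in base R of how X-linked methylation and Y-probe signal can be combined into a sex call. It is not SeSAMe's classifier: the function `toy_infer_sex`, the thresholds `x_cut` and `y_fold`, and the simulated values are all hypothetical. The sketch relies only on the general observation that X-inactivation gives females intermediate methylation at many X-linked probes, while Y-chromosome probes show signal above background only in males.

```r
# Toy illustration only: hypothetical thresholds and simulated values,
# not SeSAMe's curated probe set or its actual model.
toy_infer_sex <- function(x_betas, y_intensity, background,
                          x_cut = 0.3, y_fold = 2) {
  # Intermediate methylation on X-linked probes suggests an inactive X
  x_intermediate <- median(x_betas, na.rm = TRUE) > x_cut
  # Y probes well above background suggest a Y chromosome is present
  y_present <- median(y_intensity, na.rm = TRUE) > y_fold * background
  if (y_present && !x_intermediate) {
    "MALE"
  } else if (!y_present && x_intermediate) {
    "FEMALE"
  } else {
    "AMBIGUOUS"
  }
}

set.seed(1)
# Simulated male: low X-linked betas, strong Y signal
male_call <- toy_infer_sex(rbeta(200, 1, 12), rnorm(50, 3000, 300), background = 300)
# Simulated female: intermediate X-linked betas, Y signal near background
female_call <- toy_infer_sex(rbeta(200, 10, 10), rnorm(50, 250, 50), background = 300)
print(c(male_call, female_call))
```

An "AMBIGUOUS" call in a sketch like this is the kind of result that would prompt a closer look at sample quality or karyotype.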

Ethnicity

Ethnicity is inferred using a random forest model trained on both the built-in SNP probes (rs probes) and channel-switching Type-I probes.

## [1] "WHITE"

Age & Epigenetic Clock

Human

SeSAMe provides age regression based on the well-known Horvath 353 model (see Horvath 2013).

## [1] 84.13913
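For readers unfamiliar with how an epigenetic clock of this kind turns beta values into an age, the following base R sketch shows the general form described in Horvath (2013): a weighted sum of CpG beta values followed by an inverse age transformation with an adult age of 20 years. The intercept, the three coefficients, and the probe names below are made up for illustration; the real model uses the published coefficients, and this sketch is not SeSAMe's implementation.

```r
# Inverse of the age transformation from Horvath (2013), adult age = 20 years:
# logarithmic below adult age, linear above it.
anti_trafo <- function(x, adult_age = 20) {
  ifelse(x < 0,
         (1 + adult_age) * exp(x) - 1,
         (1 + adult_age) * x + adult_age)
}

# Hypothetical three-CpG "clock" (illustrative numbers, not the real model)
intercept <- 0.7
coefs <- c(cg_a = 1.2, cg_b = -0.8, cg_c = 0.5)
betas <- c(cg_a = 0.9, cg_b = 0.1, cg_c = 0.6)

# Linear predictor on the transformed-age scale
score <- intercept + sum(coefs * betas[names(coefs)])

# Map back to chronological years
predicted_age <- anti_trafo(score)
print(predicted_age)
```

The transformation is what lets a single linear model cover both the fast methylation changes of childhood and the slower drift in adults.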

Mouse

SeSAMe provides age estimation using a set of 347 CpGs (see Zhou et al. 2022).

The age of a mouse can be predicted with the predictMouseAgeInMonth function. It looks for probes shared between the input and the model, then estimates age using an aging model built from 347 MM285 probes. The function returns a numeric age in months. The model is most accurate with SeSAMe preprocessing. Here’s an example.

## [1] 1.413134

This indicates that this mouse is approximately 1.41 months old.
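Because the prediction depends on which model probes are present in the input, it can be useful to check probe overlap before trusting an estimate. The base R sketch below illustrates such a check with made-up probe IDs and beta values; `model_probes` is a hypothetical stand-in for the 347 model CpGs, and the sketch does not reproduce what predictMouseAgeInMonth does internally.

```r
# Hypothetical model probe IDs and input betas (all IDs are made up)
model_probes <- c("cg_m1", "cg_m2", "cg_m3", "cg_m4")
betas <- c(cg_m1 = 0.82, cg_m3 = NA, cg_m4 = 0.15, cg_other = 0.50)

# Probes present in both the model and the input
shared <- intersect(model_probes, names(betas))

# Of those, keep only probes with a non-missing beta value
usable <- shared[!is.na(betas[shared])]

# Fraction of model probes that can actually contribute to a prediction
coverage <- length(usable) / length(model_probes)
print(usable)
print(coverage)
```

A low coverage value suggests that heavy masking or a platform mismatch may be degrading the estimate.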

Copy Number

See Supplemental Vignette

Cell Count Deconvolution

SeSAMe estimates leukocyte fraction using a two-component model. This function works for samples whose targeted cell-of-origin is not related to white blood cells.

## [1] 0.2007592
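The idea behind a two-component model can be sketched in a few lines of base R. Assume the observed beta value at each probe is a mixture of a pure-leukocyte reference and a pure-tissue reference, weighted by the leukocyte fraction f. Given both references, f can be recovered by least squares. The reference values and the estimator below are illustrative assumptions, not SeSAMe's actual references or fitting procedure.

```r
# Hypothetical reference betas at four probes that separate the two components
leuko_ref  <- c(0.90, 0.85, 0.10, 0.05)   # pure leukocytes (made-up values)
tissue_ref <- c(0.10, 0.20, 0.80, 0.90)   # pure tissue (made-up values)

# Simulate a sample that is 20% leukocytes, without noise
true_f <- 0.2
observed <- true_f * leuko_ref + (1 - true_f) * tissue_ref

# Least-squares estimate of f in: observed = f * leuko + (1 - f) * tissue
d <- leuko_ref - tissue_ref
f_hat <- sum((observed - tissue_ref) * d) / sum(d^2)

# Constrain the estimate to a valid fraction
f_hat <- min(max(f_hat, 0), 1)
print(f_hat)
```

The sketch also shows why the approach fits samples unrelated to white blood cells: if the tissue of interest resembled leukocytes, the two references would be nearly identical and `d` would carry little signal.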

Genomic Privacy

The goal of data sanitization is to modify IDAT files in place so that they can be released to the public domain without leaking private information. This is achieved by de-identification.

Let’s take DNA methylation data from the HM450 platform as an example.

De-identify by Masking

The first de-identification method masks SNP probes by setting their mean intensity to zero. As a consequence, the inferred allele frequency is 0.5.

Note that before deIdentify is applied, the rs probe values differ from one another. After deIdentify, all rs values are masked at 0.5.
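Why zeroed intensities lead to a value of 0.5 can be seen with a small base R sketch. It assumes the beta value is computed as M / (M + U) with a small floor on the numerator and denominator so that zero intensities still give a defined result. The helper `beta_value`, the floor values, and the intensities are assumptions made for illustration, and the sketch works on plain vectors rather than on IDAT files.

```r
# Beta value with a floor of 1 on the numerator and 2 on the denominator
# (an assumption made here so that all-zero intensities are well defined)
beta_value <- function(M, U) pmax(M, 1) / pmax(M + U, 2)

# Made-up methylated (M) and unmethylated (U) intensities for three rs probes
M <- c(rs1 = 5200, rs2 = 2600, rs3 = 40)
U <- c(rs1 = 60,   rs2 = 2500, rs3 = 4800)

# Before masking: values reflect the genotype at each SNP
before <- beta_value(M, U)

# After masking: both channels set to zero, so every probe collapses to 1/2
after <- beta_value(M * 0, U * 0)

print(round(before, 3))
print(after)
```

After masking, the rs probes no longer distinguish one individual from another, which is the point of the procedure.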

Session Info

## R version 4.2.0 (2022-04-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] SummarizedExperiment_1.26.1 Biobase_2.56.0             
##  [3] GenomicRanges_1.48.0        GenomeInfoDb_1.32.2        
##  [5] IRanges_2.30.0              S4Vectors_0.34.0           
##  [7] MatrixGenerics_1.8.0        matrixStats_0.62.0         
##  [9] knitr_1.39                  sesame_1.14.2              
## [11] sesameData_1.14.0           ExperimentHub_2.4.0        
## [13] AnnotationHub_3.4.0         BiocFileCache_2.4.0        
## [15] dbplyr_2.1.1                BiocGenerics_0.42.0        
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-7                  bit64_4.0.5                  
##  [3] filelock_1.0.2                RColorBrewer_1.1-3           
##  [5] httr_1.4.3                    tools_4.2.0                  
##  [7] bslib_0.3.1                   utf8_1.2.2                   
##  [9] R6_2.5.1                      DBI_1.1.2                    
## [11] colorspace_2.0-3              withr_2.5.0                  
## [13] tidyselect_1.1.2              preprocessCore_1.58.0        
## [15] bit_4.0.4                     curl_4.3.2                   
## [17] compiler_4.2.0                cli_3.3.0                    
## [19] DelayedArray_0.22.0           labeling_0.4.2               
## [21] sass_0.4.1                    scales_1.2.0                 
## [23] randomForest_4.7-1            readr_2.1.2                  
## [25] proxy_0.4-26                  rappdirs_0.3.3               
## [27] stringr_1.4.0                 digest_0.6.29                
## [29] rmarkdown_2.14                XVector_0.36.0               
## [31] pkgconfig_2.0.3               htmltools_0.5.2              
## [33] highr_0.9                     fastmap_1.1.0                
## [35] rlang_1.0.2                   RSQLite_2.2.14               
## [37] shiny_1.7.1                   farver_2.1.0                 
## [39] jquerylib_0.1.4               generics_0.1.2               
## [41] jsonlite_1.8.0                wheatmap_0.2.0               
## [43] BiocParallel_1.30.2           dplyr_1.0.9                  
## [45] RCurl_1.98-1.6                magrittr_2.0.3               
## [47] GenomeInfoDbData_1.2.8        Matrix_1.4-1                 
## [49] Rcpp_1.0.8.3                  munsell_0.5.0                
## [51] fansi_1.0.3                   lifecycle_1.0.1              
## [53] stringi_1.7.6                 yaml_2.3.5                   
## [55] MASS_7.3-57                   zlibbioc_1.42.0              
## [57] plyr_1.8.7                    grid_4.2.0                   
## [59] blob_1.2.3                    ggrepel_0.9.1                
## [61] parallel_4.2.0                promises_1.2.0.1             
## [63] crayon_1.5.1                  lattice_0.20-45              
## [65] Biostrings_2.64.0             hms_1.1.1                    
## [67] KEGGREST_1.36.0               pillar_1.7.0                 
## [69] reshape2_1.4.4                glue_1.6.2                   
## [71] BiocVersion_3.15.2            evaluate_0.15                
## [73] BiocManager_1.30.18           png_0.1-7                    
## [75] vctrs_0.4.1                   tzdb_0.3.0                   
## [77] httpuv_1.6.5                  gtable_0.3.0                 
## [79] purrr_0.3.4                   assertthat_0.2.1             
## [81] cachem_1.0.6                  ggplot2_3.3.6                
## [83] xfun_0.31                     mime_0.12                    
## [85] xtable_1.8-4                  e1071_1.7-9                  
## [87] later_1.3.0                   class_7.3-20                 
## [89] tibble_3.1.7                  AnnotationDbi_1.58.0         
## [91] memoise_2.0.1                 ellipsis_0.3.2               
## [93] interactiveDisplayBase_1.34.0 BiocStyle_2.24.0