Load IDATs, preprocessing and masking

Cache mouse array annotation data

To begin, we need to retrieve mouse annotation data from ExperimentHub. This only needs to be done once per SeSAMe installation.

library(sesame)  # attach SeSAMe before calling its functions
sesameDataCache("MM285")
## snapshotDate(): 2021-04-27
## [1] TRUE

Read IDATs

SeSAMe provides extensive native support for the Illumina mouse array (referred to as the MM285 array). The MM285 contains ~285,000 probes covering over 20 design categories, including gene promoters, enhancers, CpGs in synteny with the human EPIC array, and other biology. This document describes the procedure for processing MM285 array data.

Let’s download an example pair of mouse array IDAT files:

res_grn = sesameDataDownload("204637490002_R05C01_Grn.idat")
res_red = sesameDataDownload("204637490002_R05C01_Red.idat")
pfx = sprintf("%s/204637490002_R05C01", res_red$dest_dir)

To load the IDAT pair into a SigSet object, use the readIDATpair function:

sset = readIDATpair(pfx)
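
The prefix passed to readIDATpair is the part of the path shared by the green and red channel files. As a minimal base-R sketch (the file names are the ones downloaded above, but the directory `/tmp/idats` is a made-up placeholder), the prefix can be recovered by stripping the channel suffix:

```r
# Hypothetical directory; only the file names come from the example above.
idat_files <- file.path("/tmp/idats", c("204637490002_R05C01_Grn.idat",
                                        "204637490002_R05C01_Red.idat"))

# Strip the "_Grn.idat" / "_Red.idat" suffix to recover the shared prefix.
prefixes <- unique(sub("_(Grn|Red)\\.idat$", "", idat_files))
print(prefixes)
```

Both channel files collapse to a single prefix, which is why one prefix is enough to locate the pair.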

The default openSesame pipeline also works for the mouse array. Here idat_dir stands for the directory that holds the IDAT files:

openSesame(idat_dir)

Preprocessing

The steps below start from a SigSet object, such as the sset read above.

Preprocess the SigSet to produce beta values. The standard noob and dyeBiasCorrTypeINorm steps work as expected:

sset_normalized = sset %>%
                  noob %>%
                  dyeBiasCorrTypeINorm

Retrieve beta values using the following commands:

betas = sset_normalized %>%
        qualityMask %>%
        detectionMask %>%
        getBetas

By default, repeat and suboptimally designed probes are masked with NA. Starting with the mouse array, the suboptimally designed probes carry a new probe ID prefix (“uk”) instead of the “cg”/“ch”/“rs” prefixes typically seen on the human arrays.

sum(is.na(betas))
## [1] 23039
head(betas[grep('uk', names(betas))])
##  uk-1-101008622_BC11 uk-11-118757022_TC11   uk-11-4183579_TC11 
##                   NA                   NA                   NA 
##  uk-11-72715146_BC11  uk-12-45422860_BC11  uk-12-49764082_BC11 
##                   NA                   NA           0.02095679
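
To see how masking is distributed across probe categories, one can tabulate NA status by probe ID prefix. The sketch below uses a toy named vector; the “cg” and “rs” probe IDs and all beta values are invented for illustration, and only the naming pattern follows the output above.

```r
# Toy beta vector; IDs mimic the naming pattern, values are invented.
toy_betas <- c("cg00101675_BC21" = 0.91, "uk-1-101008622_BC11" = NA,
               "uk-12-49764082_BC11" = 0.02, "rs108256820_TC21" = 0.48)

# The first two characters of the probe ID give the prefix.
probe_prefix <- substr(names(toy_betas), 1, 2)

# Cross-tabulate prefix against masked (NA) status.
masked_by_prefix <- table(prefix = probe_prefix, masked = is.na(toy_betas))
print(masked_by_prefix)
```

When selecting probes by prefix, anchoring the pattern (for example `grep('^uk', names(betas))`) avoids matching “uk” elsewhere in an ID.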

To use these probes, skip qualityMask:

betas = sset_normalized %>%
        detectionMask %>%
        getBetas
sum(is.na(betas))
## [1] 23039
head(betas[grep('uk', names(betas))])
##  uk-1-101008622_BC11 uk-11-118757022_TC11   uk-11-4183579_TC11 
##                   NA                   NA                   NA 
##  uk-11-72715146_BC11  uk-12-45422860_BC11  uk-12-49764082_BC11 
##                   NA                   NA           0.02095679

Note that probes can still be masked because of an insignificant detection p-value. One can turn off masking completely by skipping detectionMask as well:

betas = sset_normalized %>% getBetas
sum(is.na(betas))
## [1] 17514

Alternatively, keep the masking steps and pass mask = FALSE to the getBetas function:

betas = sset_normalized %>%
        qualityMask %>%
        detectionMask %>%
        getBetas(mask = FALSE)
sum(is.na(betas))
## [1] 1
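
Conceptually, a beta value is the methylated fraction of the total signal, and masking replaces flagged probes with NA. The following base-R sketch illustrates that idea on invented intensities. `toy_get_betas`, `apply_mask` and the `mask` vector are hypothetical names for this illustration, not SeSAMe's implementation.

```r
# Invented methylated (M) and unmethylated (U) signal intensities.
M <- c(probe1 = 900, probe2 = 50, probe3 = 400)
U <- c(probe1 = 100, probe2 = 950, probe3 = 600)

# Hypothetical mask: TRUE marks a probe flagged by a quality or detection filter.
mask <- c(probe1 = FALSE, probe2 = TRUE, probe3 = FALSE)

toy_get_betas <- function(M, U, mask, apply_mask = TRUE) {
  betas <- M / (M + U)               # methylated fraction of total signal
  if (apply_mask) betas[mask] <- NA  # hide flagged probes
  betas
}

print(toy_get_betas(M, U, mask))                      # flagged probe is NA
print(toy_get_betas(M, U, mask, apply_mask = FALSE))  # all values reported
```

Keeping the mask separate from the values is what makes it possible to report unmasked betas later without redoing the preprocessing.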

Visualize mouse array betas

betas = sesameDataGet("MM285.10.tissue")$betas
visualizeGene("Igf2", betas = betas, platform="MM285", refversion = "mm10")

Infer Strain Information

Let’s load a pre-built SigSet object from sesameData:

sset <- sesameDataGet('MM285.1.NOD.FrontalLobe')

Calculate beta values using the following commands.

betas <- sset %>%
         noob %>%
         dyeBiasCorrTypeINorm %>%
         getBetas

Convert the beta values to variant allele frequencies (VAFs).
Note that the variant allele is not always the one measured in the green channel (Infinium-II) or as the M-allele (Infinium-I), so the beta values of some probes must be flipped to obtain the variant allele frequency.

vafs <- betaToAF(betas)
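
The flipping step can be pictured with a small base-R sketch. The probe names, beta values and the `needs_flip` flag are all invented here; in practice the flip information would come from the array annotation, and this is only an illustration of the idea rather than what betaToAF does internally.

```r
# Invented beta values and a hypothetical per-probe flag marking probes whose
# beta value measures the reference allele instead of the variant allele.
toy_betas  <- c(rs_a = 0.95, rs_b = 0.10, rs_c = 0.50)
needs_flip <- c(rs_a = TRUE, rs_b = FALSE, rs_c = TRUE)

# Flip where needed so every value is the fraction of the variant allele.
toy_vafs <- ifelse(needs_flip, 1 - toy_betas, toy_betas)
print(toy_vafs)
```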

Infer strain information for the mouse array. This returns a list containing the best guess, the p-value of the best guess, and the probabilities of all strains.

strain <- inferStrain(vafs)
strain$pval
##   NOD_ShiLtJ 
## 4.868143e-09

Let’s visualize the log-probabilities of all strains.

library(ggplot2)
df <- data.frame(strain=names(strain$probs), probs=strain$probs)
ggplot(data = df,  aes(x = strain, y = log(probs))) +
  geom_bar(stat = "identity", color="gray") +
  ggtitle("strain probabilities") +
  scale_x_discrete(position = "top") +
  theme(axis.text.x = element_text(angle = 90), legend.position = "none")
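
Strain probabilities can span many orders of magnitude, which is why the plot uses a log scale. A minimal sketch with invented probabilities shows the effect (the strain names other than NOD_ShiLtJ are placeholders, and the numbers are not real inferStrain output):

```r
# Invented probabilities for three strains; they sum to 1.
toy_probs <- c(NOD_ShiLtJ = 0.999, strain_B = 9.99e-4, strain_C = 1e-6)

# The best guess is the strain with the highest probability.
best_guess <- names(which.max(toy_probs))

# On the log scale the small probabilities become distinguishable.
log_probs <- log(toy_probs)
print(best_guess)
print(round(log_probs, 2))
```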

Contrast Data with Tissue References

Let’s load beta values from sesameData:

betas <- sesameDataGet("MM285.10.tissue")$betas[,1:2]

Compare mouse array data with mouse tissue references. This returns a grid object that contrasts the target sample with the pre-built mouse tissue references.

compareMouseTissueReference(betas)

Infer Mouse Age

Let’s load beta values from sesameData:

betas <- sesameDataGet('MM285.10.tissue')$betas

The age of the mouse can be predicted using the predictMouseAgeInMonth function. This looks for overlapping probes and estimates age using an aging model built from 347 MM285 probes. The function returns a numeric output of age in months. The model is most accurate with SeSAMe preprocessing. Here’s an example.

predictMouseAgeInMonth(betas[,1])
## [1] 1.413134

This indicates that this mouse is approximately 1.41 months old.
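
The text above does not spell out the form of the aging model. Purely as an illustration of "use the overlapping probes, then apply the model", the sketch below assumes a linear predictor (intercept plus per-probe slopes). The probe names, coefficients and beta values are all invented, and the real model may differ.

```r
# Hypothetical clock: intercept plus per-probe slopes (all numbers invented).
clock_intercept <- 1.0
clock_slopes <- c(cg_a = 2.0, cg_b = -1.5, cg_c = 0.5)

# Invented sample: cg_c is absent, cg_x is not part of the clock.
sample_betas <- c(cg_a = 0.8, cg_b = 0.4, cg_x = 0.9)

# Keep only probes present (and non-missing) in both the clock and the sample.
usable <- names(sample_betas)[!is.na(sample_betas)]
shared <- intersect(names(clock_slopes), usable)

# Linear predictor over the shared probes.
toy_age <- clock_intercept + sum(clock_slopes[shared] * sample_betas[shared])
print(shared)
print(toy_age)
```

Restricting to shared probes lets the predictor run on data where some clock probes are masked, at the cost of accuracy when many are missing.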

Differential Methylation

library(SummarizedExperiment)
se = sesameDataGet("MM285.10.tissues")[1:100,]
se_ok = (checkLevels(assay(se), colData(se)$sex) &
    checkLevels(assay(se), colData(se)$tissue))
se = se[se_ok,]
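
The filtering above keeps probes that can be modeled for both covariates. As a rough base-R sketch of that kind of check, assuming that a usable probe needs at least one non-missing value in every level of a factor (`toy_check_levels` is a hypothetical helper, not the package function, and the data are invented):

```r
# Toy beta matrix: 3 probes x 4 samples, with some missing values.
toy_betas <- rbind(
  probe1 = c(0.1, 0.2, 0.8, 0.9),
  probe2 = c(NA,  NA,  0.7, 0.6),   # no data at all for level "A"
  probe3 = c(0.3, NA,  NA,  0.5)
)
group <- factor(c("A", "A", "B", "B"))

# TRUE when a probe has at least one non-NA value in every factor level.
toy_check_levels <- function(betas, fc) {
  apply(betas, 1, function(row) all(tapply(!is.na(row), fc, any)))
}

keep <- toy_check_levels(toy_betas, group)
print(keep)
```

A probe with no data in one level would make the coefficient for that level inestimable, so dropping it up front avoids failures in the model fit.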

Test differential methylation with a model that includes tissue and sex as covariates.

cf_list = summaryExtractCfList(DML(se, ~tissue + sex))

Testing sex-specific differential methylation yields chrX-linked probes.

cf_list = DMR(se, cf_list$sexMale)
## Merging correlated CpGs ... Done.
## Generated 58 segments.
## Combine p-values ... 
##  - 3 significant segments.
##  - 2 significant segments (after BH).
## Done.
topSegments(cf_list) %>% dplyr::filter(Seg.Pval.adj < 0.05)
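
The log above reports fewer significant segments after BH (Benjamini-Hochberg) adjustment than before. The effect of the adjustment can be reproduced on invented p-values with base R's p.adjust. The segment names and values below are made up and are not taken from the DMR output.

```r
# Invented segment-level p-values.
seg_pvals <- c(seg1 = 0.0004, seg2 = 0.012, seg3 = 0.04, seg4 = 0.40, seg5 = 0.85)

# Benjamini-Hochberg adjustment controls the false discovery rate.
seg_padj <- p.adjust(seg_pvals, method = "BH")

raw_hits <- names(seg_pvals)[seg_pvals < 0.05]
adj_hits <- names(seg_padj)[seg_padj < 0.05]
print(raw_hits)
print(adj_hits)
```

Filtering on the adjusted p-value, as the topSegments call does with Seg.Pval.adj, is the more conservative choice when many segments are tested at once.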