Sesame provide convenient function to compare your sample with public data sets processed with the same pipeline. All you need is a raw SigSet.
SeSAMe provides a set of quality control steps. Be sure to pre-cache HM450 annotation data from ExperimentHub. This only needs to be done once per sesame installation.
# library(FlowSorted.Blood.450k)
# ssets <- RGChannelSetToSigSets(FlowSorted.Blood.450k[,1:10])
ssets = sesameDataGet("EPIC.5.normal")
The SeSAMe QC function returns an sesameQC
object which can be directly printed onto the screen.
##
## =======================
## = Intensities =
## =======================
## No. probes (num_probes_all) 866895
## mean (M/U) (mean_intensity): 2657.248
## mean (M+U) (mean_intensity_total): 5314.496
##
## -- Infinium II --
## No. probes: (num_probes_II) 724612 (83.587%)
## Mean Intensity (mean_ii): 2526.561
##
## -- Infinium I (Red) --
## No. probes: (num_probes_IR) 92294 (10.647%)
## No. Probes Consistent Channel: 92045
## No. Porbes Swapped Channel: 105
## No. Probes Low Intensity: 144
## Mean Intensity (in-band): 3579.385
## Mean Intensity (out-of-band): 524.6618
##
## -- Infinium I (Grn) --
## No. probes: 49989 (5.766%)
## No. Probes Consistent Channel: 49276
## No. Probes Swapped Channel: 630
## No. Probes Low Intensity: 83
## Mean Intensity (in-band): 2849.083
## Mean Intensity (out-of-band): 303.2474
##
## =======================
## = Non-detection =
## =======================
## No. probes: 33607
## No. probes w/ NA (num_na): 33607 (3.877%)
## No. nondetection (num_nondt): 33607 (3.877%)
##
## =======================
## = Beta Values =
## =======================
## Mean Betas: 0.57629
## Median Betas: 0.7269904
##
## -- cg probes --
## No. Probes: 863904
## No. Probes with NA: 33054 (3.826%)
## Mean Betas: 0.5777363
## Median Betas: 0.7290963
## % Unmethylated (Beta < 0.3): 30.803%
## % Methylated (Beta > 0.7): 51.886%
##
## -- ch probes --
## No. Probes: 2932
## No. Probes with NA: 550 (18.759%)
## Mean Betas: 0.07318857
## Median Betas: 0.06107732
## % Unmethylated (Beta < 0.3): 99.874%
## % Methylated (Beta > 0.7): 0.000%
##
## -- rs probes --
## No. Probes: 59
## No. Probes with NA: 3 (5.085%)
## Mean Betas: 0.518126
## Median Betas: 0.5150823
## % Unmethylated (Beta < 0.3): 30.357%
## % Methylated (Beta > 0.7): 33.929%
The sesameQC
object can be coerced into data.frame and linked using the following code
qc5 <- do.call(rbind, lapply(ssets, function(x)
as.data.frame(sesameQC(x))))
qc5$sample_name <- names(ssets)
qc5[,c('mean_beta_cg','frac_meth_cg','frac_unmeth_cg')]
The background level is given by mean_oob_grn
and mean_oob_red
The mean {M,U} intensity can be reached by mean_intensity
. Similarly, the mean M+U intensity can be reached by mean_intensity_total
. Low intensities are symptomatic of low input or poor hybridization.
library(wheatmap)
p1 <- ggplot(qc5) +
geom_bar(aes(sample_name, mean_intensity), stat='identity') +
xlab('Sample Name') + ylab('Mean Intensity') +
ylim(0,18000) +
theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
p2 <- ggplot(qc5) +
geom_bar(aes(sample_name, mean_intensity_total), stat='identity') +
xlab('Sample Name') + ylab('Mean M+U Intensity') +
ylim(0,18000) +
theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
WGG(p1) + WGG(p2, RightOf())
The fraction of NAs are signs of masking due to variety of reasons including failed detection, high background, putative low quality probes etc. This number can be reached in frac_na_cg
and num_na_cg
(the cg stands for CpG probes, so we also have num_na_ch
and num_na_rs
)
p1 <- ggplot(qc5) +
geom_bar(aes(sample_name, num_na_cg), stat='identity') +
xlab('Sample Name') + ylab('Number of NAs') +
theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
p2 <- ggplot(qc5) +
geom_bar(aes(sample_name, frac_na_cg), stat='identity') +
xlab('Sample Name') + ylab('Fraction of NAs (%)') +
theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
WGG(p1) + WGG(p2, RightOf())
The fraction of color channel switch can be found in InfI_switch_G2R
and InfI_switch_R2G
. These numbers are symptomatic of how Infinium I probes are affected by SNP-induced color channel switching.
Infinium platforms are intrinsically robust to incomplete bisulfite conversion as non-converted probes would fail to hybridize to the target. Residual incomplete bisulfite conversion can be quantified using GCT score based on C/T-extension probes. Details of this method can be found in Zhou et al. 2017. The closer the score to 1.0, the more complete the bisulfite conversion.
## [1] 1.067894
SeSAMe can output explicit and Infinium-I-derived SNP to VCF. This information can be used to identify sample swaps.
## Retrieving annotation from https://zhouserver.research.chop.edu/InfiniumAnnotation/current/EPIC/EPIC.hg19.snp_overlap_b151.rds ... Done.
## Retrieving annotation from https://zhouserver.research.chop.edu/InfiniumAnnotation/current/EPIC/EPIC.hg19.typeI_overlap_b151.rds ... Done.
One can output to actual VCF file with a header by formatVCF(sset, vcf=path_to_vcf)
.