sesame 1.6.0
SeSAMe provides a set of quality control steps.
The sesameQC function returns a sesameQC object, which can be
printed directly to the screen.
sesameQC(ssets[[1]])
##
## =======================
## = Intensities =
## =======================
## No. probes 485577
## mean (M/U) (in-band InfI): 5529.506
## mean (M+U) (in-band InfI): 11059.01
##
## -- Infinium II --
## No. probes: 350076 (72.095%)
## Mean Intensity: 5160.813
##
## -- Infinium I (Red) --
## No. probes: 89203 (18.371%)
## No. Probes Consistent Channel: 88799
## No. Porbes Swapped Channel: 162
## No. Probes Low Intensity: 242
## Mean Intensity (in-band): 6527.3
## Mean Intensity (out-of-band): 928.2117
##
## -- Infinium I (Grn) --
## No. probes: 46298 (9.535%)
## No. Probes Consistent Channel: 46000
## No. Probes Swapped Channel: 254
## No. Probes Low Intensity: 44
## Mean Intensity (in-band): 6394.865
## Mean Intensity (out-of-band): 640.0676
##
## =======================
## = Beta Values =
## =======================
## No. probes: 485577
## No. probes with NA: 79778 (16.430%)
## Mean Betas: 0.5098457
## Median Betas: 0.6276011
##
## -- cg probes --
## No. Probes: 482421
## No. Probes with NA: 78429 (16.257%)
## Mean Betas: 0.511757
## Median Betas: 0.632708
## % Unmethylated (Beta < 0.3): 40.096%
## % Methylated (Beta > 0.7): 46.652%
##
## -- ch probes --
## No. Probes: 3091
## No. Probes with NA: 1346 (43.546%)
## Mean Betas: 0.06744853
## Median Betas: 0.06199575
## % Unmethylated (Beta < 0.3): 100.000%
## % Methylated (Beta > 0.7): 0.000%
##
## -- rs probes --
## No. Probes: 65
## No. Probes with NA: 3 (4.615%)
## Mean Betas: 0.5071412
## Median Betas: 0.53247
## % Unmethylated (Beta < 0.3): 32.258%
## % Methylated (Beta > 0.7): 30.645%
##
## =======================
## = Inferences =
## =======================
## Sex: MALE
## Ethnicity: WHITE
## Age: 63.23934
## Bisulfite Conversion (GCT): 1.10858
The sesameQC object can be coerced into a data.frame, and the
per-sample results can be combined into one table using the following code:
qc10 <- do.call(rbind, lapply(ssets, function(x)
as.data.frame(sesameQC(x))))
qc10$sample_name <- names(ssets)
qc10[,c('mean_beta_cg','frac_meth_cg','frac_unmeth_cg','sex','age')]
## mean_beta_cg frac_meth_cg frac_unmeth_cg sex age
## WB_105 0.5117570 46.65167 40.09634 MALE 63.23934
## WB_218 0.5096306 47.54133 41.45205 MALE 43.28624
## WB_261 0.5168121 48.05370 40.86836 MALE 27.10038
## PBMC_105 0.5193698 46.53442 38.80828 MALE 56.42502
## PBMC_218 0.5215361 48.19021 39.96411 MALE 39.07262
## PBMC_261 0.5273707 49.45255 39.89300 MALE 24.54418
## Gran_105 0.4995210 46.78360 42.87791 MALE 63.17076
## Gran_218 0.4995041 46.92646 43.08087 MALE 43.51071
## Gran_261 0.5079122 47.80999 42.74814 MALE 27.66234
## CD4+_105 0.5218067 47.66358 39.19202 MALE 48.97698
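Once the QC metrics sit in one data frame, ordinary data-frame operations apply. The sketch below rebuilds two columns of the table above by hand (so that it runs without SeSAMe) and averages mean_beta_cg per cell type. It assumes that the text before the underscore in each sample name is the cell type; qc_tab, cell_type and beta_by_type are names introduced here for illustration only.

```r
# Values copied from the table printed above; qc_tab is a stand-in for qc10.
qc_tab <- data.frame(
    sample_name = c('WB_105', 'WB_218', 'WB_261',
                    'PBMC_105', 'PBMC_218', 'PBMC_261',
                    'Gran_105', 'Gran_218', 'Gran_261', 'CD4+_105'),
    mean_beta_cg = c(0.5117570, 0.5096306, 0.5168121,
                     0.5193698, 0.5215361, 0.5273707,
                     0.4995210, 0.4995041, 0.5079122, 0.5218067),
    stringsAsFactors = FALSE)

# Assumption: the text before the first underscore is the cell type.
qc_tab$cell_type <- sub('_.*$', '', qc_tab$sample_name)

# Mean of mean_beta_cg within each cell type, returned with cell-type names.
beta_by_type <- tapply(qc_tab$mean_beta_cg, qc_tab$cell_type, mean)
print(round(beta_by_type, 4))
```

With the real qc10 object the same two lines (the sub() call and the tapply() call) should work unchanged, provided the sample names follow the same pattern.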
The background level is given by mean_oob_grn and mean_oob_red, the mean
out-of-band signal in the green and red channels:
library(ggplot2)
ggplot(qc10,
aes(x = mean_oob_grn, y= mean_oob_red, label = sample_name)) +
geom_point() + geom_text(hjust = -0.1, vjust = 0.1) +
geom_abline(intercept = 0, slope = 1, linetype = 'dotted') +
xlab('Green Background') + ylab('Red Background') +
xlim(c(500,1200)) + ylim(c(500,1200))
The mean {M,U} intensity is available in mean_intensity.
Similarly, the mean M+U intensity is available in
mean_intensity_total. Low intensities are symptomatic of low
input or poor hybridization.
library(wheatmap)
p1 <- ggplot(qc10) +
geom_bar(aes(sample_name, mean_intensity), stat='identity') +
xlab('Sample Name') + ylab('Mean Intensity') +
ylim(0,18000) +
theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
p2 <- ggplot(qc10) +
geom_bar(aes(sample_name, mean_intensity_total), stat='identity') +
xlab('Sample Name') + ylab('Mean M+U Intensity') +
ylim(0,18000) +
theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
WGG(p1) + WGG(p2, RightOf())
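The two intensity summaries differ only by a factor of two if mean_intensity pools the M and U values, which is what the printed summary for the first sample suggests (twice 5529.506 is 11059.012, matching the reported 11059.01). The sketch below checks this relationship on made-up M and U values; the numbers and variable names are purely illustrative and are not SeSAMe output.

```r
# Hypothetical methylated (M) and unmethylated (U) intensities for 3 probes.
M <- c(1200, 8000, 300)
U <- c(9000, 500, 7000)

# Assumption: mean_intensity pools M and U; mean_intensity_total averages M+U.
mean_mu    <- mean(c(M, U))
mean_total <- mean(M + U)

# With one M and one U value per probe, the total is twice the pooled mean.
stopifnot(isTRUE(all.equal(2 * mean_mu, mean_total)))
print(c(mean_mu = mean_mu, mean_total = mean_total))
```

Under that assumption the two bar plots above show the same pattern at different scales, so either column can be used to spot low-intensity samples.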
The fraction of color channel switching can be found in
InfI_switch_G2R and InfI_switch_R2G. These numbers indicate
how strongly Infinium I probes are affected by SNP-induced
color channel switching.
ggplot(qc10) +
geom_point(aes(InfI_switch_G2R, InfI_switch_R2G))
The fraction of NAs reflects masking for a variety of reasons,
including failed detection, high background, and putative low-quality
probes. This number is available in frac_na_cg and num_na_cg
(cg stands for CpG probes; there are also num_na_ch and num_na_rs).
p1 <- ggplot(qc10) +
geom_bar(aes(sample_name, num_na_cg), stat='identity') +
xlab('Sample Name') + ylab('Number of NAs') +
theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
p2 <- ggplot(qc10) +
geom_bar(aes(sample_name, frac_na_cg), stat='identity') +
xlab('Sample Name') + ylab('Fraction of NAs (%)') +
theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
WGG(p1) + WGG(p2, RightOf())
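The count and the percentage carry the same information once the number of probes is known. As a check against the printed summary for the first sample, the sketch below recomputes the NA percentages from the NA counts and probe counts shown there. The vector names are introduced here for illustration, and the formula (100 times NAs divided by probes) is an assumption, although it does reproduce the printed 16.257%, 43.546% and 4.615%.

```r
# Counts copied from the sesameQC printout of the first sample.
num_probes <- c(cg = 482421, ch = 3091, rs = 65)
num_na     <- c(cg = 78429,  ch = 1346, rs = 3)

# Assumed definition: percentage of probes of each type that are NA.
frac_na <- 100 * num_na / num_probes
print(round(frac_na, 3))
```

Because the probe count per type is fixed for a given array platform, the two bar plots above rank the samples identically; the percentage is simply easier to compare across platforms.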