FlowSorted.Blood.EPIC

Illumina Human Methylation data from EPIC on immunomagnetic sorted adult blood cell populations. The FlowSorted.Blood.EPIC package contains Illumina HumanMethylationEPIC (EPIC)) DNA methylation microarray data from the immunomethylomics group (manuscript submitted), consisting of 37 magnetic sorted blood cell references and 12 samples, formatted as an RGChannelSet object for integration and normalization using most of the existing Bioconductor packages.

This package contains data similar to the FlowSorted.Blood.450k package consisting of data from peripheral blood samples generated from adult men and women. However, when using the newer EPIC microarray minfi estimates of cell type composition using the FlowSorted.Blood.450k package are less precise compared to actual cell counts. Hence, this package consists of appropriate data for deconvolution of adult blood samples used in for example EWAS relying in the newer EPIC technology.

Researchers may find this package useful as these samples represent different cellular populations ( T lymphocytes (CD4+ and CD8+), B cells (CD19+), monocytes (CD14+), NK cells (CD56+) and Neutrophils of cell sorted blood generated with high purity estimates. As a test of accuracy 12 experimental mixtures were reconstructed using fixed amounts of DNA from purified cells.

Objects included:
1. FlowSorted.Blood.EPIC is the RGChannelSet object containing the reference library

library(ExperimentHub)  

hub <- ExperimentHub()  

query(hub, "FlowSorted.Blood.EPIC")  

FlowSorted.Blood.EPIC <- hub[["EH1136"]]  

FlowSorted.Blood.EPIC

head(IDOLOptimizedCpGs)

head(IDOLOptimizedCpGs450klegacy)

We offer the function estimateCellCounts2 a modification of the popular estimatesCellCounts function in minfi. Our function corrected small glitches when dealing with combining the DataFrame objects with the reference libraries. We allow the use of MethylSet objects such as those from GEO. However, we offer only Quantile normalization for those datasets (assuming that they have not been previously normalized). The estimates are calculated using the CP/QP method described in Houseman et al. 2012. and adapted in minfi. CIBERSORT and RPC are allowed using external packages.
see ?estimateCellCounts2 for details

# Step 1: Load the reference library to extract the artificial mixtures  

library(ExperimentHub)  
hub <- ExperimentHub()  
query(hub, "FlowSorted.Blood.EPIC")  
FlowSorted.Blood.EPIC <- hub[["EH1136"]]  
FlowSorted.Blood.EPIC  

# Step 2 separate the reference from the testing dataset  

RGsetTargets <- FlowSorted.Blood.EPIC[,
             FlowSorted.Blood.EPIC$CellType == "MIX"]  
             
sampleNames(RGsetTargets) <- paste(RGsetTargets$CellType,
                            seq_len(dim(RGsetTargets)[2]), sep = "_")  
RGsetTargets  

# Step 3: Deconvolute using the IDOL L-DMR  

head (IDOLOptimizedCpGs)  

# If you need to deconvolute a 450k legacy dataset use 
# IDOLOptimizedCpGs450klegacy instead  

# Do not run with limited RAM the normalization step requires a big amount of 
# memory resources  

if (memory.limit()>8000){  
 countsEPIC<-estimateCellCounts2(RGsetTargets, compositeCellType = "Blood",   
                                processMethod = "preprocessNoob",  
                                probeSelect = "IDOL",  
                                cellTypes = c("CD8T", "CD4T", "NK", "Bcell",  
                                "Mono", "Neu"),  
                                referencePlatform =   
                                "IlluminaHumanMethylationEPIC",  
                                referenceset = NULL,  
                                IDOLOptimizedCpGs =IDOLOptimizedCpGs,   
                                returnAll = FALSE)  
                                
head(countsEPIC$counts)  
}

library (FlowSorted.Blood.EPIC)
#> Loading required package: minfi
#> Loading required package: BiocGenerics
#> Loading required package: parallel
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:parallel':
#> 
#>     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
#>     clusterExport, clusterMap, parApply, parCapply, parLapply,
#>     parLapplyLB, parRapply, parSapply, parSapplyLB
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     Filter, Find, Map, Position, Reduce, anyDuplicated, append,
#>     as.data.frame, basename, cbind, colnames, dirname, do.call,
#>     duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#>     lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#>     pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
#>     tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:base':
#> 
#>     expand.grid
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> 
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#> 
#>     rowMedians
#> The following objects are masked from 'package:matrixStats':
#> 
#>     anyMissing, rowMedians
#> Loading required package: Biostrings
#> Loading required package: XVector
#> 
#> Attaching package: 'Biostrings'
#> The following object is masked from 'package:base':
#> 
#>     strsplit
#> Loading required package: bumphunter
#> Loading required package: foreach
#> Loading required package: iterators
#> Loading required package: locfit
#> locfit 1.5-9.4    2020-03-24
#> Setting options('download.file.method.GEOquery'='auto')
#> Setting options('GEOquery.inmemory.gpl'=FALSE)
#> Loading required package: genefilter
#> 
#> Attaching package: 'genefilter'
#> The following objects are masked from 'package:MatrixGenerics':
#> 
#>     rowSds, rowVars
#> The following objects are masked from 'package:matrixStats':
#> 
#>     rowSds, rowVars
#> Loading required package: quadprog
#> Loading required package: nlme
#> 
#> Attaching package: 'nlme'
#> The following object is masked from 'package:Biostrings':
#> 
#>     collapse
#> The following object is masked from 'package:IRanges':
#> 
#>     collapse
#> Loading required package: IlluminaHumanMethylationEPICanno.ilm10b4.hg19
#> Loading required package: ExperimentHub
#> Loading required package: AnnotationHub
#> Loading required package: BiocFileCache
#> Loading required package: dbplyr
#> 
#> Attaching package: 'AnnotationHub'
#> The following object is masked from 'package:Biobase':
#> 
#>     cache
library(FlowSorted.CordBloodCombined.450k)
#> Loading required package: IlluminaHumanMethylation450kanno.ilmn12.hg19
#> 
#> Attaching package: 'IlluminaHumanMethylation450kanno.ilmn12.hg19'
#> The following objects are masked from 'package:IlluminaHumanMethylationEPICanno.ilm10b4.hg19':
#> 
#>     Islands.UCSC, Locations, Manifest, Other, SNPs.132CommonSingle,
#>     SNPs.135CommonSingle, SNPs.137CommonSingle, SNPs.138CommonSingle,
#>     SNPs.141CommonSingle, SNPs.142CommonSingle, SNPs.144CommonSingle,
#>     SNPs.146CommonSingle, SNPs.147CommonSingle, SNPs.Illumina
#> snapshotDate(): 2020-10-02
data("IDOLOptimizedCpGsCordBlood")
# Step 1: Load the reference library to extract the umbilical cord samples
library(ExperimentHub)
hub <- ExperimentHub()
#> snapshotDate(): 2020-10-02
myfiles <- query(hub, "FlowSorted.CordBloodCombined.450k")
FlowSorted.CordBloodCombined.450k <- myfiles[[1]]
#> see ?FlowSorted.CordBloodCombined.450k and browseVignettes('FlowSorted.CordBloodCombined.450k') for documentation
#> loading from cache
FlowSorted.CordBloodCombined.450k
#> class: RGChannelSet 
#> dim: 575719 289 
#> metadata(0):
#> assays(2): Green Red
#> rownames(575719): 10600322 10600328 ... 74810485 74810492
#> rowData names(0):
#> colnames(289): 3999984027_R02C01 3999984027_R06C01 ...
#>   200705360098_R08C01 200705360110_R03C01
#> colData names(13): Sample_Name Subject.ID ... CellType_original
#>   Reclassified
#> Annotation
#>   array: IlluminaHumanMethylation450k
#>   annotation: ilmn12.hg19

# Step 2 separate the reference from the testing dataset if you want to run 
# examples for estimations for this function example

RGsetTargets <- FlowSorted.CordBloodCombined.450k[,
                FlowSorted.CordBloodCombined.450k$CellType == "WBC"]
sampleNames(RGsetTargets) <- paste(RGsetTargets$CellType,
                                seq_len(dim(RGsetTargets)[2]), sep = "_")
RGsetTargets
#> class: RGChannelSet 
#> dim: 575719 26 
#> metadata(0):
#> assays(2): Green Red
#> rownames(575719): 10600322 10600328 ... 74810485 74810492
#> rowData names(0):
#> colnames(26): WBC_1 WBC_2 ... WBC_25 WBC_26
#> colData names(13): Sample_Name Subject.ID ... CellType_original
#>   Reclassified
#> Annotation
#>   array: IlluminaHumanMethylation450k
#>   annotation: ilmn12.hg19

# Step 3: use your favorite package for deconvolution.
# Deconvolute a target data set consisting of 450K DNA methylation 
# data profiled in blood, using your prefered method.
# You can use our IDOL optimized DMR library for the Cord Blood,  This object
# contains a vector of length 517 consisting of the IDs of the IDOL optimized
# CpG probes.  These CpGs are used as the backbone for deconvolution and were
# selected because their methylation signature differs across the six normal 
# leukocyte subtypes plus the nucleated red blood cells.

data (IDOLOptimizedCpGsCordBlood)
head (IDOLOptimizedCpGsCordBlood)
#> [1] "cg12603453" "cg24765783" "cg06975018" "cg19708055" "cg11900509"
#> [6] "cg09581911"
# We recommend using Noob processMethod = "preprocessNoob" in minfi for the 
# target and reference datasets. 
# Cell types included are "CD8T", "CD4T", "NK", "Bcell", "Mono", "Gran", 
# "nRBC"
# To use the IDOL optimized list of CpGs (IDOLOptimizedCpGsCordBlood) use 
# estimateCellCounts2 from FlowSorted.Blood.EPIC. 
# Do not run with limited RAM the normalization step requires a big amount 
# of memory resources. Use the parameters as specified below for 
# reproducibility.

if (memory.limit()>8000){
    countsUCB<-estimateCellCounts2(RGsetTargets, 
                                    compositeCellType = 
                                                "CordBloodCombined", 
                                    processMethod = "preprocessNoob",
                                    probeSelect = "IDOL", 
                                    cellTypes = c("CD8T", "CD4T", "NK",  
                                    "Bcell", "Mono", "Gran", "nRBC"), 
                                    referencePlatform = 
                                        "IlluminaHumanMethylation450k",
                                    referenceset = 
                                        "FlowSorted.CordBloodCombined.450k",
                                    IDOLOptimizedCpGs =
                                        IDOLOptimizedCpGsCordBlood, 
                                    returnAll = FALSE)
    
    head(countsUCB$counts)
}
#> Warning: 'memory.limit()' is Windows-specific
#> [estimateCellCounts2] Combining user data with reference (flow sorted) data.
#> [estimateCellCounts2] Processing user and reference data together.
#> Loading required package: IlluminaHumanMethylation450kmanifest
#> [estimateCellCounts2] Using IDOL L-DMR probes for composition estimation.
#> [estimateCellCounts2] Estimating composition.
#>             CD8T       CD4T         NK      Bcell       Mono      Gran
#> WBC_1 0.05346047 0.19385977 0.04740254 0.05263905 0.09193623 0.4782162
#> WBC_2 0.02407560 0.07754949 0.01756794 0.02623153 0.10545983 0.6282946
#> WBC_3 0.00000000 0.12053599 0.09423365 0.00164263 0.08580405 0.6577766
#> WBC_4 0.03334680 0.16665169 0.10638825 0.05602585 0.05479126 0.4980870
#> WBC_5 0.05807522 0.16233837 0.06095911 0.02007502 0.07265118 0.5451731
#> WBC_6 0.04797527 0.11605401 0.05841137 0.01983578 0.10084251 0.6434691
#>             nRBC
#> WBC_1 0.07147758
#> WBC_2 0.11282631
#> WBC_3 0.02450290
#> WBC_4 0.04429688
#> WBC_5 0.07022071
#> WBC_6 0.02317434

LA Salas et al. (2018). An optimized library for reference-based deconvolution of whole-blood biospecimens assayed using the Illumina HumanMethylationEPIC BeadArray. Genome Biology 19, 64. doi: 10.1186/s13059-018-1448-7.

DC Koestler et al. (2016). Improving cell mixture deconvolution by identifying optimal DNA methylation libraries (IDOL). BMC bioinformatics. 17, 120. doi: 10.1186/s12859-016-0943-7.

EA Houseman et al. (2012) DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics 13, 86. doi: 10.1186/1471-2105-13-86.

FlowSorted.Blood.EPIC

Lucas A Salas

2020-10-29