Massively-Parallel Cytometry (MPC) experiments allow cost-effective quantification of more than 200 surface proteins at single-cell resolution. The Inflow protocol (Becht et al. 2021) pioneered the pipeline for analysing MPC data, and the Bioconductor infinityFlow package was developed for comprehensive analyses. However, the methods for background correction and removal of unwanted variation implemented in that package can be improved. We developed the MAPFX package as an alternative with a more careful strategy for cleaning up the raw protein intensities. Unlike the infinityFlow pipeline, MAPFX performs background correction prior to imputation and removes unwanted variation from the data at the cell level, while explicitly accounting for the potential association between biology and unwanted factors. We benchmarked our pipeline against the infinityFlow pipeline and demonstrated that our approach is better at preserving biological signals, removing unwanted variation, and imputing unmeasured infinity markers (Liao et al. 2024). The MAPFX package includes two user-friendly functions, MapfxMPC and MapfxFFC, designed for data from MPC and FFC experiments, respectively (see the sections below for details).
The experimental and computational pipelines of the Inflow protocol (Becht et al. 2021): (A) Experimental pipeline. The single-cell samples are stained with backbone markers (backbone panel staining), the stained samples are then allocated to wells that each contain one particular infinity marker (infinity panel staining), and finally data are acquired by flow cytometry for each well. (B) Computational pipeline. In the matrix of normalised data, the backbone matrix (gray) contains values for every single cell (row), whereas only the block-diagonal entries of the infinity matrix (yellow) have measurements. The unmeasured infinity markers are imputed with regression models that use the backbone markers as predictors, which yields the completed data matrix. The figure is taken from Figure 1 of Liao et al. (2024).
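The block structure in panel (B) can be illustrated with a toy example. This is a sketch under stated assumptions, not the MAPFX implementation: the data are simulated, the names bkb1, bkb2, and inf are hypothetical, and an ordinary linear model (lm) stands in for the regression models used by the package.

```r
set.seed(1)
n_per_well <- 200
wells <- c("W1", "W2")

# Simulated backbone markers, measured for every cell in every well
cells <- data.frame(
  well = rep(wells, each = n_per_well),
  bkb1 = rnorm(2 * n_per_well),
  bkb2 = rnorm(2 * n_per_well)
)

# Hypothetical infinity marker: measured in well W1 only, NA elsewhere
truth <- 1 + 2 * cells$bkb1 - cells$bkb2 + rnorm(nrow(cells), sd = 0.3)
cells$inf <- ifelse(cells$well == "W1", truth, NA)

# Train on a random half of the cells in the well that measured the marker
w1 <- which(cells$well == "W1")
train <- sample(w1, length(w1) / 2)
test <- setdiff(w1, train)
fit <- lm(inf ~ bkb1 + bkb2, data = cells[train, ])

# Accuracy on held-out cells from the same well (squared correlation)
pred_test <- predict(fit, newdata = cells[test, ])
r_sq <- cor(pred_test, cells$inf[test])^2

# Impute the marker for cells in the well where it was not measured
unmeasured <- which(is.na(cells$inf))
cells$inf_imputed <- cells$inf
cells$inf_imputed[unmeasured] <- predict(fit, newdata = cells[unmeasured, ])
```

Holding out half of the measured cells mirrors the idea of checking predictions on a test set before trusting the imputed values for the remaining wells.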
This package implements an end-to-end toolbox for analysing raw data from MPC experiments. More details on the methodology can be found in Liao et al. (2024). The MapfxMPC function runs the whole pipeline. The pipeline starts by background correcting the raw intensities with an adapted normal-exponential convolution model, which removes the noise from electronic baseline restoration and fluorescence compensation. Unwanted technical variation, from sources such as well effects, is then removed using a log-normal model with plate, column, and row factors, after which infinity markers are imputed by machine learning models that use the informative backbone markers as predictors. Cluster analysis and visualisation with two-dimensional UMAP representations can then be carried out if desired. Users can set MapfxMPC(..., impute=FALSE) if imputation is not needed.
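To give a feel for the background-correction step, the sketch below computes the expected signal under a generic normal-exponential convolution model, in which the observed intensity is the sum of an exponential signal and normal background noise. It is an illustration only: the function name normexp_signal and the parameter values are assumptions, and the way MAPFX adapts the model and estimates its parameters is described in Liao et al. (2024), not reproduced here.

```r
# Conditional expectation E[S | X = x] under the convolution X = S + B,
# with signal S ~ Exponential(mean = alpha) and background B ~ Normal(mu, sigma^2).
normexp_signal <- function(x, mu, sigma, alpha) {
  mu_sx <- x - mu - sigma^2 / alpha
  # Work on the log scale to stay numerically stable for large intensities
  log_dens <- dnorm(0, mean = mu_sx, sd = sigma, log = TRUE)
  log_tail <- pnorm(0, mean = mu_sx, sd = sigma, lower.tail = FALSE, log.p = TRUE)
  mu_sx + sigma^2 * exp(log_dens - log_tail)
}

# Made-up raw intensities and made-up parameter values
raw <- c(50, 100, 150, 500, 5000)
corrected <- normexp_signal(raw, mu = 100, sigma = 20, alpha = 300)
```

Unlike simple subtraction of the background mean, the corrected values stay positive even when the raw intensity falls below the background level, and for bright cells the correction approaches a constant shift.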
For the protein intensities from FFC experiments, the function MapfxFFC carries out the normalisation steps, which include background correction and removal of unwanted variation. The function can also perform cluster analysis and visualisation with two-dimensional UMAP representations if specified.
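The idea of removing unwanted variation while accounting for biology can be sketched with a linear model on log-scale intensities. The example is a simplified stand-in rather than the package's method: the data are simulated, a single batch factor represents the unwanted factors (plate, column, and row factors would enter the model in the same way), and the cluster labels that represent biology are assumed to be known.

```r
set.seed(2)
n <- 600
dat <- data.frame(
  cluster = factor(sample(c("c1", "c2"), n, replace = TRUE)),      # stand-in for biology
  batch   = factor(sample(c("b1", "b2", "b3"), n, replace = TRUE))  # unwanted factor
)
bio_effect   <- c(c1 = 0, c2 = 1.5)
batch_effect <- c(b1 = 0, b2 = 0.6, b3 = -0.4)
dat$log_y <- unname(
  5 + bio_effect[as.character(dat$cluster)] +
    batch_effect[as.character(dat$batch)] + rnorm(n, sd = 0.2)
)

# Fit biology and the unwanted factor jointly, so that the batch estimate
# is not confounded with differences in cell-type composition
fit <- lm(log_y ~ cluster + batch, data = dat)

# Subtract only the estimated batch contribution
X <- model.matrix(fit)
batch_cols <- grep("^batch", colnames(X))
dat$log_y_adj <- dat$log_y -
  drop(X[, batch_cols, drop = FALSE] %*% coef(fit)[batch_cols])

# Refitting on the adjusted values leaves numerically zero batch coefficients
refit <- lm(log_y_adj ~ cluster + batch, data = dat)
max_batch_after <- max(abs(coef(refit)[batch_cols]))
```

Because the cluster term is kept in the model and never subtracted, the estimated difference between clusters is the same before and after adjustment.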
# FCSpath
└───FCSpath
│ └───fcs
│ │ Plate1_A01.fcs
│ │ Plate1_A02.fcs
│ │ ...
│ └───meta
│ │ filename_meta.csv
# Outpath
└───Outpath
│ └───intermediary
│ └───downstream
│ └───graph
## Note: the sub-folders `intermediary`, `downstream`, and `graph` will
## be generated automatically by MAPFX.
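The input layout above can be prepared from R before copying the FCS files in. A minimal sketch, assuming a hypothetical location under tempdir(); replace it with your own input directory.

```r
# Hypothetical location; replace with your own input directory
FCSpath <- file.path(tempdir(), "FCSpath")

# Create FCSpath/fcs and FCSpath/meta (recursive = TRUE also creates FCSpath)
for (sub in c("fcs", "meta")) {
  dir.create(file.path(FCSpath, sub), recursive = TRUE, showWarnings = FALSE)
}

# After copying the .fcs files into FCSpath/fcs, list the files that are present
fcs_files <- list.files(file.path(FCSpath, "fcs"),
                        pattern = "\\.fcs$", ignore.case = TRUE)
```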
When file_meta = "auto" is set for MapfxMPC, the file identifier keyword (GUID) of each FCS file MUST contain the following information in the specified format:
Plate information: Plate1, Plate2, …, Plate9
Well information: A1, A2, …, A12, B1, …, H1, …, H12
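Before running the pipeline with file_meta = "auto", it can help to check the file names against these rules. The helper below is a hypothetical sketch: the function name and the regular expressions are assumptions based on the description above (they also accept zero-padded wells such as A01), and the exact matching rules used inside MAPFX may differ.

```r
# Does each name carry both a plate label (Plate1-Plate9) and a well label
# (row A-H, column 1-12, optionally zero-padded)?
has_plate_and_well <- function(guid) {
  plate_ok <- grepl("Plate[1-9]", guid)
  well_ok <- grepl("[A-H](0?[1-9]|1[0-2])(?![0-9])", guid, perl = TRUE)
  plate_ok & well_ok
}

# Made-up file names: the last two break the naming rules
guid <- c("Plate1_A01.fcs", "Plate2_H12.fcs", "sample_7.fcs", "Plate3_K05.fcs")
ok <- has_plate_and_well(guid)

# Names that would need renaming or a user-supplied metadata file
guid[!ok]
```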
When file_meta = "usr" is set, prepare filename_meta.csv in the following format and save the CSV file under FCSpath/meta/.
An example:
Filenam | Plate | Well | Column | Row | Well.lab |
---|---|---|---|---|---|
p1_a12.fcs | Plate1 | A12 | Col.12 | Row.01 | P1_A12 |
p2_d08.fcs | Plate2 | D08 | Col.08 | Row.04 | P2_D08 |
p3_g1.fcs | Plate3 | G01 | Col.01 | Row.07 | P3_G01 |
Note that the “Filenam” column refers to the GUID (file name) of each FCS file in FCSpath/fcs/.
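The example table can be assembled and written from R. This sketch reproduces the three rows shown above; the output location under tempdir() is hypothetical, and it assumes that a comma-separated file with a header row is what the function expects.

```r
# Values from the example table; the file names are the GUIDs in FCSpath/fcs/
files <- c("p1_a12.fcs", "p2_d08.fcs", "p3_g1.fcs")
plate_no <- c(1L, 2L, 3L)
row_letter <- c("A", "D", "G")
col_no <- c(12L, 8L, 1L)

# Zero-pad the column number so that wells read A12, D08, G01
well <- sprintf("%s%02d", row_letter, col_no)
meta <- data.frame(
  Filenam  = files,
  Plate    = paste0("Plate", plate_no),
  Well     = well,
  Column   = sprintf("Col.%02d", col_no),
  Row      = sprintf("Row.%02d", match(row_letter, LETTERS)),
  Well.lab = sprintf("P%d_%s", plate_no, well)
)

# Hypothetical FCSpath; the file name and sub-folder follow the text above
meta_dir <- file.path(tempdir(), "FCSpath", "meta")
dir.create(meta_dir, recursive = TRUE, showWarnings = FALSE)
write.csv(meta, file.path(meta_dir, "filename_meta.csv"), row.names = FALSE)
```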
For MapfxFFC, prepare filename_meta.csv in the following format and save the CSV file in FCSpath/meta/.
An example:
Filenam | Batch |
---|---|
090122.fcs | Batch1 |
070122.fcs | Batch2 |
010122.fcs | Batch3 |
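A quick cross-check between the metadata file and the FCS folder can catch missing or misspelled file names early. The sketch below is self-contained: it creates empty stand-in files under a hypothetical folder in tempdir() and uses the three rows of the example table.

```r
# Hypothetical input folder populated with empty stand-in files
FCSpath <- file.path(tempdir(), "FFC_demo")
dir.create(file.path(FCSpath, "fcs"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(FCSpath, "meta"), showWarnings = FALSE)
fcs_names <- c("090122.fcs", "070122.fcs", "010122.fcs")
invisible(file.create(file.path(FCSpath, "fcs", fcs_names)))

# Metadata with the two columns shown in the example table
meta <- data.frame(
  Filenam = fcs_names,
  Batch = c("Batch1", "Batch2", "Batch3")
)
meta_file <- file.path(FCSpath, "meta", "filename_meta.csv")
write.csv(meta, meta_file, row.names = FALSE)

# Cross-check: every FCS file should appear exactly once in the metadata
on_disk <- list.files(file.path(FCSpath, "fcs"), pattern = "\\.fcs$")
listed <- read.csv(meta_file, colClasses = "character")$Filenam
missing_meta <- setdiff(on_disk, listed)   # files without a metadata row
missing_file <- setdiff(listed, on_disk)   # metadata rows without a file
duplicated_rows <- listed[duplicated(listed)]
```

All three result vectors are empty when the folder and the metadata agree.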
The MAPFX package can be installed using the code below.
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("MAPFX")
Along with the MAPFX package, we also load the following packages required for running functions in MAPFX.
library(MAPFX)
## specify the package names
suppressPackageStartupMessages({
library(flowCore)
library(Biobase)
library(stringr)
library(uwot)
library(iCellR)
library(igraph)
library(ggplot2)
library(RColorBrewer)
library(Rfast)
library(ComplexHeatmap)
library(circlize)
library(glmnetUtils)
library(e1071)
library(xgboost)
library(parallel)
library(pbapply)
library(reshape2)
library(gtools)
library(utils)
library(stats)
library(cowplot)
})
The MPC dataset is a subset of the single-cell murine lung data at steady state, downloaded from FlowRepository and provided by Etienne Becht (Nov 2020). The raw protein intensities and the corresponding metadata are saved in the objects ord.fcs.raw.mt_mpc and ord.fcs.raw.meta.df.out_mpc, which were generated from 266 .FCS files from 266 wells with 50 cells in each file.
The FFC dataset of mouse splenocytes contains 50 cells (sorted CD4+ and CD8+ T cells) in each .FCS file and was down-sampled from the data provided by Jalal Alshaweesh (Oct 2023) on FlowRepository. The raw protein intensities and the corresponding metadata are saved in the objects ord.fcs.raw.mt_ffc and ord.fcs.raw.meta.df.out_ffc.
MapfxMPC(..., impute=TRUE) - analysing data from MPC experiments
This mode is for users who would like to perform all of the following steps: background correction, removal of unwanted variation (well effects), imputation, and cluster analysis.
# import built-in data
data(ord.fcs.raw.meta.df.out_mpc)
data(ord.fcs.raw.mt_mpc)
# create an output directory under tempdir() for the argument 'Outpath' of the MapfxMPC function
dir.create(file.path(tempdir(), "MPC_impu_Output"))
# usage
# when impute = TRUE, randomly selecting 50% of the cells in each well for model training
set.seed(123)
MapfxMPC_impu_obj <- MapfxMPC(
runVignette = TRUE, #set FALSE if not running this Vignette
runVignette_meta = ord.fcs.raw.meta.df.out_mpc, #set NULL if not running this Vignette
runVignette_rawInten = ord.fcs.raw.mt_mpc, #set NULL if not running this Vignette
FCSpath = NULL, # users specify their own input path
Outpath = file.path(tempdir(), "MPC_impu_Output"), # or users specify their own output path
file_meta = "auto",
bkb.v = c(
"FSC-H", "FSC-W", "SSC-H", "SSC-W", "CD69-CD301b", "MHCII",
"CD4", "CD44", "CD8", "CD11c", "CD11b", "F480",
"Ly6C", "Lineage", "CD45a488", "CD24", "CD103"),
yvar = "Legend",
control.wells = c(
"P1_A01", "P2_A01", "P3_A01",
"P3_F04", "P3_F05", "P3_F06", "P3_F07", "P3_F08",
"P3_F09", "P3_F10", "P3_F11", "P3_F12",
"P3_G01", "P3_G02"),
bkb.upper.quantile = 0.9,
bkb.lower.quantile = 0.1,
bkb.min.quantile = 0.01,
inf.lower.quantile = 0.1,
inf.min.quantile = 0.01,
plots.bkc.bkb = TRUE, plots.bkc.inf = TRUE,
plots.initM = TRUE,
plots.rmWellEffect = TRUE,
impute = TRUE,
models.use = c("XGBoost"),
extra_args_regression_params = list(list(nrounds = 1500, eta = 0.03)),
prediction_events_downsampling = NULL,
impu.training = FALSE,
plots.imputation = TRUE,
cluster.analysis.bkb = TRUE, plots.cluster.analysis.bkb = TRUE,
cluster.analysis.all = TRUE, plots.cluster.analysis.all = TRUE,
cores = 4L)
##
##
##
## Creating directories for output...
##
##
##
## Background correcting backbone markers...
## Estimating parameters for calibration...
## backbone: 1
## backbone: 2
## backbone: 3
## backbone: 4
## backbone: 5
## backbone: 6
## backbone: 7
## backbone: 8
## backbone: 9
## backbone: 10
## backbone: 11
## backbone: 12
## backbone: 13
## backbone: 14
## backbone: 15
## backbone: 16
## backbone: 17
## Estimation of parameters... Completed!
## Calibrating backbone markers (except for physical measurements)...
## Calibration of backbone markers... Completed!
##
##
##
## Background correcting infinity markers...
## Estimating parameters for calibration AND calibrating infinity markers...
## Could not find enough cells (>=10) when used "mle.mean+3*mle.sd", so estimated alpha with the top 10 cells with "the largest values":
## 25 wells applied this strategy
## See Wellname_largest10.csv in the intermediary directory for details.
## Calibration of infinity markers... Completed!
##
##
##
## Forming a matrix of biology (M) for removal of well effect...
## Forming logicle functions...
## Logicle transforming raw intensity...
## Centring logicle transformed intensities...
## Centred logicle backbone data... Obtained!
## Deriving initial clusters with PhenoGraph (forming the M matrix)...
## Run Rphenograph starts:
## -Input data of 13300 rows and 17 columns
## -k is set to 50
## Finding nearest neighbors...
## DONE ~4.701s
## Compute jaccard coefficient between nearest-neighbor sets...
## DONE ~12.062s
## Build undirected graph from the weighted links...
## DONE ~3.087s
## Run louvain clustering on the graph ...
## DONE ~2.402s
## Run Rphenograph DONE, totally takes 22.252s.
## Return a community class
## -Modularity value:0.877501268487132
## -Number of clusters:17
## 24.050155878067
## UMAP with backbones (MPC)/proteins (FFC)...
## 57.8631880283356
## Visualising clusters...
## Completed!
##
##
##
## Removal of well effect...
## Estimating coefficients for removing well effect (Rfast - pre.adj)...
## Processing backbone: 1
## Processing backbone: 2
## Processing backbone: 3
## Processing backbone: 4
## Processing backbone: 5
## Processing backbone: 6
## Processing backbone: 7
## Processing backbone: 8
## Processing backbone: 9
## Processing backbone: 10
## Processing backbone: 11
## Processing backbone: 12
## Processing backbone: 13
## Processing backbone: 14
## Processing backbone: 15
## Processing backbone: 16
## Processing backbone: 17
## Estimation completed!
## Removing well effect for backbone markers...
## Adjustment completed!
## Examining the existence of well effect in the adjusted data (Rfast - post.adj)...
## Processing backbone: 1
## Processing backbone: 2
## Processing backbone: 3
## Processing backbone: 4
## Processing backbone: 5
## Processing backbone: 6
## Processing backbone: 7
## Processing backbone: 8
## Processing backbone: 9
## Processing backbone: 10
## Processing backbone: 11
## Processing backbone: 12
## Processing backbone: 13
## Processing backbone: 14
## Processing backbone: 15
## Processing backbone: 16
## Processing backbone: 17
##
##
##
## Imputation got started...
## Fitting regression models...
## Randomly selecting 50% of the cells in each well for model training...
## Fitting...
## XGBoost
## 31.9619183540344 seconds
## Imputing infinity (unmeasured well-specific) markers...
## Randomly drawing events to predict from the test set (if it's been asked)
## Imputing...
## XGBoost
## 11.8618659973145 seconds
## Concatenating predictions...
## Writing to disk...
## Visualising the accuracy of the predictions... (using testing set)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
##
##
##
## Cluster analysis with adjusted backbone markers and completed dataset for cells in the testing set...
##
## Clustering with normalised backbones
##
## Running UMAP...
##
## Running Phenograph...
##
## Run Rphenograph starts:
## -Input data of 6650 rows and 17 columns
## -k is set to 50
##
## Finding nearest neighbors...
##
## DONE ~1.19499999999999s
## Compute jaccard coefficient between nearest-neighbor sets...
##
## DONE ~6.25s
## Build undirected graph from the weighted links...
##
## DONE ~1.48500000000001s
## Run louvain clustering on the graph ...
##
## DONE ~0.785000000000025s
##
##
## Run Rphenograph DONE, totally takes 9.71500000000003s.
##
## Return a community class
## -Modularity value:0.888907118397234
##
##
## -Number of clusters:18
##
## Clustering with normalised backbones + imputed infinity markers (XGBoost)
##
## Running UMAP...
##
## Running Phenograph...
##
## Run Rphenograph starts:
## -Input data of 6650 rows and 269 columns
## -k is set to 50
##
## Finding nearest neighbors...
##
## DONE ~7.43599999999998s
## Compute jaccard coefficient between nearest-neighbor sets...
##
## DONE ~6.37799999999999s
## Build undirected graph from the weighted links...
##
## DONE ~1.49200000000002s
## Run louvain clustering on the graph ...
##
## DONE ~0.680000000000007s
##
##
## Run Rphenograph DONE, totally takes 15.986s.
##
## Return a community class
## -Modularity value:0.920143588207911
##
##
## -Number of clusters:26
##
## Visualising clusters...
##
## Completed!
##
##
## Cell group labels are saved in "GP.denoised.bkb" and "GP.denoised.bkb.impuInf*" columns...
##
##
##
##
## Cluster analysis with adjusted backbone markers for ALL cells...
##
## Cluster analysis for normalised backbone measurements...
##
## Clustering with normalised backbones
##
## Running UMAP...
##
## Running Phenograph...
##
## Run Rphenograph starts:
## -Input data of 13300 rows and 17 columns
## -k is set to 50
##
## Finding nearest neighbors...
##
## DONE ~3.42700000000002s
## Compute jaccard coefficient between nearest-neighbor sets...
##
## DONE ~12.137s
## Build undirected graph from the weighted links...
##
## DONE ~3.05699999999996s
## Run louvain clustering on the graph ...
##
## DONE ~2.50099999999998s
##
##
## Run Rphenograph DONE, totally takes 21.122s.
##
## Return a community class
## -Modularity value:0.897663361184424
##
##
## -Number of clusters:19
##
## Visualising clusters...
##
## Completed!
##
##
## Cell group labels are saved in "GP.denoised.bkb.allCells" column...
##
## Completed!
# check the details
help(MapfxMPC, package = "MAPFX")
All the output will be stored in file.path(tempdir(), "MPC_impu_Output") (users can specify their own path: /Outpath/).
MapfxMPC(..., impute=FALSE) - normalising data from MPC experiments
This mode is for users who would like to perform the following steps: background correction, removal of unwanted variation (well effects), and cluster analysis using backbones only.
# import built-in data
data(ord.fcs.raw.meta.df.out_mpc)
data(ord.fcs.raw.mt_mpc)
# create an output directory under tempdir() for the argument 'Outpath' of the MapfxMPC function
dir.create(file.path(tempdir(), "MPC_NOimpu_Output"))
# usage
MapfxMPC_NOimpu_obj <- MapfxMPC(
runVignette = TRUE, #set FALSE if not running this Vignette
runVignette_meta = ord.fcs.raw.meta.df.out_mpc, #set NULL if not running this Vignette
runVignette_rawInten = ord.fcs.raw.mt_mpc, #set NULL if not running this Vignette
FCSpath = NULL, # users specify their own input path
Outpath = file.path(tempdir(), "MPC_NOimpu_Output"), # or users specify their own output path
file_meta="auto",
bkb.v = c(
"FSC-H", "FSC-W", "SSC-H", "SSC-W", "CD69-CD301b", "MHCII",
"CD4", "CD44", "CD8", "CD11c", "CD11b", "F480",
"Ly6C", "Lineage", "CD45a488", "CD24", "CD103"),
yvar="Legend",
control.wells = c(
"P1_A01", "P2_A01", "P3_A01",
"P3_F04", "P3_F05", "P3_F06", "P3_F07", "P3_F08",
"P3_F09", "P3_F10", "P3_F11", "P3_F12",
"P3_G01", "P3_G02"),
bkb.upper.quantile = 0.9,
bkb.lower.quantile = 0.1,
bkb.min.quantile = 0.01,
inf.lower.quantile = 0.1,
inf.min.quantile = 0.01,
plots.bkc.bkb = TRUE, plots.bkc.inf = TRUE,
plots.initM = TRUE,
plots.rmWellEffect = TRUE,
impute = FALSE,
cluster.analysis.bkb = TRUE, plots.cluster.analysis.bkb = TRUE,
cores = 4L)
# check the details
help(MapfxMPC, package = "MAPFX")
All the output will be stored in file.path(tempdir(), "MPC_NOimpu_Output") (users can specify their own path: /Outpath/).
MapfxFFC - normalising data from FFC experiments
This function is for users who would like to perform the following steps: background correction, removal of unwanted variation (batch effects), and cluster analysis.
# import built-in data
data(ord.fcs.raw.meta.df.out_ffc)
data(ord.fcs.raw.mt_ffc)
# create an output directory under tempdir() for the argument 'Outpath' of the MapfxFFC function
dir.create(file.path(tempdir(), "FFCnorm_Output"))
MapfxFFC_obj <- MapfxFFC(
runVignette = TRUE, #set FALSE if not running this Vignette
runVignette_meta = ord.fcs.raw.meta.df.out_ffc, #set NULL if not running this Vignette
runVignette_rawInten = ord.fcs.raw.mt_ffc, #set NULL if not running this Vignette
FCSpath = NULL, # users specify their own input path
Outpath = file.path(tempdir(), "FFCnorm_Output"), # or users specify their own output path
protein.v = c("CD3","CD4","CD8","CD45"),
protein.upper.quantile = 0.9,
protein.lower.quantile = 0.1,
protein.min.quantile = 0.01,
plots.bkc.protein = TRUE,
plots.initM = TRUE,
plots.rmBatchEffect = TRUE,
cluster.analysis.protein = TRUE, plots.cluster.analysis.protein = TRUE)
##
##
##
## Creating directories for output...
##
##
##
## Background correcting proteins...
## Estimating parameters for calibration...
## backbone: 1
## backbone: 2
## backbone: 3
## backbone: 4
## Estimation of parameters... Completed!
## Calibrating backbone markers (except for physical measurements)...
## Calibration of backbone markers... Completed!
##
##
##
## Forming a matrix of biology (M) for removal of batch effect...
## Forming logicle functions...
## Logicle transforming raw intensity...
## Centring logicle transformed intensities...
## Centred logicle backbone data... Obtained!
## Deriving initial clusters with PhenoGraph (forming the M matrix)...
## Run Rphenograph starts:
## -Input data of 250 rows and 4 columns
## -k is set to 50
## Finding nearest neighbors...
## DONE ~0.0029999999999859s
## Compute jaccard coefficient between nearest-neighbor sets...
## DONE ~0.23599999999999s
## Build undirected graph from the weighted links...
## DONE ~0.0539999999999736s
## Run louvain clustering on the graph ...
## DONE ~0.01400000000001s
## Run Rphenograph DONE, totally takes 0.30699999999996s.
## Return a community class
## -Modularity value:0.509492508513125
## -Number of clusters:4
## 2.0750105381012
## UMAP with backbones (MPC)/proteins (FFC)...
## 2.6923360824585
## Visualising clusters...
## Completed!
##
##
##
## Removal of batch effect...
## Estimating coefficients for removing batch effect (Rfast - pre.adj)...
## Processing protein: 1
## Processing protein: 2
## Processing protein: 3
## Processing protein: 4
## Estimation completed!
## Removing batch effect for protein markers...
## Adjustment completed!
## Examining the existence of batch effect in the adjusted data (Rfast - post.adj)...
## Processing protein: 1
## Processing protein: 2
## Processing protein: 3
## Processing protein: 4
##
##
##
## Cluster analysis with adjusted protein markers for ALL cells...
## Cluster analysis for normalised backbone measurements...
## Clustering with normalised backbones
## Running UMAP...
## Running Phenograph...
## Run Rphenograph starts:
## -Input data of 250 rows and 4 columns
## -k is set to 50
## Finding nearest neighbors...
## DONE ~0.00200000000000955s
## Compute jaccard coefficient between nearest-neighbor sets...
## DONE ~0.23399999999998s
## Build undirected graph from the weighted links...
## DONE ~0.0529999999999973s
## Run louvain clustering on the graph ...
## DONE ~0.0150000000000432s
## Run Rphenograph DONE, totally takes 0.30400000000003s.
## Return a community class
## -Modularity value:0.511696350731553
## -Number of clusters:6
## Visualising clusters...
## Completed!
##
## Cell group labels are saved in "GP.denoised.bkb.allCells" column...
## Completed!
# check the details
help(MapfxFFC, package = "MAPFX")
All the output will be stored in file.path(tempdir(), "FFCnorm_Output") (users can specify their own path: /Outpath/).
Three folders will be automatically generated in the output folder.
1. intermediary: Intermediary results will be saved in the .rds or .RData formats and will be stored here.
2. downstream: Final results will be saved in the .rds format and will be stored here. The results include normalised backbone measurements (on both linear and log scale: bkc.adj.bkb_linearScale_mt.rds and bkc.adj.bkb_logScale_mt.rds), the completed dataset with imputed infinity (exploratory, PE) markers (predictions.Rds), UMAP coordinates derived from both normalised backbones (ClusterAnalysis_umap_#bkb.rds) and the completed dataset (ClusterAnalysis_ImpuMtd_umap_#bkb.#impuPE.rds), and metadata (fcs_metadata_df.rds) for cells including cluster labels derived from both normalised backbones and the completed data matrix.
3. graph: Figures will be stored here, including scatter plots for comparing background corrected and raw protein intensities for each protein marker, heatmaps for presenting the biological and unwanted effects in the data before and after removal of unwanted variation with mapfx.norm, boxplots (for imputations from multiple models) and a boxplot and a histogram (for imputations from a single model) of R-sq values for visualising the accuracy of imputed infinity (exploratory, PE) markers, and UMAP plots for showing the cluster structure.
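Results saved in the downstream folder can be loaded back into R with readRDS. The helper below is a hypothetical convenience function, not part of MAPFX, and the demonstration uses a stand-in file under tempdir() so that it runs without real pipeline output; with real output, point downstream_dir at file.path(Outpath, "downstream").

```r
# Read every .rds file in a folder into a named list
# (names = file names without the extension; matches .rds and .Rds)
read_rds_dir <- function(dir) {
  files <- list.files(dir, pattern = "\\.rds$", ignore.case = TRUE,
                      full.names = TRUE)
  out <- lapply(files, readRDS)
  names(out) <- tools::file_path_sans_ext(basename(files))
  out
}

# Self-contained demonstration with a stand-in file that borrows one of the
# file names listed above; its content is a dummy matrix
downstream_dir <- file.path(tempdir(), "demo_Output", "downstream")
dir.create(downstream_dir, recursive = TRUE, showWarnings = FALSE)
saveRDS(matrix(1:4, 2), file.path(downstream_dir, "bkc.adj.bkb_logScale_mt.rds"))

results <- read_rds_dir(downstream_dir)
names(results)
```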
The MapfxData package (available soon) contains two example datasets that can be used for demonstration.
MPC dataset:
It is a subset of the single-cell murine lung data at steady state downloaded from FlowRepository (Becht et al. 2021). The raw data contain 266 .FCS files from 266 wells with 1000 cells in each file.
FFC dataset:
It contains 316,779 cells (sorted CD4+ and CD8+ T cells) from mouse splenocytes and was downloaded from FlowRepository, where it was provided by Jalal Alshaweesh (Oct 2023).
This works on both MPC and FFC data.