Main function
The main idea of IRIS-FGM consists of two major strategies:
- biclustering
- quick mode (cluster)
IRIS-FGM integrates in-house and state-of-the-art computational tools and provides two analysis strategies, including bicluster-based co-expression gene analysis (Xie, et al., 2020) and LTMG (left-truncated mixture Gaussian model)-embedded scRNA-Seq analysis (Wan, et al., 2019).
The main idea of IRIS-FGM consists of two major strategies:
The computational frame is constructed under the S4 data structure in R. The structure of BRIC object
is:
We recommend user to install IRIS-FGM on large memory (32GB) based linux operation system if user aims at analyzing bicluster-based co-expression analysis; if user aims at analyzing data by quick mode, we recommend to install IRIS-FGM on small memeory (8GB) based Windows or linux operation system; IRIS-FGM does not support MAC. We will assum you have the following installed:
Pre-install packge
if(!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
# install from bioconductor
BiocManager::install(c('org.Mm.eg.db','multtest', 'org.Hs.eg.db','clusterProfiler','DEsingle', 'DrImpute', 'scater', 'scran'))
# install from cran
chooseCRANmirror()
BiocManager::install(c('devtools', 'AdaptGauss', "pheatmap", 'mixtools','MCL', 'anocva', "Polychrome", 'qgraph','Rtools','ggpubr',"ggraph", "Seurat"))
When you perform co-expression analysis, it will output several intermediate files, thus please make sure that you have write permission to the folder where IRIS-FGM is located.
For installation, simply type the following command in your R console, please select option 3 when R asks user to update packages:
This tutorial run on a real dataset to illustrate the results obtained at each step.
As example, we will use Yan’s data, a dataset containing 90 cells and 20,214 genes from human embryo, to conduct cell type prediction.
Yan, L. et al. Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cells. Nat. Struct. Mol. Biol. 20, 1131-1139 (2013)
The original expression matrix was downloaded from https://s3.amazonaws.com/scrnaseq-public-datasets/manual-data/yan/nsmb.2660-S2.csv. The expression is provided as RPKM value. For convenience, we removed the space in the column names and deleted the second column(Transcript_ID). The processed data is available at https://bmbl.bmi.osumc.edu/downloadFiles/Yan_expression.txt.
IRIS-FGM can accepted 10X chromium input files, including a folder (contain gene name, cell name, and sparse matrix) and .h5 file.
# dir.create("your working directory",recursive = TRUE)
# setwd("your working directory")
library(IRISFGM)
##
##
## Registered S3 method overwritten by 'gamlss':
## method from
## print.ri bit
##
# if you will use the ".h5" as input file, please uncomment the following command.
# input_matrix <- ReadFrom10X_h5("dir_to_your_hdf5_file")
# if you will use the 10x folder as input file, please uncomment the following command.
# input_matrix <- ReadFrom10X_folder("dir_to_10x_folder")
First, we should download data from the link, then we will use this data set as example to run the pipeline.
set.seed(123)
seed_idx <- sample(1:nrow(InputMatrix),3000)
InputMatrix_sub <- InputMatrix[seed_idx,]
object <- CreateIRISFGMObject(InputMatrix_sub)
## Creating IRISCEM object.
## The original input file contains 90 cells and 3000 genes
## Removed 97 genes that total expression value is equal or less than 0
## Removed 0 cells that number of expressed gene is equal or less than 0
meta.info
is data frame of which row name should be cell ID, and column name should be cell type.my_meta <- read.table(url("https://bmbl.bmi.osumc.edu/downloadFiles/Yan_cell_label.txt"),header = TRUE,row.names = 1)
object <- AddMeta(object, meta.info = my_meta)
User can choose perform normalization or imputation based on their need. The normalization method has two options, one is the simplist CPM normalization (default normalization = 'cpm'
). The other is from package scran and can be opened by using parameter normalization = 'scran'
, . The imputation method is from package DrImpute and can be opened by using parameter IsImputation = TRUE
(default as closed).
The argument Gene_use = 500
is top 500 highlt variant genes which are selected to run LTMG. For quick mode, we recommend to use top 2000 gene (here we use top 500 gene for saving time). On the contrary, for co-expression gene analysis, we recommend to use all gene by changing Gene_use = "all"
.
# do not show progress bar
quiet <- function(x) {
sink(tempfile())
on.exit(sink())
invisible(force(x))
}
# demo only run top 500 gene for saving time.
object <- quiet(RunLTMG(object, Gene_use = "500"))
# you can get LTMG signal matrix
LTMG_Matrix <- GetLTMGmatrix(object)
LTMG_Matrix[1:5,1:5]
## OoCyte_1 OoCyte_2 OoCyte_3 Zygote_2 2_Cell_embryo_1_Cell_1
## MTRNR2L2 1 2 2 2 2
## TPT1 1 2 2 2 2
## UBB 2 2 2 2 2
## FABP5 2 2 2 2 2
## PTTG1 2 2 2 2 2
IRIS-FGM can provide biclustering function, which is based on our in-house novel algorithm, QUBIC2 (https://github.com/maqin2001/qubic2). Here we will show the basic biclustering usage of IRIS-FGM using a \(500\times 87\) expression matrix generated from previous top 500 variant genes. However, we recommend user should use “Gene_use = all”" to generate LTMG matrix.
User can type the following command to run discretization (LTMG) + biclustering directly:
##
|
| | 0%
|
| | 1%
|
|= | 1%
|
|= | 2%
|
|== | 2%
|
|== | 3%
|
|=== | 4%
|
|=== | 5%
|
|==== | 5%
|
|==== | 6%
|
|===== | 7%
|
|===== | 8%
|
|====== | 8%
|
|====== | 9%
|
|======= | 9%
|
|======= | 10%
|
|======= | 11%
|
|======== | 11%
|
|======== | 12%
|
|========= | 12%
|
|========= | 13%
|
|========== | 14%
|
|========== | 15%
|
|=========== | 15%
|
|=========== | 16%
|
|============ | 17%
|
|============ | 18%
|
|============= | 18%
|
|============= | 19%
|
|============== | 19%
|
|============== | 20%
|
|============== | 21%
|
|=============== | 21%
|
|=============== | 22%
|
|================ | 22%
|
|================ | 23%
|
|================= | 24%
|
|================= | 25%
|
|================== | 25%
|
|================== | 26%
|
|=================== | 27%
|
|=================== | 28%
|
|==================== | 28%
|
|==================== | 29%
|
|===================== | 29%
|
|===================== | 30%
|
|===================== | 31%
|
|====================== | 31%
|
|====================== | 32%
|
|======================= | 32%
|
|======================= | 33%
|
|======================== | 34%
|
|======================== | 35%
|
|========================= | 35%
|
|========================= | 36%
|
|========================== | 37%
|
|========================== | 38%
|
|=========================== | 38%
|
|=========================== | 39%
|
|============================ | 39%
|
|============================ | 40%
|
|============================ | 41%
|
|============================= | 41%
|
|============================= | 42%
|
|============================== | 42%
|
|============================== | 43%
|
|=============================== | 44%
|
|=============================== | 45%
|
|================================ | 45%
|
|================================ | 46%
|
|================================= | 47%
|
|================================= | 48%
|
|================================== | 48%
|
|================================== | 49%
|
|=================================== | 49%
|
|=================================== | 50%
|
|=================================== | 51%
|
|==================================== | 51%
|
|==================================== | 52%
|
|===================================== | 52%
|
|===================================== | 53%
|
|====================================== | 54%
|
|====================================== | 55%
|
|======================================= | 55%
|
|======================================= | 56%
|
|======================================== | 57%
|
|======================================== | 58%
|
|========================================= | 58%
|
|========================================= | 59%
|
|========================================== | 59%
|
|========================================== | 60%
|
|========================================== | 61%
|
|=========================================== | 61%
|
|=========================================== | 62%
|
|============================================ | 62%
|
|============================================ | 63%
|
|============================================= | 64%
|
|============================================= | 65%
|
|============================================== | 65%
|
|============================================== | 66%
|
|=============================================== | 67%
|
|=============================================== | 68%
|
|================================================ | 68%
|
|================================================ | 69%
|
|================================================= | 69%
|
|================================================= | 70%
|
|================================================= | 71%
|
|================================================== | 71%
|
|================================================== | 72%
|
|=================================================== | 72%
|
|=================================================== | 73%
|
|==================================================== | 74%
|
|==================================================== | 75%
|
|===================================================== | 75%
|
|===================================================== | 76%
|
|====================================================== | 77%
|
|====================================================== | 78%
|
|======================================================= | 78%
|
|======================================================= | 79%
|
|======================================================== | 79%
|
|======================================================== | 80%
|
|======================================================== | 81%
|
|========================================================= | 81%
|
|========================================================= | 82%
|
|========================================================== | 82%
|
|========================================================== | 83%
|
|=========================================================== | 84%
|
|=========================================================== | 85%
|
|============================================================ | 85%
|
|============================================================ | 86%
|
|============================================================= | 87%
|
|============================================================= | 88%
|
|============================================================== | 88%
|
|============================================================== | 89%
|
|=============================================================== | 89%
|
|=============================================================== | 90%
|
|=============================================================== | 91%
|
|================================================================ | 91%
|
|================================================================ | 92%
|
|================================================================= | 92%
|
|================================================================= | 93%
|
|================================================================== | 94%
|
|================================================================== | 95%
|
|=================================================================== | 95%
|
|=================================================================== | 96%
|
|==================================================================== | 97%
|
|==================================================================== | 98%
|
|===================================================================== | 98%
|
|===================================================================== | 99%
|
|======================================================================| 99%
|
|======================================================================| 100%
# Please uncomment the following command and make sure to set a correct working directory so that the following command will generate intermeidate files.
# object <- RunBicluster(object, DiscretizationModel = "LTMG",OpenDual = FALSE,
# NumBlockOutput = 100, BlockOverlap = 0.5, BlockCellMin = 25)
(The default parameters in IRIS-FGM are BlockCellMin=15, BlockOverlap=0.7, Extension=0.90, NumBlockOutput=100 you may use other parameters as you like, just specify them in the argument)
User can use reduction = "umap"
or reductopm = "tsne"
to perform dimension reduction.
# demo only run top 500 gene for saving time.
object <- RunDimensionReduction(object, mat.source = "UMImatrix",reduction = "umap")
## Centering and scaling data matrix
## Warning in irlba(A = t(x = object), nv = npcs, ...): You're computing too large
## a percentage of total singular values, use a standard svd instead.
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 87
## Number of edges: 1450
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.6856
## Number of communities: 3
## Elapsed time: 0 seconds
The cell cluster prediction of IRIS-FGM is based on the biclustering results. In short, it will construct a weighted graph based on the biclusters and then do clustering on the weighted graph. To facilitate the process, we will use the pre-generated object to perform cell clustering result based on biclustering results.
# Please uncomment the following command and make sure your working directory is the same as the directory containing intermediate output files.
# object <- FindClassBasedOnMC(object)
## ncount_RNA nFeature Cluster Seurat_r_0.8_k_20 MC_Label
## OoCyte_1 28202.91 657 1 2 1
## OoCyte_2 27029.84 660 1 2 1
## OoCyte_3 26520.14 672 1 2 1
## Zygote_1 24796.04 681 2 2 1
## Zygote_2 23575.54 679 2 2 1