| Version: | 1.1.1 |
| Title: | Adaptive Machine Learning-Powered, Context-Matching Tool for Single-Cell and Spatial Transcriptomics Annotation |
| Description: | Annotates single-cell and spatial-transcriptomic (ST) data using context-matching marker datasets. It creates a unified marker list ('Markers_list') from multiple sources: built-in curated databases ('Cellmarker2', 'PanglaoDB', 'scIBD', 'TCellSI', 'PCTIT', 'PCTAM'), Seurat objects with cell labels, or user-provided Excel tables. SlimR first uses adaptive machine learning for parameter optimization, and then offers two automated annotation approaches: 'cluster-based' and 'per-cell'. Cluster-based annotation assigns one label per cluster using expression-based probability calculation and AUC validation. Per-cell annotation assigns labels to individual cells using three scoring methods with adaptive thresholds and ratio-based confidence filtering, plus optional UMAP spatial smoothing, which makes it well suited to heterogeneous clusters and rare cell types. The package also supports semi-automated workflows with heatmaps, feature plots, and combined visualizations for manual annotation. For more details, see Kabacoff (2020, ISBN:9787115420572). |
| License: | MIT + file LICENSE |
| URL: | https://github.com/zhaoqing-wang/SlimR |
| BugReports: | https://github.com/zhaoqing-wang/SlimR/issues |
| Depends: | R (≥ 3.5) |
| Imports: | cowplot, dplyr, ggplot2, patchwork, pheatmap, readxl, scales, Seurat, tidyr, tools, tibble |
| Suggests: | crayon, RANN, testthat (≥ 3.0.0) |
| Encoding: | UTF-8 |
| LazyData: | true |
| RoxygenNote: | 7.3.3 |
| Date: | 2026-02-05 |
| NeedsCompilation: | no |
| Packaged: | 2026-02-05 15:49:32 UTC; Runaw |
| Author: | Zhaoqing Wang |
| Maintainer: | Zhaoqing Wang <zhaoqingwang@mail.sdu.edu.cn> |
| Repository: | CRAN |
| Date/Publication: | 2026-02-05 16:20:14 UTC |
Apply UMAP-based spatial smoothing to scores
Description
Apply UMAP-based spatial smoothing to scores
Usage
.apply_umap_smoothing(
seurat_obj,
score_matrix,
umap_reduction,
k_neighbors,
smoothing_weight,
chunk_size,
verbose
)
Compute AUCell-like rank-based scores
Description
Uses a ranking approach similar to AUCell: for each cell, genes are ranked by expression, and the score is based on where marker genes fall in that ranking. This method is robust to batch effects and technical variation.
Usage
.compute_aucell_scores(expr_matrix, marker_sets, top_percent = 0.05)
Details
Key improvement: the score is the area under the marker recovery curve (AUC) rather than a simple proportion, which gives partial credit to markers ranked just outside the top threshold.
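The rank-based idea described above can be sketched in base R. This is an illustrative sketch only, not the package's internal code: the helper name rank_auc_score and the normalization by the best achievable curve are assumptions, and the sketch only scores markers inside the top window (it does not reproduce the partial credit for markers just outside it).

```r
# Hypothetical helper (not part of SlimR): area under the marker recovery
# curve within the top-ranked genes of a single cell.
rank_auc_score <- function(expr, markers, top_percent = 0.05) {
  n_top <- max(1, ceiling(length(expr) * top_percent))
  # Rank genes from highest to lowest expression.
  ranked_genes <- names(expr)[order(expr, decreasing = TRUE)]
  hits <- ranked_genes[seq_len(n_top)] %in% markers
  n_markers <- sum(markers %in% names(expr))
  if (n_markers == 0) return(0)
  # Recovery curve: cumulative number of markers found at each rank.
  recovery <- cumsum(hits)
  # Best possible curve: all markers occupy the very top ranks.
  best <- cumsum(seq_len(n_top) <= n_markers)
  sum(recovery) / sum(best)
}

# Toy cell with 20 genes, g1 the most expressed and g20 the least.
expr <- setNames(20:1, paste0("g", 1:20))
score_top    <- rank_auc_score(expr, c("g1", "g2"), top_percent = 0.25)
score_mixed  <- rank_auc_score(expr, c("g1", "g5"), top_percent = 0.25)
score_bottom <- rank_auc_score(expr, c("g19", "g20"), top_percent = 0.25)
```

Unlike a plain proportion, which would score the g1/g5 set the same as the g1/g2 set, the area under the curve rewards markers that sit higher in the ranking.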
Compute weighted scores for per-cell annotation
Description
This function uses an improved weighting scheme that considers:
Expression level (log-normalized)
Detection rate (binary: above min_expression threshold)
Marker specificity (how unique is this marker to this cell type)
Expression variability (CV-based: more variable genes are more discriminative)
Usage
.compute_weighted_scores(expr_matrix, marker_sets, min_expression)
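A minimal sketch of how two of the factors listed above (detection and marker specificity) could be combined with expression level. The weight formula, the toy matrix, and the marker sets are assumptions made for illustration; the package's actual weighting, including the variability term, may differ.

```r
# Toy log-normalized expression: genes in rows, cells in columns.
expr <- matrix(c(2, 0.05,
                 1, 0,
                 1, 3,
                 0, 2),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("CD3E", "CD2", "PTPRC", "MS4A1"),
                               c("cell_a", "cell_b")))
marker_sets <- list("T cell" = c("CD3E", "CD2", "PTPRC"),
                    "B cell" = c("MS4A1", "PTPRC"))
min_expression <- 0.1

# Assumed specificity weight: 1 / (number of marker sets listing the gene).
n_sets <- table(unlist(marker_sets))

weighted_scores <- sapply(marker_sets, function(m) {
  sub <- expr[m, , drop = FALSE]
  detected <- sub > min_expression          # binary detection per gene and cell
  specificity <- 1 / as.numeric(n_sets[m])  # one weight per marker gene
  # The weight vector recycles down the rows, so each gene gets its own weight.
  colMeans(sub * detected * specificity)
})
```

The result is a cells-by-cell-types matrix. PTPRC appears in both sets, so it contributes only half as much as a marker unique to one cell type.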
Cellmarker2 dataset
Description
A dataset containing marker genes for different cell types from Cellmarker2
Usage
Cellmarker2
Format
A data frame with 8 columns:
Details
This dataset is used to filter and create a standardized marker list. The dataset can be filtered based on species, tissue class, tissue type, cancer type, and cell type to generate a list of marker genes for specific cell types.
Source
http://117.50.127.228/CellMarker/
See Also
Other Section_0_Database:
Cellmarker2_raw,
Cellmarker2_table,
Markers_list_PCTAM,
Markers_list_PCTIT,
Markers_list_TCellSI,
Markers_list_scIBD,
PanglaoDB,
PanglaoDB_raw,
PanglaoDB_table
Cellmarker2 raw dataset
Description
A dataset containing marker genes for different cell types from Cellmarker2
Usage
Cellmarker2_raw
Format
A data frame with 20 columns contained in the Cellmarker2 database:
Details
This dataset is used to filter and create a standardized marker list. The dataset can be filtered based on species, tissue class, tissue type, cancer type, and cell type to generate a list of marker genes for specific cell types.
Source
http://117.50.127.228/CellMarker/
See Also
Other Section_0_Database:
Cellmarker2,
Cellmarker2_table,
Markers_list_PCTAM,
Markers_list_PCTIT,
Markers_list_TCellSI,
Markers_list_scIBD,
PanglaoDB,
PanglaoDB_raw,
PanglaoDB_table
Cellmarker2 table
Description
A dataset containing marker genes for different cell types from Cellmarker2
Usage
Cellmarker2_table
Format
A list containing filter categories such as species, tissue_class, tissue_type, cancer_type, and cell_type
Details
This list is used to choose filters when creating a standardized marker list.
Source
http://117.50.127.228/CellMarker/
See Also
Other Section_0_Database:
Cellmarker2,
Cellmarker2_raw,
Markers_list_PCTAM,
Markers_list_PCTIT,
Markers_list_TCellSI,
Markers_list_scIBD,
PanglaoDB,
PanglaoDB_raw,
PanglaoDB_table
Annotate Seurat Object with SlimR Cell Type Predictions
Description
This function assigns SlimR predicted cell types to a Seurat object based on cluster annotations, and stores the results in the meta.data slot.
Usage
Celltype_Annotation(
seurat_obj,
cluster_col,
SlimR_anno_result,
plot_UMAP = TRUE,
annotation_col = "Cell_type_SlimR"
)
Arguments
seurat_obj |
A Seurat object containing cluster information in meta.data. |
cluster_col |
Character string indicating the column name in meta.data that contains cluster IDs. |
SlimR_anno_result |
List generated by Celltype_Calculate(), containing a data.frame in $Prediction_results with: 1. cluster_col (cluster identifiers, which should match cluster_col in meta.data); 2. Predicted_cell_type (predicted cell type for each cluster). |
plot_UMAP |
logical(1); if TRUE, plot the UMAP with cell type annotations. |
annotation_col |
Name of the column in 'meta.data' in which to store the predicted cell type. (default = "Cell_type_SlimR") |
Value
A Seurat object with updated meta.data containing the predicted cell types.
Note
If plot_UMAP = TRUE, this function will print a UMAP plot as a side effect.
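The cluster-to-label transfer described above amounts to a table lookup. The following base R sketch uses plain data frames as stand-ins for the Seurat meta.data and for SlimR_anno_result$Prediction_results; the toy values are assumptions, and the package itself may implement the step differently.

```r
# Toy stand-in for seurat_obj@meta.data: one row per cell.
meta <- data.frame(seurat_clusters = c("0", "1", "1", "2"),
                   row.names = c("cell_a", "cell_b", "cell_c", "cell_d"),
                   stringsAsFactors = FALSE)

# Toy stand-in for SlimR_anno_result$Prediction_results: one row per cluster.
prediction_results <- data.frame(
  cluster_col = c("0", "1", "2"),
  Predicted_cell_type = c("T cell", "B cell", "Monocyte"),
  stringsAsFactors = FALSE
)

# Look up each cell's cluster in the prediction table and copy the label.
idx <- match(as.character(meta$seurat_clusters),
             as.character(prediction_results$cluster_col))
meta$Cell_type_SlimR <- prediction_results$Predicted_cell_type[idx]
```

Clusters missing from the prediction table would receive NA here, which is one reason the cluster identifiers need to match cluster_col in meta.data.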
See Also
Other Section_3_Automated_Annotation:
Celltype_Annotation_PerCell(),
Celltype_Calculate(),
Celltype_Calculate_PerCell(),
Celltype_Verification(),
Celltype_Verification_PerCell(),
Parameter_Calculate(),
percell_workflow
Examples
## Not run:
sce <- Celltype_Annotation(seurat_obj = sce,
cluster_col = "seurat_clusters",
SlimR_anno_result = SlimR_anno_result,
plot_UMAP = TRUE,
annotation_col = "Cell_type_SlimR"
)
## End(Not run)
Uses "marker_list" to generate a combined plot for cell annotation
Description
Uses "marker_list" to generate a combined plot for cell annotation
Usage
Celltype_Annotation_Combined(
seurat_obj,
gene_list,
species,
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = NULL,
colour_low = "white",
colour_high = "navy"
)
Arguments
seurat_obj |
Seurat object to annotate, with a cluster annotation column such as "seurat_clusters" in meta.data. |
gene_list |
A list of cell types and their marker genes: the list names are cell types, and the first column of each element holds the markers. Lists can be generated with functions such as "Markers_filter_Cellmarker2()", "Markers_filter_PanglaoDB()", "read_excel_markers()", and "read_seurat_markers()". |
species |
Species, "Human" or "Mouse", used to correct the markers in "Marker_list" to the standard gene symbol format. |
cluster_col |
Name of the cluster annotation column, such as "seurat_clusters", in meta.data of the Seurat object to annotate. Default: "cluster_col = 'seurat_clusters'". |
assay |
Assay used by the Seurat object, such as "RNA". Default: "assay = 'RNA'". |
save_path |
Output path for the cell annotation picture. Example: "save_path = './SlimR/Celltype_annotation_Bar/'". |
colour_low |
Color for lowest expression level. (default = "white") |
colour_high |
Color for highest expression level. (default = "navy") |
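To make the gene_list and species arguments concrete, here is a hedged base R sketch of a marker list in the shape described above (names are cell types, first column holds the markers). The helper format_symbols is hypothetical; it applies the common convention of upper-case human symbols and capitalized mouse symbols, which is assumed here rather than taken from the package.

```r
# A marker list in the described shape: names are cell types, and the
# first column of each element holds the marker genes (toy values).
Markers_list <- list(
  "T cell" = data.frame(marker = c("CD3E", "CD2"), stringsAsFactors = FALSE),
  "B cell" = data.frame(marker = c("MS4A1", "CD79A"), stringsAsFactors = FALSE)
)

# Hypothetical helper (not a SlimR function): upper case for human symbols,
# first letter capitalized for mouse symbols.
format_symbols <- function(genes, species) {
  if (species == "Human") {
    toupper(genes)
  } else {
    paste0(toupper(substr(genes, 1, 1)), tolower(substring(genes, 2)))
  }
}

# Reformat the first column of every element for a mouse dataset.
mouse_list <- lapply(Markers_list, function(df) {
  df[[1]] <- format_symbols(df[[1]], "Mouse")
  df
})
```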
Value
The cell annotation picture is saved in "save_path".
See Also
Other Section_4_Semi_Automated_Annotation:
Celltype_Annotation_Features(),
Celltype_Annotation_Heatmap()
Examples
## Not run:
Celltype_Annotation_Combined(seurat_obj = sce,
gene_list = Markers_list,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = file.path(tempdir(),"SlimR_Celltype_Annotation_Combined"),
colour_low = "white",
colour_high = "navy"
)
## End(Not run)
Annotate cell types using features plot with different marker databases
Description
This function dynamically selects the appropriate annotation method
based on the gene_list_type parameter. It supports marker databases from
Cellmarker2, PanglaoDB, Seurat (via FindAllMarkers), or Excel files.
Usage
Celltype_Annotation_Features(
seurat_obj,
gene_list,
gene_list_type = "Default",
species = NULL,
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = NULL,
min_counts = 1,
metric_names = NULL,
colour_low = "white",
colour_high = "navy",
colour_low_mertic = "white",
colour_high_mertic = "navy",
...
)
Arguments
seurat_obj |
A valid Seurat object with cluster annotations in meta.data. |
gene_list |
A list of data frames containing marker genes and metrics. The format depends on gene_list_type. |
gene_list_type |
Type of marker database to use (default: "Default"). The supported sources are Cellmarker2, PanglaoDB, Seurat, and Excel. |
species |
Species of the dataset: "Human" or "Mouse". |
cluster_col |
Column name in meta.data that holds the cluster annotations (default: "seurat_clusters"). |
assay |
Assay layer in the Seurat object (default: "RNA"). |
save_path |
Directory to save output PNGs. Must be explicitly specified. |
min_counts |
Minimum number of counts for Cellmarker2 annotations (default: 1). |
metric_names |
Optional. Changes the row names of the input metrics; not recommended unless necessary. (NULL is used as default parameter; used in "Seurat"/"Excel"). |
colour_low |
Color for lowest expression level. (default = "white") |
colour_high |
Color for highest expression level. (default = "navy") |
colour_low_mertic |
Color for lowest metric level. (default = "white") |
colour_high_mertic |
Color for highest metric level. (default = "navy") |
... |
Additional parameters passed to the specific annotation function. |
Value
Saves cell type annotation PNGs in save_path. Returns invisibly.
See Also
Other Section_4_Semi_Automated_Annotation:
Celltype_Annotation_Combined(),
Celltype_Annotation_Heatmap()
Examples
## Not run:
# Example for Cellmarker2
Celltype_Annotation_Features(seurat_obj = sce,
gene_list = Markers_list_Cellmarker2,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = file.path(tempdir(),"SlimR_Celltype_annotation_Cellmarker2"),
colour_low = "white",
colour_high = "navy",
colour_low_mertic = "white",
colour_high_mertic = "navy"
)
# Example for PanglaoDB
Celltype_Annotation_Features(seurat_obj = sce,
gene_list = Markers_list_panglaoDB,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = file.path(tempdir(),"SlimR_Celltype_annotation_PanglaoDB"),
colour_low = "white",
colour_high = "navy",
colour_low_mertic = "white",
colour_high_mertic = "navy"
)
# Example for Seurat marker list
Celltype_Annotation_Features(seurat_obj = sce,
gene_list = Markers_list_Seurat,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = file.path(tempdir(),"SlimR_Celltype_annotation_Seurat"),
colour_low = "white",
colour_high = "navy",
colour_low_mertic = "white",
colour_high_mertic = "navy"
)
# Example for Excel marker list
Celltype_Annotation_Features(seurat_obj = sce,
gene_list = Markers_list_Excel,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = file.path(tempdir(),"SlimR_Celltype_annotation_Excel"),
colour_low = "white",
colour_high = "navy",
colour_low_mertic = "white",
colour_high_mertic = "navy"
)
## End(Not run)
Uses "marker_list" to generate a heatmap for cell annotation
Description
Uses "marker_list" to generate a heatmap for cell annotation
Usage
Celltype_Annotation_Heatmap(
seurat_obj,
gene_list,
species,
cluster_col = "seurat_clusters",
assay = "RNA",
min_expression = 0.1,
specificity_weight = 3,
colour_low = "navy",
colour_high = "firebrick3"
)
Arguments
seurat_obj |
Seurat object to annotate, with a cluster annotation column such as "seurat_clusters" in meta.data. |
gene_list |
A list of cell types and their marker genes: the list names are cell types, and the first column of each element holds the markers. Lists can be generated with functions such as "Markers_filter_Cellmarker2()", "Markers_filter_PanglaoDB()", "read_excel_markers()", and "read_seurat_markers()". |
species |
Species, "Human" or "Mouse", used to correct the markers in "Marker_list" to the standard gene symbol format. |
cluster_col |
Name of the cluster annotation column, such as "seurat_clusters", in meta.data of the Seurat object to annotate. Default: "cluster_col = 'seurat_clusters'". |
assay |
Assay used by the Seurat object, such as "RNA". Default: "assay = 'RNA'". |
min_expression |
Expression threshold above which a cell is considered to express a feature. It filters out low-expression cells that may contribute noise to the analysis. Default: "min_expression = 0.1". |
specificity_weight |
Controls how much the expression variability (standard deviation) of a feature within a cluster contributes to its "specificity score", amplifying or suppressing the impact of variability in the final score calculation. Default: "specificity_weight = 3". |
colour_low |
Color for the lowest probability level in the heatmap visualization of the probability matrix. (default = "navy") |
colour_high |
Color for the highest probability level in the heatmap visualization of the probability matrix. (default = "firebrick3") |
Value
A heatmap comparing the clusters in "cluster_col" of the Seurat object against the marker sets in "gene_list", for use in annotation.
See Also
Other Section_4_Semi_Automated_Annotation:
Celltype_Annotation_Combined(),
Celltype_Annotation_Features()
Examples
## Not run:
Celltype_Annotation_Heatmap(seurat_obj = sce,
gene_list = Markers_list,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
min_expression = 0.1,
specificity_weight = 3,
colour_low = "navy",
colour_high = "firebrick3"
)
## End(Not run)
Annotate Seurat Object with Per-Cell SlimR Predictions
Description
This function assigns SlimR per-cell predicted cell types directly to individual cells in a Seurat object's meta.data slot.
Usage
Celltype_Annotation_PerCell(
seurat_obj,
SlimR_percell_result,
plot_UMAP = TRUE,
annotation_col = "Cell_type_PerCell_SlimR",
plot_confidence = FALSE
)
Arguments
seurat_obj |
A Seurat object. |
SlimR_percell_result |
List generated by Celltype_Calculate_PerCell() containing Cell_annotations data.frame with Cell_barcode and Predicted_cell_type columns. |
plot_UMAP |
Logical; if TRUE, plot the UMAP with cell type annotations. Default: TRUE. |
annotation_col |
Column name to write in meta.data. Default: "Cell_type_PerCell_SlimR". |
plot_confidence |
Logical; if TRUE, also plot a UMAP colored by confidence scores. Default: FALSE. |
Value
A Seurat object with updated meta.data containing:
annotation_col: Predicted cell type for each cell
paste0(annotation_col, "_score"): Max score for each cell
paste0(annotation_col, "_confidence"): Confidence score for each cell
Note
If plot_UMAP = TRUE, this function will print UMAP plot(s) as a side effect.
See Also
Other Section_3_Automated_Annotation:
Celltype_Annotation(),
Celltype_Calculate(),
Celltype_Calculate_PerCell(),
Celltype_Verification(),
Celltype_Verification_PerCell(),
Parameter_Calculate(),
percell_workflow
Examples
## Not run:
# Run per-cell annotation
result <- Celltype_Calculate_PerCell(
seurat_obj = sce,
gene_list = Markers_list,
species = "Human"
)
# Annotate Seurat object
sce <- Celltype_Annotation_PerCell(
seurat_obj = sce,
SlimR_percell_result = result,
plot_UMAP = TRUE,
annotation_col = "Cell_type_PerCell_SlimR"
)
## End(Not run)
Uses "marker_list" to calculate probabilities, prediction results, and AUC values, and to generate a heatmap for cell annotation
Description
Uses "marker_list" to calculate probabilities, prediction results, and AUC values, and to generate a heatmap for cell annotation
Usage
Celltype_Calculate(
seurat_obj,
gene_list,
species,
cluster_col = "seurat_clusters",
assay = "RNA",
min_expression = 0.1,
specificity_weight = 3,
threshold = 0.6,
compute_AUC = TRUE,
plot_AUC = TRUE,
AUC_correction = FALSE,
colour_low = "navy",
colour_high = "firebrick3"
)
Arguments
seurat_obj |
Seurat object to annotate, with a cluster annotation column such as "seurat_clusters" in meta.data. |
gene_list |
A list of cell types and their marker genes: the list names are cell types, and the first column of each element holds the markers. Lists can be generated with functions such as "Markers_filter_Cellmarker2()", "Markers_filter_PanglaoDB()", "read_excel_markers()", and "read_seurat_markers()". |
species |
Species, "Human" or "Mouse", used to correct the markers in "Marker_list" to the standard gene symbol format. |
cluster_col |
Name of the cluster annotation column, such as "seurat_clusters", in meta.data of the Seurat object to annotate. Default: "cluster_col = 'seurat_clusters'". |
assay |
Assay used by the Seurat object, such as "RNA". Default: "assay = 'RNA'". |
min_expression |
Expression threshold above which a cell is considered to express a feature. It filters out low-expression cells that may contribute noise to the analysis. Default: "min_expression = 0.1". |
specificity_weight |
Controls how much the expression variability (standard deviation) of a feature within a cluster contributes to its "specificity score", amplifying or suppressing the impact of variability in the final score calculation. Default: "specificity_weight = 3". |
threshold |
Normalized similarity to the "predicted cell type" above which a candidate is reported as an "alternative cell type" in the returned results. (default: 0.6) |
compute_AUC |
Logical indicating whether to calculate AUC values for predicted cell types. AUC measures how well the marker genes distinguish the cluster from others. When TRUE, adds an AUC column to the prediction results. (default: TRUE) |
plot_AUC |
Logical indicating whether to draw an AUC curve for the predicted cell type. When TRUE, adds AUC_plot to the result. (default: TRUE) |
AUC_correction |
Logical value controlling AUC-based correction. (default = FALSE) When set to TRUE: 1. Computes AUC values for candidate cell types (probability > threshold). 2. Selects the cell type with the highest AUC as the final predicted type. 3. Records the selected type's AUC value in the "AUC" column. |
colour_low |
Color for the lowest probability level in the heatmap visualization of the probability matrix. (default = "navy") |
colour_high |
Color for the highest probability level in the heatmap visualization of the probability matrix. (default = "firebrick3") |
Value
A list containing:
Expression_list: List of expression matrices for each cell type
Proportion_list: List of proportion of expression for each cell type
Expression_scores_matrix: Matrix of expression scores
Probability_matrix: Matrix of normalized probabilities
Prediction_results: Data frame with cluster annotations including:
cluster_col: Cluster identifier
Predicted_cell_type: Primary predicted cell type
AUC: Area Under the Curve value (when compute_AUC = TRUE)
Alternative_cell_types: Semi-colon separated alternative cell types
Heatmap_plot: Heatmap visualization of probability matrix (pheatmap object). Can be displayed using print() or plot()
AUC_plot: AUC visualization of Predicted cell type (ggplot object)
AUC_list: The resulting list of AUC values calculated for genes in alternative cell types above the approximate threshold
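As a rough illustration of how Prediction_results relates to the probability matrix and the threshold argument, the sketch below picks the top cell type per cluster and reports candidates whose normalized probability exceeds the threshold. The min-max normalization and the toy scores are assumptions; the package's own normalization may differ.

```r
# Toy expression scores: clusters in rows, candidate cell types in columns.
scores <- matrix(c(4, 3, 1,
                   1, 5, 2),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("0", "1"),
                                 c("T cell", "NK cell", "B cell")))

# Assumption: min-max scale each cluster so its best type has probability 1.
prob <- t(apply(scores, 1, function(x) (x - min(x)) / (max(x) - min(x))))

threshold <- 0.6
predicted <- colnames(prob)[apply(prob, 1, which.max)]

# Alternatives: other cell types whose normalized probability exceeds threshold.
alternatives <- vapply(seq_len(nrow(prob)), function(i) {
  keep <- prob[i, ] > threshold & colnames(prob) != predicted[i]
  paste(colnames(prob)[keep], collapse = "; ")
}, character(1))

prediction_results <- data.frame(cluster_col = rownames(prob),
                                 Predicted_cell_type = predicted,
                                 Alternative_cell_types = alternatives,
                                 stringsAsFactors = FALSE)
```

In this toy case cluster "0" is predicted as "T cell" with "NK cell" as an alternative, while cluster "1" has no alternative above the threshold.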
See Also
Other Section_3_Automated_Annotation:
Celltype_Annotation(),
Celltype_Annotation_PerCell(),
Celltype_Calculate_PerCell(),
Celltype_Verification(),
Celltype_Verification_PerCell(),
Parameter_Calculate(),
percell_workflow
Examples
## Not run:
SlimR_anno_result <- Celltype_Calculate(seurat_obj = sce,
gene_list = Markers_list,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
min_expression = 0.1,
specificity_weight = 3,
threshold = 0.6,
compute_AUC = TRUE,
plot_AUC = TRUE,
AUC_correction = FALSE,
colour_low = "navy",
colour_high = "firebrick3"
)
## End(Not run)
Per-cell annotation using marker expression and optional UMAP spatial smoothing
Description
Unlike cluster-based annotation, this function assigns cell type labels to each individual cell based on marker gene expression profiles. Optionally uses UMAP coordinates to smooth predictions via k-nearest neighbor voting.
Usage
Celltype_Calculate_PerCell(
seurat_obj,
gene_list,
species,
assay = "RNA",
method = c("weighted", "mean", "AUCell"),
min_expression = 0.1,
use_umap_smoothing = FALSE,
umap_reduction = "umap",
k_neighbors = 15,
smoothing_weight = 0.3,
min_score = "auto",
min_confidence = 1.2,
return_scores = FALSE,
ncores = 1,
chunk_size = 5000,
verbose = TRUE
)
Arguments
seurat_obj |
Seurat object with normalized expression data. |
gene_list |
A standardized marker list (same format as Celltype_Calculate). |
species |
"Human" or "Mouse" for gene name formatting. |
assay |
Assay to use (default: "RNA"). |
method |
Scoring method: "AUCell" (rank-based), "mean" (average expression), or "weighted" (expression * detection weighted). Default: "weighted". |
min_expression |
Minimum expression threshold for detection. Default: 0.1. |
use_umap_smoothing |
Logical. If TRUE, apply k-NN smoothing using UMAP coordinates to improve annotation consistency. Default: FALSE. |
umap_reduction |
Name of UMAP reduction in Seurat object. Default: "umap". |
k_neighbors |
Number of neighbors for UMAP smoothing. Default: 15. |
smoothing_weight |
Weight for neighbor votes vs cell's own score (0-1). Higher values give more weight to neighbors. Default: 0.3. |
min_score |
Minimum score threshold to assign a cell type. Cells below this threshold are labeled "Unassigned". Default: "auto" which adaptively sets the threshold based on number of cell types (1.5 / n_celltypes). Set to a numeric value (e.g., 0.1) to use a fixed threshold. |
min_confidence |
Minimum confidence threshold. Cells with confidence below this value are labeled "Unassigned". Confidence is calculated as the ratio of max score to second-highest score. Default: 1.2 (max must be 20% higher than second). Set to 1.0 to disable confidence filtering. |
return_scores |
If TRUE, return full score matrix. Default: FALSE. |
ncores |
Number of cores for parallel processing. Default: 1. |
chunk_size |
Number of cells to process per chunk (memory optimization). Default: 5000. |
verbose |
Print progress messages. Default: TRUE. |
Details
Scoring Methods
"weighted" (recommended): Combines normalized expression with detection rate. For each cell and cell type: score = mean(expr_i * weight_i) where weight_i is derived from the marker's specificity across the dataset.
"mean": Simple average of normalized marker expression. Fast but less discriminative for overlapping marker sets.
"AUCell": Rank-based scoring similar to AUCell package. For each cell, genes are ranked by expression, and the score is the proportion of marker genes in the top X% of expressed genes. Robust to technical variation.
UMAP Smoothing
When use_umap_smoothing = TRUE, the function:
Computes initial per-cell scores
Finds k nearest neighbors in UMAP space for each cell
Smooths scores by weighted averaging with neighbors
Re-assigns cell types based on smoothed scores
This helps reduce noise and improve consistency of annotations within spatially coherent regions.
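The four smoothing steps above can be sketched in base R as follows. The blend formula (1 - smoothing_weight) * own score + smoothing_weight * neighbor mean is an assumption consistent with the smoothing_weight description, the helper name smooth_scores is hypothetical, and a brute-force distance matrix stands in for whatever neighbor search the package uses.

```r
# Hypothetical helper: blend each cell's scores with the mean of its k nearest
# neighbors in the embedding (brute-force search; fine for a toy example).
smooth_scores <- function(scores, coords, k = 2, smoothing_weight = 0.3) {
  d <- as.matrix(dist(coords))
  diag(d) <- Inf  # a cell is not its own neighbor
  smoothed <- scores
  for (i in seq_len(nrow(scores))) {
    nn <- order(d[i, ])[seq_len(k)]
    neighbor_mean <- colMeans(scores[nn, , drop = FALSE])
    smoothed[i, ] <- (1 - smoothing_weight) * scores[i, ] +
      smoothing_weight * neighbor_mean
  }
  smoothed
}

# Three cells close together and one far away in a toy 2-D embedding.
coords <- rbind(c1 = c(0, 0), c2 = c(0, 1), c3 = c(1, 0), c4 = c(10, 10))
scores <- rbind(c1 = c(0.90, 0.10), c2 = c(0.45, 0.55),
                c3 = c(0.80, 0.20), c4 = c(0.10, 0.90))
colnames(scores) <- c("T cell", "B cell")

smoothed <- smooth_scores(scores, coords, k = 2, smoothing_weight = 0.3)
labels_before <- colnames(scores)[apply(scores, 1, which.max)]
labels_after  <- colnames(smoothed)[apply(smoothed, 1, which.max)]
```

Cell c2 starts with a marginal "B cell" call but sits between two confident "T cell" neighbors, so its label changes after smoothing, while the isolated cell c4 keeps its label.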
Value
A list containing:
Cell_annotations: Data frame with Cell_barcode, Predicted_cell_type, Max_score, Confidence
Cell_confidence: Numeric vector of confidence scores per cell
Summary: Summary table of cell type counts and percentages
Expression_list: List of mean expression matrices per cell type (for verification)
Proportion_list: List of detection proportion matrices per cell type
Prediction_results: Summary data frame with per-cell-type statistics
Probability_matrix: Full cell × cell_type probability matrix (normalized)
Raw_score_matrix: Full cell × cell_type raw score matrix (before normalization)
Parameters: List of parameters used including adaptive thresholds
Cell_scores: (if return_scores=TRUE) Same as Probability_matrix
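The min_score and min_confidence rules described in the Arguments can be sketched as below. The "auto" threshold of 1.5 / n_celltypes and the ratio-based confidence come from the argument descriptions; the helper name assign_labels, the toy probabilities, and the assumption that each row sums to 1 are illustrative only.

```r
# Hypothetical helper: label each cell, then mark low-score or low-confidence
# cells as "Unassigned". Expects at least two cell-type columns.
assign_labels <- function(prob, min_score = "auto", min_confidence = 1.2) {
  if (identical(min_score, "auto")) min_score <- 1.5 / ncol(prob)
  top <- apply(prob, 1, max)
  second <- apply(prob, 1, function(x) sort(x, decreasing = TRUE)[2])
  confidence <- top / second  # ratio of best score to runner-up
  label <- colnames(prob)[apply(prob, 1, which.max)]
  label[top < min_score | confidence < min_confidence] <- "Unassigned"
  data.frame(Cell_barcode = rownames(prob), Predicted_cell_type = label,
             Max_score = top, Confidence = confidence,
             stringsAsFactors = FALSE)
}

# Toy probabilities for three cells and three cell types (rows sum to 1).
prob <- rbind(cell_a = c(0.70, 0.20, 0.10),
              cell_b = c(0.40, 0.35, 0.25),
              cell_c = c(0.50, 0.45, 0.05))
colnames(prob) <- c("T cell", "B cell", "NK cell")

cell_annotations <- assign_labels(prob)
```

With three cell types the adaptive threshold is 0.5, so cell_b fails on score, cell_c passes on score but fails the 1.2 confidence ratio, and only cell_a keeps its label.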
See Also
Other Section_3_Automated_Annotation:
Celltype_Annotation(),
Celltype_Annotation_PerCell(),
Celltype_Calculate(),
Celltype_Verification(),
Celltype_Verification_PerCell(),
Parameter_Calculate(),
percell_workflow
Examples
## Not run:
# Basic per-cell annotation
result <- Celltype_Calculate_PerCell(
seurat_obj = sce,
gene_list = Markers_list,
species = "Human",
method = "weighted"
)
# Add annotations to Seurat object
sce$Cell_type_PerCell <- result$Cell_annotations$Predicted_cell_type
# With UMAP smoothing for more consistent annotations
result_smooth <- Celltype_Calculate_PerCell(
seurat_obj = sce,
gene_list = Markers_list,
species = "Human",
use_umap_smoothing = TRUE,
k_neighbors = 20,
smoothing_weight = 0.3
)
## End(Not run)
Perform cell type verification and generate the validation dotplot
Description
This function verifies predicted cell types by selecting genes with high log2FC and high expression proportion, and generates the validation dotplot.
Usage
Celltype_Verification(
seurat_obj,
SlimR_anno_result,
assay = "RNA",
gene_number = 5,
colour_low = "white",
colour_high = "navy",
annotation_col = "Cell_type_SlimR"
)
Arguments
seurat_obj |
A Seurat object containing single-cell data. |
SlimR_anno_result |
A list containing SlimR annotation results with: Expression_list - List of expression matrices for each cell type. Prediction_results - Data frame with cluster annotations. |
assay |
Enter the assay used by the Seurat object, such as "RNA". Default parameters use "assay = 'RNA'". |
gene_number |
Integer specifying number of top genes to select per cell type. |
colour_low |
Color for lowest expression level. (default = "white") |
colour_high |
Color for highest expression level. (default = "navy") |
annotation_col |
Character string specifying the column in meta.data to use for grouping. |
Value
A ggplot object showing expression of top variable genes.
See Also
Other Section_3_Automated_Annotation:
Celltype_Annotation(),
Celltype_Annotation_PerCell(),
Celltype_Calculate(),
Celltype_Calculate_PerCell(),
Celltype_Verification_PerCell(),
Parameter_Calculate(),
percell_workflow
Examples
## Not run:
Celltype_Verification(seurat_obj = sce,
SlimR_anno_result = SlimR_anno_result,
assay = "RNA",
gene_number = 5,
colour_low = "white",
colour_high = "navy",
annotation_col = "Cell_type_SlimR"
)
## End(Not run)
Verify per-cell annotations with marker expression dotplot
Description
This function verifies per-cell SlimR annotations by generating a dotplot showing marker gene expression across predicted cell types.
Usage
Celltype_Verification_PerCell(
seurat_obj,
SlimR_percell_result,
assay = "RNA",
gene_number = 5,
colour_low = "white",
colour_high = "navy",
annotation_col = "Cell_type_PerCell_SlimR",
min_cells = 10
)
Arguments
seurat_obj |
A Seurat object with per-cell annotations. |
SlimR_percell_result |
A list from Celltype_Calculate_PerCell() containing Expression_list with marker genes per cell type. |
assay |
Assay to use. Default: "RNA". |
gene_number |
Number of top genes to show per cell type. Default: 5. |
colour_low |
Color for lowest expression. Default: "white". |
colour_high |
Color for highest expression. Default: "navy". |
annotation_col |
Column in meta.data with cell type annotations. Default: "Cell_type_PerCell_SlimR". |
min_cells |
Minimum number of cells required for a cell type to be included in the plot. Default: 10. |
Value
A ggplot object showing marker gene expression dotplot.
See Also
Other Section_3_Automated_Annotation:
Celltype_Annotation(),
Celltype_Annotation_PerCell(),
Celltype_Calculate(),
Celltype_Calculate_PerCell(),
Celltype_Verification(),
Parameter_Calculate(),
percell_workflow
Examples
## Not run:
# After running Celltype_Calculate_PerCell and Celltype_Annotation_PerCell
dotplot <- Celltype_Verification_PerCell(
seurat_obj = sce,
SlimR_percell_result = result,
gene_number = 5,
annotation_col = "Cell_type_PerCell_SlimR"
)
print(dotplot)
## End(Not run)
Uses "marker_list" from Cellmarker2 for cell annotation
Description
Uses "marker_list" from Cellmarker2 for cell annotation
Usage
Celltype_annotation_Cellmarker2(
seurat_obj,
gene_list,
species,
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = NULL,
min_counts = 1,
colour_low = "white",
colour_high = "navy",
colour_low_mertic = "white",
colour_high_mertic = "navy"
)
Arguments
seurat_obj |
Seurat object to annotate, with a cluster annotation column such as "seurat_clusters" in meta.data. |
gene_list |
The standard "Marker_list" for the SlimR package, built from the Cellmarker2 database by the "Markers_filter_Cellmarker2()" function. |
species |
Species, "Human" or "Mouse", used to correct the markers in "Marker_list" to the standard gene symbol format. |
cluster_col |
Name of the cluster annotation column, such as "seurat_clusters", in meta.data of the Seurat object to annotate. Default: "cluster_col = 'seurat_clusters'". |
assay |
Assay used by the Seurat object, such as "RNA". Default: "assay = 'RNA'". |
save_path |
Output path for the cell annotation picture. Example: "save_path = './SlimR/Celltype_annotation_Cellmarker2/'". |
min_counts |
Minimum count required for a gene in "Marker_list" to be used. The count is the number of times the same gene is recorded for this cell type, in the same species and the same location, in the Cellmarker2 database. Default: "min_counts = 1". |
colour_low |
Color for lowest expression level. (default = "white") |
colour_high |
Color for highest expression level. (default = "navy") |
colour_low_mertic |
Color for lowest metric level. (default = "white") |
colour_high_mertic |
Color for highest metric level. (default = "navy") |
Value
The cell annotation picture is saved in "save_path".
See Also
Other Section_5_Other_Functions_Provided:
Celltype_annotation_Excel(),
Celltype_annotation_PanglaoDB(),
Celltype_annotation_Seurat()
Examples
## Not run:
Celltype_annotation_Cellmarker2(seurat_obj = sce,
gene_list = Markers_list_Cellmarker2,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = file.path(tempdir(),"SlimR_Celltype_annotation_Cellmarker2"),
colour_low = "white",
colour_high = "navy",
colour_low_mertic = "white",
colour_high_mertic = "navy"
)
## End(Not run)
Uses "marker_list" from Excel input for cell annotation
Description
Uses "marker_list" from Excel input for cell annotation
Usage
Celltype_annotation_Excel(
seurat_obj,
gene_list,
species,
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = NULL,
metric_names = NULL,
colour_low = "white",
colour_high = "navy",
colour_low_mertic = "white",
colour_high_mertic = "navy"
)
Arguments
seurat_obj |
Enter the Seurat object with annotation columns such as "seurat_cluster" in meta.data to be annotated. |
gene_list |
Enter the standard "Marker_list" generated by the Excel files database for the SlimR package, generated by the "read_excel_markers()" function. |
species |
This parameter selects the species "Human" or "Mouse" for standard gene format correction of markers entered by "Marker_list". |
cluster_col |
Enter annotation columns such as "seurat_cluster" in meta.data of the Seurat object to be annotated. Default parameters use "cluster_col = "seurat_clusters"". |
assay |
Enter the assay used by the Seurat object, such as "RNA". Default parameters use "assay = 'RNA'". |
save_path |
The output path of the cell annotation picture. Example parameters use "save_path = './SlimR/Celltype_annotation_Excel/'". |
metric_names |
Change the row names for the input metrics; not recommended unless necessary. (default = NULL) |
colour_low |
Color for lowest expression level. (default = "white") |
colour_high |
Color for highest expression level. (default = "navy") |
colour_low_mertic |
Color for lowest metric level. (default = "white") |
colour_high_mertic |
Color for highest metric level. (default = "navy") |
Value
The cell annotation picture is saved in "save_path".
See Also
Other Section_5_Other_Functions_Provided:
Celltype_annotation_Cellmarker2(),
Celltype_annotation_PanglaoDB(),
Celltype_annotation_Seurat()
Examples
## Not run:
Celltype_annotation_Excel(seurat_obj = sce,
gene_list = Markers_list_Excel,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = file.path(tempdir(),"SlimR_Celltype_annotation_Excel"),
colour_low = "white",
colour_high = "navy",
colour_low_mertic = "white",
colour_high_mertic = "navy"
)
## End(Not run)
Uses "marker_list" from PanglaoDB for cell annotation
Description
Uses "marker_list" from PanglaoDB for cell annotation
Usage
Celltype_annotation_PanglaoDB(
seurat_obj,
gene_list,
species,
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = NULL,
metric_names = NULL,
colour_low = "white",
colour_high = "navy",
colour_low_mertic = "white",
colour_high_mertic = "navy"
)
Arguments
seurat_obj |
Enter the Seurat object to be annotated, with annotation columns such as "seurat_clusters" in meta.data. |
gene_list |
Enter the standard "Marker_list" for the SlimR package, generated from the PanglaoDB database by the "Markers_filter_PanglaoDB()" function. |
species |
This parameter selects the species "Human" or "Mouse" for standard gene format correction of markers entered by "Marker_list". |
cluster_col |
Enter annotation columns such as "seurat_clusters" in meta.data of the Seurat object to be annotated. Default parameters use "cluster_col = 'seurat_clusters'". |
assay |
Enter the assay used by the Seurat object, such as "RNA". Default parameters use "assay = 'RNA'". |
save_path |
The output path of the cell annotation picture. Example parameters use "save_path = './SlimR/Celltype_annotation_PanglaoDB/'". |
metric_names |
Warning: Do not enter information. This parameter is used to check if "Marker_list" conforms to the PanglaoDB database output. |
colour_low |
Color for lowest expression level. (default = "white") |
colour_high |
Color for highest expression level. (default = "navy") |
colour_low_mertic |
Color for lowest metric level. (default = "white") |
colour_high_mertic |
Color for highest metric level. (default = "navy") |
Value
The cell annotation picture is saved in "save_path".
See Also
Other Section_5_Other_Functions_Provided:
Celltype_annotation_Cellmarker2(),
Celltype_annotation_Excel(),
Celltype_annotation_Seurat()
Examples
## Not run:
Celltype_annotation_PanglaoDB(seurat_obj = sce,
gene_list = Markers_list_panglaoDB,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = file.path(tempdir(),"SlimR_Celltype_annotation_PanglaoDB"),
colour_low = "white",
colour_high = "navy",
colour_low_mertic = "white",
colour_high_mertic = "navy"
)
## End(Not run)
Uses "marker_list" from Seurat object for cell annotation
Description
Uses "marker_list" from Seurat object for cell annotation
Usage
Celltype_annotation_Seurat(
seurat_obj,
gene_list,
species,
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = NULL,
metric_names = NULL,
colour_low = "white",
colour_high = "navy",
colour_low_mertic = "white",
colour_high_mertic = "navy"
)
Arguments
seurat_obj |
Enter the Seurat object to be annotated, with annotation columns such as "seurat_clusters" in meta.data. |
gene_list |
Enter the standard "Marker_list" for the SlimR package, generated from a Seurat object by the "Read_seurat_markers()" function. |
species |
This parameter selects the species "Human" or "Mouse" for standard gene format correction of markers entered by "Marker_list". |
cluster_col |
Enter annotation columns such as "seurat_clusters" in meta.data of the Seurat object to be annotated. Default parameters use "cluster_col = 'seurat_clusters'". |
assay |
Enter the assay used by the Seurat object, such as "RNA". Default parameters use "assay = 'RNA'". |
save_path |
The output path of the cell annotation picture. Example parameters use "save_path = './SlimR/Celltype_annotation_Seurat/'". |
metric_names |
Change the row names for the input metrics; not recommended unless necessary. (default = NULL) |
colour_low |
Color for lowest expression level. (default = "white") |
colour_high |
Color for highest expression level. (default = "navy") |
colour_low_mertic |
Color for lowest metric level. (default = "white") |
colour_high_mertic |
Color for highest metric level. (default = "navy") |
Value
The cell annotation picture is saved in "save_path".
See Also
Other Section_5_Other_Functions_Provided:
Celltype_annotation_Cellmarker2(),
Celltype_annotation_Excel(),
Celltype_annotation_PanglaoDB()
Examples
## Not run:
Celltype_annotation_Seurat(seurat_obj = sce,
gene_list = Markers_list_Seurat,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = file.path(tempdir(),"SlimR_Celltype_annotation_Seurat"),
colour_low = "white",
colour_high = "navy",
colour_low_mertic = "white",
colour_high_mertic = "navy"
)
## End(Not run)
Create Marker_list from the Cellmarker2 database
Description
Create Marker_list from the Cellmarker2 database
Usage
Markers_filter_Cellmarker2(
df,
species = NULL,
tissue_class = NULL,
tissue_type = NULL,
cancer_type = NULL,
cell_type = NULL
)
Arguments
df |
Standardized Cellmarker2 database. It is read as data(Cellmarker2) in the SlimR library. |
species |
Species information in the Cellmarker2 database. The default input is "Human" or "Mouse". The input can be retrieved from "Cellmarker2_table". For more information, please refer to Cellmarker2's official website at http://117.50.127.228/CellMarker/. |
tissue_class |
Tissue_class information in the Cellmarker2 database. The input can be retrieved from "Cellmarker2_table". For more information, please refer to Cellmarker2's official website at http://117.50.127.228/CellMarker/. |
tissue_type |
Tissue_type information in the Cellmarker2 database. The input can be retrieved from "Cellmarker2_table". For more information, please refer to Cellmarker2's official website at http://117.50.127.228/CellMarker/. |
cancer_type |
Cancer_type information in the Cellmarker2 database. The input can be retrieved from "Cellmarker2_table". For more information, please refer to Cellmarker2's official website at http://117.50.127.228/CellMarker/. |
cell_type |
Cell_type information in the Cellmarker2 database. The input can be retrieved from "Cellmarker2_table". For more information, please refer to Cellmarker2's official website at http://117.50.127.228/CellMarker/. |
Value
The standardized "Marker_list" in the SlimR package
See Also
Other Section_2_Standardized_Markers_List:
Markers_filter_PanglaoDB(),
Read_excel_markers(),
Read_seurat_markers()
Examples
Cellmarker2 <- SlimR::Cellmarker2
Markers_list_Cellmarker2 <- Markers_filter_Cellmarker2(
Cellmarker2,
species = "Human",
tissue_class = "Intestine",
tissue_type = NULL,
cancer_type = NULL,
cell_type = NULL
)
Create Marker_list from the PanglaoDB database
Description
Create Marker_list from the PanglaoDB database
Usage
Markers_filter_PanglaoDB(df, species_input, organ_input)
Arguments
df |
Standardized PanglaoDB database. It is read as data(PanglaoDB) in the SlimR library. |
species_input |
Species information in the PanglaoDB database. The default input is "Human" or "Mouse". The input can be retrieved from "PanglaoDB_table". For more information, please refer to PanglaoDB's official website at https://panglaodb.se/. |
organ_input |
Organ type information in the PanglaoDB database. The input can be retrieved from "PanglaoDB_table". For more information, please refer to PanglaoDB's official website at https://panglaodb.se/. |
Value
The standardized "Marker_list" in the SlimR package
See Also
Other Section_2_Standardized_Markers_List:
Markers_filter_Cellmarker2(),
Read_excel_markers(),
Read_seurat_markers()
Examples
PanglaoDB <- SlimR::PanglaoDB
Markers_list_panglaoDB <- Markers_filter_PanglaoDB(
PanglaoDB,
species_input = 'Human',
organ_input = 'GI tract'
)
List of Macrophage subtype markers in the article "Macrophage diversity in cancer revisited in the era of single-cell omics"
Description
A dataset containing marker genes for different Macrophage subtypes from the article "Macrophage diversity in cancer revisited in the era of single-cell omics"
Usage
Markers_list_PCTAM
Format
A list with 7 tables.
Details
This list contains marker tables for 7 types of tumor-associated macrophages (TAMs) obtained from the article "Macrophage diversity in cancer revisited in the era of single-cell omics". The data source is "https://doi.org/10.1016/j.it.2022.04.008", and the reference literature is: Ruo-Yu Ma et al. (2022) doi:10.1016/j.it.2022.04.008.
Source
doi:10.1016/j.it.2022.04.008
See Also
Other Section_0_Database:
Cellmarker2,
Cellmarker2_raw,
Cellmarker2_table,
Markers_list_PCTIT,
Markers_list_TCellSI,
Markers_list_scIBD,
PanglaoDB,
PanglaoDB_raw,
PanglaoDB_table
List of T cell subtype markers in the article "Pan-cancer single cell landscape of tumor-infiltrating T cells"
Description
A dataset containing marker genes for different T cell types from the article "Pan-cancer single cell landscape of tumor-infiltrating T cells"
Usage
Markers_list_PCTIT
Format
A list with 40 tables.
Details
This list contains marker tables for 40 types of pan-cancer tumor-infiltrating T cells (PCTIT) obtained from the article "Pan-cancer single cell landscape of tumor-infiltrating T cells". The data source is "https://doi.org/10.1126/science.abe6474", and the reference literature is: L. Zheng et al. (2021) doi:10.1126/science.abe6474.
Source
doi:10.1126/science.abe6474
See Also
Other Section_0_Database:
Cellmarker2,
Cellmarker2_raw,
Cellmarker2_table,
Markers_list_PCTAM,
Markers_list_TCellSI,
Markers_list_scIBD,
PanglaoDB,
PanglaoDB_raw,
PanglaoDB_table
List of T cell subtype markers in the article TCellSI
Description
A dataset containing marker genes for different T cell subtypes from TCellSI
Usage
Markers_list_TCellSI
Format
A list with ten tables.
Details
This list contains marker tables for 10 T cell types obtained from TCellSI. The data source is "https://github.com/GuoBioinfoLab/TCellSI/blob/main/data/markers.rda", and the reference literature is: Yang et al. (2024) doi:10.1002/imt2.231.
Source
https://github.com/GuoBioinfoLab/TCellSI/
See Also
Other Section_0_Database:
Cellmarker2,
Cellmarker2_raw,
Cellmarker2_table,
Markers_list_PCTAM,
Markers_list_PCTIT,
Markers_list_scIBD,
PanglaoDB,
PanglaoDB_raw,
PanglaoDB_table
List of cell type markers in the article scIBD
Description
A dataset containing marker genes for different human intestine cell types from scIBD
Usage
Markers_list_scIBD
Format
A list with one hundred and one tables.
Details
This list contains marker tables for 101 human intestine cell types obtained from scIBD. The article DOI is "https://doi.org/10.1038/s43588-023-00464-9", and the reference literature is: Nie et al. (2023) doi:10.1038/s43588-023-00464-9. Note: The 'Markers_list_scIBD' was generated using section 2.5.2 with the parameters 'sort_by = "logFC"' and 'gene_filter = 20'.
Source
doi:10.1038/s43588-023-00464-9
See Also
Other Section_0_Database:
Cellmarker2,
Cellmarker2_raw,
Cellmarker2_table,
Markers_list_PCTAM,
Markers_list_PCTIT,
Markers_list_TCellSI,
PanglaoDB,
PanglaoDB_raw,
PanglaoDB_table
PanglaoDB dataset
Description
A dataset containing marker genes for different cell types from PanglaoDB
Usage
PanglaoDB
Format
A data frame with 9 columns:
Details
This dataset is used to filter and create a standardized marker list.
Source
See Also
Other Section_0_Database:
Cellmarker2,
Cellmarker2_raw,
Cellmarker2_table,
Markers_list_PCTAM,
Markers_list_PCTIT,
Markers_list_TCellSI,
Markers_list_scIBD,
PanglaoDB_raw,
PanglaoDB_table
PanglaoDB raw dataset
Description
A dataset containing marker genes for different cell types from PanglaoDB
Usage
PanglaoDB_raw
Format
A data frame with 14 columns contained in the PanglaoDB database:
Details
This dataset is used to filter and create a standardized marker list.
Source
See Also
Other Section_0_Database:
Cellmarker2,
Cellmarker2_raw,
Cellmarker2_table,
Markers_list_PCTAM,
Markers_list_PCTIT,
Markers_list_TCellSI,
Markers_list_scIBD,
PanglaoDB,
PanglaoDB_table
PanglaoDB table
Description
A dataset containing marker genes for different cell types from PanglaoDB
Usage
PanglaoDB_table
Format
A list containing the available values for fields such as species, organ, and cell type.
Details
This list is used to choose filters when creating a standardized marker list.
Source
See Also
Other Section_0_Database:
Cellmarker2,
Cellmarker2_raw,
Cellmarker2_table,
Markers_list_PCTAM,
Markers_list_PCTIT,
Markers_list_TCellSI,
Markers_list_scIBD,
PanglaoDB,
PanglaoDB_raw
Adaptive Parameter Tuning for Single-Cell Data Annotation in SlimR
Description
This function automatically determines optimal min_expression, specificity_weight, and threshold parameters for single-cell data analysis based on dataset characteristics using adaptive algorithms derived from empirical analysis of single-cell datasets.
Usage
Parameter_Calculate(
seurat_obj,
features = NULL,
assay = NULL,
cluster_col = NULL,
n_celltypes = 50,
verbose = TRUE
)
Arguments
seurat_obj |
A Seurat object containing single-cell data |
features |
Character vector of feature names (genes) to analyze. If NULL, will use highly variable features from the Seurat object. |
assay |
Name of assay to use (default: default assay) |
cluster_col |
Column name in metadata containing cluster information |
n_celltypes |
Expected number of cell types in marker database (default: 50). Used for threshold recommendation calculation. |
verbose |
Whether to print progress messages (default: TRUE) |
Value
A list containing:
min_expression: Recommended expression threshold
specificity_weight: Recommended specificity weight
threshold: Recommended probability threshold for candidate selection
dataset_features: Extracted dataset characteristics
parameter_rationale: Explanation of parameter choices
See Also
Other Section_3_Automated_Annotation:
Celltype_Annotation(),
Celltype_Annotation_PerCell(),
Celltype_Calculate(),
Celltype_Calculate_PerCell(),
Celltype_Verification(),
Celltype_Verification_PerCell(),
percell_workflow
Examples
## Not run:
SlimR_params <- Parameter_Calculate(
seurat_obj = sce,
features = c("CD3E", "CD4", "CD8A"),
assay = "RNA",
cluster_col = "seurat_clusters",
n_celltypes = 98,
verbose = TRUE
)
## End(Not run)
Create "Marker_list" from Excel files ".xlsx"
Description
Create "Marker_list" from Excel files ".xlsx"
Usage
Read_excel_markers(path, has_colnames = TRUE)
Arguments
path |
The path to the marker file stored in ".xlsx" format. Each sheet is named after a cell type. The first row of each sheet is the header, the first column holds the markers, and the following columns hold metric information. |
has_colnames |
Logical value indicating whether the first row contains column names. If FALSE, the first column will be named "Markers" and subsequent columns will be named "Col1", "Col2", etc. |
Value
The standardized "Marker_list" in the SlimR package.
See Also
Other Section_2_Standardized_Markers_List:
Markers_filter_Cellmarker2(),
Markers_filter_PanglaoDB(),
Read_seurat_markers()
Examples
## Not run:
Markers_list_Excel <- Read_excel_markers(
"D:/Laboratory/Marker_load.xlsx"
)
## End(Not run)
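The column naming described for "has_colnames = FALSE" can be pictured with a small base-R sketch. It is an illustration only, not the package's code: "name_marker_columns()" is a hypothetical helper, and the resulting list is assumed to be keyed by sheet name (one table per cell type).

```r
# Hypothetical helper mirroring the documented naming rule for
# has_colnames = FALSE: first column "Markers", then "Col1", "Col2", ...
name_marker_columns <- function(sheet) {
  n <- ncol(sheet)
  names(sheet) <- c("Markers", if (n > 1) paste0("Col", seq_len(n - 1)))
  sheet
}

# One sheet without a header: markers in column 1, a metric in column 2
raw_sheet <- data.frame(c("EPCAM", "KRT8"), c(0.9, 0.7), stringsAsFactors = FALSE)

# Assumed list layout: one named element per sheet (cell type)
marker_list_sketch <- list("Epithelial cell" = name_marker_columns(raw_sheet))
names(marker_list_sketch[["Epithelial cell"]])
```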
Create "Marker_list" from Seurat object
Description
Create "Marker_list" from Seurat object
Usage
Read_seurat_markers(
df,
sources = c("Seurat", "presto"),
sort_by = "FSS",
gene_filter = 20
)
Arguments
df |
Data frame generated by the "FindAllMarkers" function; the recommended parameters are "group.by = 'Cell_type'" and "only.pos = TRUE". |
sources |
Type of markers sources to use. Be one of: |
sort_by |
Marker sorting parameter, for Seurat sources, select "avg_log2FC" or
"p_val_adj" or "FSS" (Feature Significance Score, FSS, product value of |
gene_filter |
The number of markers kept for each cell type, ranked by the "sort_by" parameter. Default parameters use "gene_filter = 20". |
Value
The standardized "Marker_list" in the SlimR package.
See Also
Other Section_2_Standardized_Markers_List:
Markers_filter_Cellmarker2(),
Markers_filter_PanglaoDB(),
Read_excel_markers()
Examples
## Not run:
# Example for Seurat sources markers
seurat_markers <- Seurat::FindAllMarkers(
object = sce,
group.by = "Cell_type",
only.pos = TRUE)
Markers_list_Seurat <- Read_seurat_markers(seurat_markers,
sources = "Seurat",
sort_by = "avg_log2FC",
gene_filter = 20
)
# Example for presto sources markers
seurat_markers <- dplyr::filter(
presto::wilcoxauc(
X = sce,
group_by = "Cell_type",
seurat_assay = "RNA"
),
padj < 0.05, logFC > 0.5
)
Markers_list_Seurat <- Read_seurat_markers(seurat_markers,
sources = "presto",
sort_by = "logFC",
gene_filter = 20
)
## End(Not run)
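The effect of "gene_filter" can be sketched in base R. This is a hedged illustration, not SlimR's implementation: "top_markers_sketch()" is a hypothetical helper, the toy data frame only mimics the "cluster", "gene", and "avg_log2FC" columns of a "FindAllMarkers" result, and the "FSS" score is not covered.

```r
# Hypothetical helper: keep the top `gene_filter` markers per cell type,
# ranked by a numeric column (larger values first).
top_markers_sketch <- function(df, sort_by = "avg_log2FC", gene_filter = 20) {
  by_type <- split(df, as.character(df$cluster))
  lapply(by_type, function(d) {
    d <- d[order(d[[sort_by]], decreasing = TRUE), , drop = FALSE]
    head(d, gene_filter)
  })
}

# Toy data standing in for a FindAllMarkers() result
toy_markers <- data.frame(
  cluster    = c("T cell", "T cell", "T cell", "B cell", "B cell"),
  gene       = c("CD3E", "CD4", "CD8A", "MS4A1", "CD79A"),
  avg_log2FC = c(2.1, 0.8, 1.5, 3.0, 2.2),
  stringsAsFactors = FALSE
)

toy_list <- top_markers_sketch(toy_markers, gene_filter = 2)
toy_list[["T cell"]]$gene
```

With "gene_filter = 2", the lowest-ranked T cell marker is dropped while both B cell markers are kept.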
Calculate Cluster Variability (Use in package)
Description
Measures the degree of separation between different cell clusters based on expression patterns.
Usage
calculate_cluster_variability(data.features, features)
Arguments
data.features |
Data frame containing expression data and cluster labels |
features |
Feature names to include in analysis |
Value
Numeric value representing cluster separation strength
See Also
Other Section_1_Functions_Use_in_Package:
calculate_expression(),
calculate_expression_skewness(),
calculate_probability(),
compute_adaptive_parameters(),
estimate_batch_effect(),
extract_dataset_features()
Counts average expression of gene set (Use in package)
Description
Counts average expression of gene set (Use in package)
Usage
calculate_expression(
object,
features,
assay = NULL,
cluster_col = NULL,
colour_low = "white",
colour_high = "navy"
)
Arguments
object |
Enter a Seurat object. |
features |
Enter one or a set of markers. |
assay |
Enter the assay used by the Seurat object, such as "RNA". Default parameters use "assay = NULL". |
cluster_col |
Enter the meta.data column in the Seurat object to be annotated, such as "seurat_clusters". Default parameters use "cluster_col = NULL". |
colour_low |
Color for lowest expression level. (default = "white") |
colour_high |
Color for highest expression level. (default = "navy") |
Value
Average expression of genes and related information in the input "Seurat" object for the given "cluster_col" and "features".
See Also
Other Section_1_Functions_Use_in_Package:
calculate_cluster_variability(),
calculate_expression_skewness(),
calculate_probability(),
compute_adaptive_parameters(),
estimate_batch_effect(),
extract_dataset_features()
Calculate Expression Distribution Skewness (Use in package)
Description
Computes the average skewness of gene expression distributions across all features.
Usage
calculate_expression_skewness(expression_matrix)
Arguments
expression_matrix |
Matrix of expression values |
Value
Mean absolute skewness across all genes
See Also
Other Section_1_Functions_Use_in_Package:
calculate_cluster_variability(),
calculate_expression(),
calculate_probability(),
compute_adaptive_parameters(),
estimate_batch_effect(),
extract_dataset_features()
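As a rough illustration of "mean absolute skewness across all genes", the sketch below uses the moment-based skewness formula and assumes genes are stored in rows. Both choices are assumptions made for this example; the function names are hypothetical and the package's internal definition may differ.

```r
# Moment-based skewness of one numeric vector (assumed definition)
skewness_sketch <- function(x) {
  x <- x[is.finite(x)]
  s <- sqrt(mean((x - mean(x))^2))
  if (length(x) < 3 || s == 0) return(NA_real_)
  mean((x - mean(x))^3) / s^3
}

# Mean absolute skewness across genes (rows are assumed to be genes)
mean_abs_skewness_sketch <- function(expression_matrix) {
  per_gene <- apply(expression_matrix, 1, skewness_sketch)
  mean(abs(per_gene), na.rm = TRUE)
}

toy_expr <- rbind(
  symmetric    = c(1, 2, 3, 4, 5),   # skewness 0
  right_skewed = c(0, 0, 0, 0, 10)   # skewness 1.5
)
toy_skewness <- mean_abs_skewness_sketch(toy_expr)
```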
Calculate gene set expression and infer probabilities with control datasets (Use in package)
Description
Calculate gene set expression and infer probabilities with control datasets (Use in package)
Usage
calculate_probability(
object,
features,
assay = NULL,
cluster_col = NULL,
min_expression = 0.1,
specificity_weight = 3
)
Arguments
object |
Enter a Seurat object. |
features |
Enter one or a set of markers. |
assay |
Enter the assay used by the Seurat object, such as "RNA". Default parameters use "assay = NULL". |
cluster_col |
Enter the meta.data column in the Seurat object to be annotated, such as "seurat_clusters". Default parameters use "cluster_col = NULL". |
min_expression |
The min_expression parameter defines a threshold value to determine whether a cell's expression of a feature is considered "expressed" or not. It is used to filter out low-expression cells that may contribute noise to the analysis. Default parameters use "min_expression = 0.1". |
specificity_weight |
The specificity_weight parameter controls how much the expression variability (standard deviation) of a feature within a cluster contributes to its "specificity score." It amplifies or suppresses the impact of variability in the final score calculation. Default parameters use "specificity_weight = 3". |
Value
Average expression of genes in the input "Seurat" object given "cluster_col" and given "features".
See Also
Other Section_1_Functions_Use_in_Package:
calculate_cluster_variability(),
calculate_expression(),
calculate_expression_skewness(),
compute_adaptive_parameters(),
estimate_batch_effect(),
extract_dataset_features()
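The role of "min_expression" as an "expressed or not" cut-off can be sketched as follows. This is only an illustration under stated assumptions: "expressed_fraction_sketch()" is a hypothetical helper, a strict "greater than" comparison is assumed, and the specificity weighting performed by the package is not reproduced.

```r
# Fraction of cells per cluster whose expression of one feature
# exceeds min_expression (strict ">" is an assumption of this sketch)
expressed_fraction_sketch <- function(expression, clusters, min_expression = 0.1) {
  tapply(expression > min_expression, clusters, mean)
}

# Toy expression values of a single feature in six cells from two clusters
expression <- c(0.00, 0.05, 2.10, 1.40, 0.00, 0.30)
clusters   <- c("0", "0", "0", "1", "1", "1")

frac <- expressed_fraction_sketch(expression, clusters)
```

Raising "min_expression" lowers these fractions, which is how a stricter threshold filters out low-level signal.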
Compute Adaptive Parameters Based on Dataset Features (Use in package)
Description
Calculates optimal min_expression, specificity_weight, and threshold parameters using continuous adaptive algorithms based on dataset characteristics.
Usage
compute_adaptive_parameters(dataset_features, n_celltypes = 50)
Arguments
dataset_features |
List of dataset characteristics from extract_dataset_features() |
n_celltypes |
Expected number of cell types in marker database |
Value
List containing min_expression, specificity_weight, threshold, and rationale
See Also
Other Section_1_Functions_Use_in_Package:
calculate_cluster_variability(),
calculate_expression(),
calculate_expression_skewness(),
calculate_probability(),
estimate_batch_effect(),
extract_dataset_features()
Estimate Batch Effect Strength (Use in package)
Description
Roughly estimates the potential impact of batch effects using available metadata.
Usage
estimate_batch_effect(seurat_obj, assay)
Arguments
seurat_obj |
Seurat object |
assay |
Assay name |
Value
Batch effect score (0 indicates no detectable batch effect)
See Also
Other Section_1_Functions_Use_in_Package:
calculate_cluster_variability(),
calculate_expression(),
calculate_expression_skewness(),
calculate_probability(),
compute_adaptive_parameters(),
extract_dataset_features()
Extract Dataset Characteristics for Adaptive Parameter Calculation (Use in package)
Description
Computes various statistical features from single-cell data that are used as input for the parameter prediction model.
Usage
extract_dataset_features(
seurat_obj,
features,
assay = NULL,
cluster_col = NULL
)
Arguments
seurat_obj |
Seurat object |
features |
Features to analyze |
assay |
Assay name |
cluster_col |
Cluster column name |
Value
List of dataset characteristics including expression statistics, variability measures, and cluster properties
See Also
Other Section_1_Functions_Use_in_Package:
calculate_cluster_variability(),
calculate_expression(),
calculate_expression_skewness(),
calculate_probability(),
compute_adaptive_parameters(),
estimate_batch_effect()
Per-Cell Annotation Workflow Example
Description
Example workflow for using SlimR's per-cell annotation functions
Overview
The per-cell annotation workflow in SlimR provides an alternative to cluster-based annotation by scoring and labeling individual cells based on marker expression. This is useful when:
Clusters contain mixed cell types
You want finer-grained annotations
Cell states exist on a continuum
UMAP spatial context can improve annotation quality
Basic Workflow
# 1. Prepare your Seurat object (must have normalized data)
library(SlimR)
library(Seurat)
# 2. Create or load marker list
Markers_list <- Markers_filter_Cellmarker2(
Cellmarker2,
species = "Human",
tissue_class = "Intestine"
)
# 3. Run per-cell annotation
result <- Celltype_Calculate_PerCell(
seurat_obj = sce,
gene_list = Markers_list,
species = "Human",
method = "weighted", # "weighted", "mean", or "AUCell"
min_expression = 0.1,
min_score = 0.1,
verbose = TRUE
)
# 4. Annotate Seurat object
sce <- Celltype_Annotation_PerCell(
seurat_obj = sce,
SlimR_percell_result = result,
plot_UMAP = TRUE,
plot_confidence = TRUE,
annotation_col = "Cell_type_PerCell"
)
# 5. Verify annotations
dotplot <- Celltype_Verification_PerCell(
seurat_obj = sce,
SlimR_percell_result = result,
gene_number = 5,
annotation_col = "Cell_type_PerCell"
)
print(dotplot)
Advanced
UMAP Spatial Smoothing:
# Use UMAP coordinates to smooth predictions via k-NN
# This reduces noise and improves consistency in spatial regions
result_smooth <- Celltype_Calculate_PerCell(
seurat_obj = sce,
gene_list = Markers_list,
species = "Human",
use_umap_smoothing = TRUE,
k_neighbors = 20, # Number of neighbors to consider
smoothing_weight = 0.3, # 0.3 corresponds to 30%
verbose = TRUE
)
# Compare smoothed vs unsmoothed
sce$Cell_type_Smooth <- result_smooth$Cell_annotations$Predicted_cell_type
sce$Cell_type_Raw <- result$Cell_annotations$Predicted_cell_type
# The "|" operator places the two plots side by side and requires library(patchwork)
DimPlot(sce, group.by = "Cell_type_Raw") |
DimPlot(sce, group.by = "Cell_type_Smooth")
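The idea behind k-NN smoothing can be sketched in base R. This is a minimal illustration, not the package's implementation: "smooth_scores_sketch()" is hypothetical, and the blending rule (own score weighted by 1 - smoothing_weight, neighbor mean weighted by smoothing_weight) is an assumption about how such a weight is typically applied.

```r
# Hypothetical sketch of k-NN score smoothing on 2-D embedding coordinates.
# scores: cells x cell-types matrix; coords: cells x 2 matrix.
smooth_scores_sketch <- function(scores, coords, k_neighbors = 2,
                                 smoothing_weight = 0.3) {
  d <- as.matrix(dist(coords))   # pairwise Euclidean distances
  diag(d) <- Inf                 # a cell is not its own neighbor
  smoothed <- scores
  for (i in seq_len(nrow(scores))) {
    nn <- order(d[i, ])[seq_len(k_neighbors)]
    neighbor_mean <- colMeans(scores[nn, , drop = FALSE])
    smoothed[i, ] <- (1 - smoothing_weight) * scores[i, ] +
      smoothing_weight * neighbor_mean
  }
  smoothed
}

# Four toy cells: three close together, one far away
coords <- rbind(c(0, 0), c(0, 1), c(1, 0), c(10, 10))
scores <- cbind(TypeA = c(1, 1, 0, 0), TypeB = c(0, 0, 1, 1))
smoothed <- smooth_scores_sketch(scores, coords,
                                 k_neighbors = 2, smoothing_weight = 0.5)
```

In this toy case the third cell sits next to two "TypeA" cells, so its "TypeA" score is pulled up from 0 toward its neighbors.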
Scoring Methods Comparison
# Method 1: Weighted (recommended for most cases)
# Combines expression with marker specificity and detection rate
result_weighted <- Celltype_Calculate_PerCell(
seurat_obj = sce,
gene_list = Markers_list,
species = "Human",
method = "weighted"
)
# Method 2: Mean (simple, fast)
# Just averages normalized marker expression
result_mean <- Celltype_Calculate_PerCell(
seurat_obj = sce,
gene_list = Markers_list,
species = "Human",
method = "mean"
)
# Method 3: AUCell (rank-based, robust to batch effects)
# Scores based on proportion of markers in top 5
result_aucell <- Celltype_Calculate_PerCell(
seurat_obj = sce,
gene_list = Markers_list,
species = "Human",
method = "AUCell"
)
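To make the difference between an averaging score and a rank-based score concrete, here is a small base-R sketch on toy data. It does not reproduce the "weighted", "mean", or "AUCell" methods as implemented in the package; "rank_score_sketch()" and its "n_top" cut-off are hypothetical and chosen only for illustration.

```r
# expr: genes x cells matrix of normalized expression (toy values)
expr <- matrix(
  c(5, 0, 1,
    4, 1, 0,
    0, 6, 2,
    1, 5, 3,
    0, 0, 9),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("G1", "G2", "G3", "G4", "G5"),
                  c("cell1", "cell2", "cell3"))
)
markers <- c("G1", "G2")

# Averaging score: mean marker expression per cell
mean_score <- colMeans(expr[markers, , drop = FALSE])

# Rank-based score: share of markers among each cell's n_top highest genes
# (n_top is an assumed illustrative cut-off)
rank_score_sketch <- function(expr, markers, n_top = 2) {
  vapply(colnames(expr), function(cell) {
    ranked <- rownames(expr)[order(expr[, cell], decreasing = TRUE)]
    mean(markers %in% ranked[seq_len(n_top)])
  }, numeric(1))
}
rank_score <- rank_score_sketch(expr, markers)
```

A rank-based score depends only on the ordering of genes within a cell, which is why such scores are less sensitive to differences in overall expression scale.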
Comparing Cluster vs Per-Cell Annotation
# Cluster-based annotation (original SlimR approach)
cluster_result <- Celltype_Calculate(
seurat_obj = sce,
gene_list = Markers_list,
species = "Human",
cluster_col = "seurat_clusters"
)
sce <- Celltype_Annotation(
seurat_obj = sce,
cluster_col = "seurat_clusters",
SlimR_anno_result = cluster_result,
annotation_col = "Cell_type_Cluster"
)
# Per-cell annotation
percell_result <- Celltype_Calculate_PerCell(
seurat_obj = sce,
gene_list = Markers_list,
species = "Human"
)
sce <- Celltype_Annotation_PerCell(
seurat_obj = sce,
SlimR_percell_result = percell_result,
annotation_col = "Cell_type_PerCell"
)
# Compare
library(ggplot2)
library(patchwork)
p1 <- DimPlot(sce, group.by = "Cell_type_Cluster") +
ggtitle("Cluster-based")
p2 <- DimPlot(sce, group.by = "Cell_type_PerCell") +
ggtitle("Per-cell")
p1 | p2
# Check agreement
table(sce$Cell_type_Cluster, sce$Cell_type_PerCell)
Performance Optimization
# For large datasets, adjust chunk_size to manage memory
result <- Celltype_Calculate_PerCell(
seurat_obj = sce,
gene_list = Markers_list,
species = "Human",
chunk_size = 10000, # Process 10k cells at a time
verbose = TRUE
)
# For UMAP smoothing, install RANN for 10-100x speedup
# install.packages("RANN")
result_smooth <- Celltype_Calculate_PerCell(
seurat_obj = sce,
gene_list = Markers_list,
species = "Human",
use_umap_smoothing = TRUE,
k_neighbors = 15
# RANN will be used automatically if installed
)
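What "chunk_size" means for memory use can be sketched as splitting cell indices into blocks that are processed one at a time. The helper below is hypothetical and only illustrates the batching idea; how the package iterates internally is not shown in this manual.

```r
# Hypothetical sketch: split cell indices into chunks of at most chunk_size
chunk_indices_sketch <- function(n_cells, chunk_size = 10000) {
  idx <- seq_len(n_cells)
  unname(split(idx, ceiling(idx / chunk_size)))
}

chunks <- chunk_indices_sketch(n_cells = 25000, chunk_size = 10000)
lengths(chunks)
```

Each element of "chunks" holds the indices of one block, so only that block's scores need to be in memory at once.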
Accessing Results
# Cell-level annotations
head(result$Cell_annotations)
# Cell_barcode Predicted_cell_type Max_score Confidence
# 1 AAACCTGAG... Enterocyte 0.85 0.62
# 2 AAACCTGCA... Goblet cell 0.72 0.45
# Summary statistics
result$Summary
# Cell_type Count Percentage
# 1 Enterocyte 5432 45.2
# 2 Goblet cell 2156 17.9
# Full probability matrix (if return_scores = TRUE)
result$Probability_matrix[1:5, 1:3]
# Enterocyte Goblet_cell Stem_cell
# AAACCTGAG... 0.85 0.10 0.05
# Extract high-confidence cells
high_conf <- result$Cell_annotations$Cell_barcode[
result$Cell_annotations$Confidence > 0.5
]
# Extract uncertain cells for manual review
uncertain <- result$Cell_annotations$Cell_barcode[
result$Cell_annotations$Confidence < 0.2
]
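The confidence cut-offs used above can also be summarized per cell type. The sketch below works on a mock data frame that only imitates the "Cell_annotations" columns shown in the sample output; the values are invented for illustration and are not package output.

```r
# Mock of result$Cell_annotations with the columns shown above (toy values)
cell_annotations <- data.frame(
  Cell_barcode        = c("c1", "c2", "c3", "c4"),
  Predicted_cell_type = c("Enterocyte", "Goblet cell", "Enterocyte", "Goblet cell"),
  Max_score           = c(0.85, 0.72, 0.40, 0.30),
  Confidence          = c(0.62, 0.45, 0.15, 0.55),
  stringsAsFactors = FALSE
)

# Bin cells with the 0.2 and 0.5 cut-offs; with cut(), a value exactly
# equal to a cut-off falls into the lower bin
cell_annotations$Confidence_bin <- cut(
  cell_annotations$Confidence,
  breaks = c(-Inf, 0.2, 0.5, Inf),
  labels = c("uncertain", "intermediate", "high")
)

# Cell types (rows) by confidence bin (columns)
bin_table <- table(cell_annotations$Predicted_cell_type,
                   cell_annotations$Confidence_bin)
```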
See Also
Other Section_3_Automated_Annotation:
Celltype_Annotation(),
Celltype_Annotation_PerCell(),
Celltype_Calculate(),
Celltype_Calculate_PerCell(),
Celltype_Verification(),
Celltype_Verification_PerCell(),
Parameter_Calculate()
Plot Method for pheatmap Objects
Description
This S3 method allows pheatmap objects (returned by Celltype_Calculate())
to be plotted using the generic plot() function. Without this method,
attempting to use plot() on a pheatmap object results in an error.
Usage
## S3 method for class 'pheatmap'
plot(x, ...)
Arguments
x |
A pheatmap object, typically from |
... |
Additional arguments (currently ignored) |
Details
Pheatmap objects contain a gtable component that needs to be drawn using
grid graphics. This method handles that automatically when plot() is called.
Alternative ways to display pheatmaps:
- print(pheatmap_object): Works natively
- plot(pheatmap_object): Works after loading SlimR
- grid::grid.draw(pheatmap_object$gtable): Direct access
Value
Invisibly returns the input pheatmap object after displaying it
Examples
## Not run:
# After running Celltype_Calculate()
cluster_results <- Celltype_Calculate(
seurat_obj = sce,
gene_list = Markers_list,
species = "Human"
)
# Now both of these work:
print(cluster_results$Heatmap_plot)
plot(cluster_results$Heatmap_plot)
## End(Not run)
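The behavior described in Details and Value (draw the "gtable" component with grid graphics, then return the input invisibly) can be sketched as below. This is a hypothetical stand-in, not SlimR's source code: "plot_pheatmap_sketch()" is an illustrative name, and the mock object uses a plain rectangle grob in place of a real heatmap.

```r
# Hypothetical stand-in for the method described above; not SlimR's source code
plot_pheatmap_sketch <- function(x, ...) {
  grid::grid.newpage()
  grid::grid.draw(x$gtable)   # draw the stored grid object
  invisible(x)                # return the input invisibly
}

# Mock object: a simple grob stands in for a real heatmap gtable
mock_heatmap <- structure(list(gtable = grid::rectGrob()), class = "pheatmap")

pdf(NULL)                     # off-screen device, no file is written
returned <- plot_pheatmap_sketch(mock_heatmap)
invisible(dev.off())
```

Registering such a function under the name "plot.pheatmap" is what lets the generic plot() dispatch to it for objects of class "pheatmap".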