3. Download data from the EMTscoreData R package

You can obtain the datasets included in this package through the ExperimentHub framework. To start, use query() to inspect all records associated with the EMTscoreData package, which will show you the available resources along with their corresponding identifiers. After identifying the items of interest, you can download them directly from the hub.

library(EMTscoreData)
#library(EMTscore)
library(ExperimentHub)

## Loading required package: BiocGenerics

## Loading required package: generics

## 
## Attaching package: 'generics'

## The following objects are masked from 'package:base':
## 
##     as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
##     setequal, union

## 
## Attaching package: 'BiocGenerics'

## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs

## The following objects are masked from 'package:base':
## 
##     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
##     as.data.frame, basename, cbind, colnames, dirname, do.call,
##     duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
##     mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
##     rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
##     unsplit, which.max, which.min

## Loading required package: AnnotationHub

## Loading required package: BiocFileCache

## Loading required package: dbplyr

library(SummarizedExperiment)

## Loading required package: MatrixGenerics

## Loading required package: matrixStats

## 
## Attaching package: 'MatrixGenerics'

## The following objects are masked from 'package:matrixStats':
## 
##     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
##     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
##     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
##     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
##     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
##     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
##     colWeightedMeans, colWeightedMedians, colWeightedSds,
##     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
##     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
##     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
##     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
##     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
##     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
##     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
##     rowWeightedSds, rowWeightedVars

## Loading required package: GenomicRanges

## Loading required package: stats4

## Loading required package: S4Vectors

## 
## Attaching package: 'S4Vectors'

## The following object is masked from 'package:utils':
## 
##     findMatches

## The following objects are masked from 'package:base':
## 
##     I, expand.grid, unname

## Loading required package: IRanges

## Loading required package: Seqinfo

## Loading required package: Biobase

## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.

## 
## Attaching package: 'Biobase'

## The following object is masked from 'package:MatrixGenerics':
## 
##     rowMedians

## The following objects are masked from 'package:matrixStats':
## 
##     anyMissing, rowMedians

## The following object is masked from 'package:ExperimentHub':
## 
##     cache

## The following object is masked from 'package:AnnotationHub':
## 
##     cache

library(Seurat)

## Loading required package: SeuratObject

## Loading required package: sp

## 
## Attaching package: 'sp'

## The following object is masked from 'package:IRanges':
## 
##     %over%

## 
## Attaching package: 'SeuratObject'

## The following object is masked from 'package:SummarizedExperiment':
## 
##     Assays

## The following object is masked from 'package:GenomicRanges':
## 
##     intersect

## The following object is masked from 'package:Seqinfo':
## 
##     intersect

## The following object is masked from 'package:IRanges':
## 
##     intersect

## The following object is masked from 'package:S4Vectors':
## 
##     intersect

## The following object is masked from 'package:BiocGenerics':
## 
##     intersect

## The following objects are masked from 'package:base':
## 
##     intersect, t

## 
## Attaching package: 'Seurat'

## The following object is masked from 'package:SummarizedExperiment':
## 
##     Assays

library(ggplot2)

3.1 Browse all EMTscoreData records

EMTscoreData registers multiple resources (datasets) on the hub; you can list them first, then download specific ones by their unique ExperimentHub IDs (e.g., EH10291).

query() returns a Hub object listing all records matching the keyword. This helps you discover what datasets are available and what their IDs are.

eh <- ExperimentHub()
query(eh, 'EMTscoreData')

## ExperimentHub with 12 records
## # snapshotDate(): 2026-01-30
## # $dataprovider: David P. Cook
## # $species: Homo sapiens
## # $rdataclass: SingleCellExperiment
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## #   rdatapath, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["EH10282"]]' 
## 
##             title          
##   EH10282 | MCF7_TNF.rda   
##   EH10283 | MCF7_EGF.rda   
##   EH10284 | MCF7_TGFB1.rda 
##   EH10285 | OVCA420_TNF.rda
##   EH10286 | OVCA420_EGF.rda
##   ...       ...            
##   EH10289 | DU145_EGF.rda  
##   EH10290 | DU145_TGFB1.rda
##   EH10291 | A549_TNF.rda   
##   EH10292 | A549_EGF.rda   
##   EH10293 | A549_TGFB1.rda
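
The record titles follow a `<cell line>_<treatment>.rda` pattern, so they can be split into a small lookup table for choosing records. The sketch below is one possible way to do this in base R. The IDs and titles are copied from the listing above (the elided rows are left out), and the helper name `parse_titles` is hypothetical, not part of EMTscoreData or ExperimentHub.

```r
# IDs and titles copied from the query() listing (elided rows omitted)
ids <- c("EH10282", "EH10283", "EH10284", "EH10285", "EH10286",
         "EH10289", "EH10290", "EH10291", "EH10292", "EH10293")
titles <- c("MCF7_TNF.rda", "MCF7_EGF.rda", "MCF7_TGFB1.rda",
            "OVCA420_TNF.rda", "OVCA420_EGF.rda",
            "DU145_EGF.rda", "DU145_TGFB1.rda",
            "A549_TNF.rda", "A549_EGF.rda", "A549_TGFB1.rda")

# Hypothetical helper: split "<CellLine>_<Treatment>.rda" into columns
parse_titles <- function(ids, titles) {
  stem  <- sub("\\.rda$", "", titles)            # drop the file extension
  parts <- strsplit(stem, "_", fixed = TRUE)     # split on the underscore
  data.frame(
    id        = ids,
    cell_line = vapply(parts, `[`, character(1), 1),
    treatment = vapply(parts, `[`, character(1), 2),
    stringsAsFactors = FALSE
  )
}

records <- parse_titles(ids, titles)

# Keep only the A549 records, which are the ones downloaded in this vignette
a549 <- records[records$cell_line == "A549", ]
a549
```

The `id` column of the filtered table can then be passed to `eh[[...]]` one record at a time.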

3.2 Download data using the unique identifier

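Below we download three example datasets by their ExperimentHub IDs. Each record is returned as a SingleCellExperiment object, the recommended Bioconductor container for single-cell expression data. The first request for a record downloads it into the local cache; later requests load it from the cache.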

A549_TNF <- eh[['EH10291']]

## see ?EMTscoreData and browseVignettes('EMTscoreData') for documentation

## loading from cache

## require("SingleCellExperiment")

A549_EGF <- eh[['EH10292']]

## see ?EMTscoreData and browseVignettes('EMTscoreData') for documentation

## downloading 1 resources

## retrieving 1 resource

## loading from cache

A549_TGFB1 <- eh[['EH10293']]

## see ?EMTscoreData and browseVignettes('EMTscoreData') for documentation

## downloading 1 resources

## retrieving 1 resource

## loading from cache

Sanity checks: show the class and a brief overview

class(A549_TNF)

## [1] "SingleCellExperiment"
## attr(,"package")
## [1] "SingleCellExperiment"

A549_TNF

## class: SingleCellExperiment 
## dim: 13143 12911 
## metadata(0):
## assays(2): counts logcounts
## rownames(13143): FO538757.2 AP006222.2 ... AC007325.4 AC240274.1
## rowData names(0):
## colnames(12911): Mix1_AAACCTGTCATGCTCC Mix1_AAACGGGAGTGTCCCG ...
##   Mix4b_TTTGCGCGTGTCTGAT Mix4b_TTTGGTTGTGGTCCGT
## colData names(22): orig.ident nCount_RNA ... Pseudotime ident
## reducedDimNames(3): PCA UMAP UMAP_PSEUDO
## mainExpName: RNA
## altExpNames(0):

3.3 What is inside a SingleCellExperiment?

A SingleCellExperiment generally contains:

- assays: expression matrices (e.g., counts, logcounts)
- rowData: feature-level annotation (genes)
- colData: cell-level metadata (e.g., pseudotime, condition, batch)
- reducedDims (optional): PCA/UMAP embeddings, etc.

Let's inspect which assays and metadata columns exist in one dataset:

#List available assays (common ones: "counts", "logcounts")
assayNames(A549_TNF)

## [1] "counts"    "logcounts"

#List available per-cell metadata fields
colnames(colData(A549_TNF))

##  [1] "orig.ident"       "nCount_RNA"       "nFeature_RNA"     "percent.mito"    
##  [5] "Sample"           "CellLine"         "Treatment"        "Time"            
##  [9] "Doublet"          "S.Score"          "G2M.Score"        "Phase"           
## [13] "Mix"              "SCT_snn_res.0.8"  "SCT_snn_res.0.1"  "RNA_snn_res.0.1" 
## [17] "seurat_clusters"  "RNA_snn_res.0.05" "Cluster"          "RNA_snn_res.0.5" 
## [21] "Pseudotime"       "ident"
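
To make the container layout concrete, here is a minimal sketch that builds a toy SingleCellExperiment by hand and reads each component back. All values are simulated and the object name `toy_sce` is made up for illustration; it is not one of the EMTscoreData records.

```r
library(SingleCellExperiment)

set.seed(42)
# Toy data: 5 genes x 4 cells (simulated, not taken from EMTscoreData)
counts <- matrix(rpois(20, lambda = 5), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4)))

# A 4 x 2 embedding with one row per cell
pca <- matrix(rnorm(8), nrow = 4,
              dimnames = list(colnames(counts), c("PC1", "PC2")))

toy_sce <- SingleCellExperiment(
  assays      = list(counts = counts, logcounts = log1p(counts)),
  colData     = DataFrame(Pseudotime = c(0.1, 0.4, 0.7, 1.0),
                          row.names = colnames(counts)),
  reducedDims = list(PCA = pca)
)

assayNames(toy_sce)        # names of the stored expression matrices
dim(logcounts(toy_sce))    # genes x cells
toy_sce$Pseudotime         # shortcut for colData(toy_sce)$Pseudotime
reducedDimNames(toy_sce)   # names of the stored embeddings
```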

If your dataset includes a pseudotime field, it will typically appear in colData(sce)$Pseudotime (or a similarly named column). The plotting example later assumes a pseudotime column exists; if your column name differs, adjust the col_name argument of plot_EMT_from_objects() accordingly.
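
If you are unsure what the pseudotime column is called, a case-insensitive search over the metadata column names can help. The sketch below uses the column names printed above; the helper name `find_time_col` and its default pattern are assumptions made for illustration.

```r
# Column names copied from the colData() listing above
cols <- c("orig.ident", "nCount_RNA", "nFeature_RNA", "percent.mito",
          "Sample", "CellLine", "Treatment", "Time",
          "Doublet", "S.Score", "G2M.Score", "Phase",
          "Mix", "SCT_snn_res.0.8", "SCT_snn_res.0.1", "RNA_snn_res.0.1",
          "seurat_clusters", "RNA_snn_res.0.05", "Cluster", "RNA_snn_res.0.5",
          "Pseudotime", "ident")

# Hypothetical helper: return the column names that match `pattern`,
# ignoring case, so "pseudotime", "Pseudotime" and "PseudoTime" all match
find_time_col <- function(cols, pattern = "pseudo") {
  grep(pattern, cols, ignore.case = TRUE, value = TRUE)
}

time_col <- find_time_col(cols)
time_col
```

With a real object, `cols` would be `colnames(colData(sce))`. If the search returns more than one name, pick the intended column explicitly rather than relying on the first match.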

4. Data analysis and visualization

Some single-cell datasets can be large. Running full analyses on all cells in a vignette may exceed the RAM of a typical machine and may fail during package checks. Therefore, this vignette uses a random subset of at most 1,500 cells from each dataset, drawn after fixing the random seed so that the selection is reproducible. This keeps computations fast and demonstrates the workflow without requiring high-memory hardware.

set.seed(1)

# Randomly keep at most n cells (columns) of a SingleCellExperiment
subset_cells <- function(sce, n = 500) {
  n <- min(n, ncol(sce))
  sce[, sample(seq_len(ncol(sce)), n)]
}

A549_TNF_small <- subset_cells(A549_TNF, n = 1500)
A549_EGF_small <- subset_cells(A549_EGF, n = 1500)
A549_TGFB1_small <- subset_cells(A549_TGFB1, n = 1500)

objects <- list(
  A549_TGFB1 = A549_TGFB1_small,
  A549_EGF = A549_EGF_small,
  A549_TNF = A549_TNF_small
)

#Confirm the reduced size

sapply(objects, ncol)

## A549_TGFB1   A549_EGF   A549_TNF 
##       1500       1500       1500
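
The two properties that matter for this kind of subsampling are that it is reproducible under a fixed seed and that it never asks for more cells than exist. The sketch below checks both on a toy matrix in base R. The function `subset_cols` is a stand-in written for this illustration and works on plain matrices, so it runs without any Bioconductor package.

```r
# Stand-in for column subsampling on a plain matrix (illustration only)
subset_cols <- function(x, n = 500) {
  n <- min(n, ncol(x))                               # cap n at the number of columns
  x[, sample(seq_len(ncol(x)), n), drop = FALSE]     # draw n columns without replacement
}

# Toy matrix: 3 genes x 10 cells
toy <- matrix(seq_len(30), nrow = 3,
              dimnames = list(paste0("gene", 1:3), paste0("cell", 1:10)))

# Resetting the seed before each call gives the same cells both times
set.seed(1)
a <- subset_cols(toy, n = 4)
set.seed(1)
b <- subset_cols(toy, n = 4)
identical(colnames(a), colnames(b))

# Asking for more cells than exist returns every cell once
ncol(subset_cols(toy, n = 50))
```

Note that a single `set.seed()` call followed by several draws, as in the vignette, is also reproducible as long as the draws happen in the same order.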

4.1 Load an EMT gene set (GMT) from Zenodo

Many EMT scoring methods require a gene set. Here we demonstrate how to download a GMT file from Zenodo and parse it into a one-column data frame of gene symbols.

Download the GMT file:

library(BiocFileCache)
url <- "https://zenodo.org/records/18168504/files/HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION.v2025.1.Hs.gmt"

bfc <- BiocFileCache() # creates/uses a cache folder
gmt_file <- bfcrpath(bfc, url) # downloads once; returns the cached file path

## adding rname 'https://zenodo.org/records/18168504/files/HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION.v2025.1.Hs.gmt'

Parse GMT and extract EMT genes:

A GMT file is tab-delimited, with one gene set per line:

GeneSetName \t Description \t GENE1 \t GENE2 \t ...

The helper below extracts all genes across the gene sets in the GMT file and returns the unique gene symbols as a one-column data frame.

read_gmt <- function(fname){
  #Read all lines from the GMT file
  gmt_lines <- readLines(fname)
  #Split each line by tab; each line becomes a character vector
  gmt_list <- lapply(gmt_lines, function(x) unlist(strsplit(x, split="\t")))
  #For each line, drop the first two fields (name and description) and keep
  #the genes; negative indexing also handles lines with fewer than 3 fields
  gmt_genes <- lapply(gmt_list, function(x) x[-(1:2)])
  #Get unique gene list
  genes <- unique(unlist(gmt_genes))
  #Return a data frame whose column name is `gene` and whose values are gene names.
  return(data.frame(gene = genes, stringsAsFactors = FALSE))
}

Genesets <- read_gmt(gmt_file)
Genesets

##          gene
## 1      ABI3BP
## 2       ACTA2
## 3      ADAM12
## 4       ANPEP
## 5       APLP1
## 6        AREG
## 7       BASP1
## 8        BDNF
## 9         BGN
## 10       BMP1
## 11      CADM1
## 12      CALD1
## 13       CALU
## 14       CAP2
## 15       CAPG
## 16       CCN1
## 17       CCN2
## 18       CD44
## 19       CD59
## 20      CDH11
## 21       CDH2
## 22       CDH6
## 23    COL11A1
## 24    COL12A1
## 25    COL16A1
## 26     COL1A1
## 27     COL1A2
## 28     COL3A1
## 29     COL4A1
## 30     COL4A2
## 31     COL5A1
## 32     COL5A2
## 33     COL5A3
## 34     COL6A2
## 35     COL6A3
## 36     COL7A1
## 37     COL8A2
## 38   COLGALT1
## 39       COMP
## 40       COPA
## 41      CRLF1
## 42     CTHRC1
## 43      CXCL1
## 44     CXCL12
## 45      CXCL6
## 46      CXCL8
## 47       DAB2
## 48        DCN
## 49       DKK1
## 50     DPYSL3
## 51        DST
## 52       ECM1
## 53       ECM2
## 54      EDIL3
## 55     EFEMP2
## 56        ELN
## 57       EMP3
## 58       ENO2
## 59        FAP
## 60        FAS
## 61      FBLN1
## 62      FBLN2
## 63      FBLN5
## 64       FBN1
## 65       FBN2
## 66     FERMT2
## 67       FGF2
## 68       FLNA
## 69       FMOD
## 70        FN1
## 71      FOXC2
## 72      FSTL1
## 73      FSTL3
## 74      FUCA1
## 75       FZD8
## 76    GADD45A
## 77    GADD45B
## 78       GAS1
## 79        GEM
## 80       GJA1
## 81     GLIPR1
## 82       GPC1
## 83       GPX7
## 84      GREM1
## 85      HTRA1
## 86        ID2
## 87     IGFBP2
## 88     IGFBP3
## 89     IGFBP4
## 90       IL15
## 91       IL32
## 92        IL6
## 93      INHBA
## 94      ITGA2
## 95      ITGA5
## 96      ITGAV
## 97      ITGB1
## 98      ITGB3
## 99      ITGB5
## 100       JUN
## 101     LAMA1
## 102     LAMA2
## 103     LAMA3
## 104     LAMC1
## 105     LAMC2
## 106    LGALS1
## 107       LOX
## 108     LOXL1
## 109     LOXL2
## 110      LRP1
## 111    LRRC15
## 112       LUM
## 113    MAGEE1
## 114     MATN2
## 115     MATN3
## 116      MCM7
## 117      MEST
## 118     MFAP5
## 119       MGP
## 120      MMP1
## 121     MMP14
## 122      MMP2
## 123      MMP3
## 124      MSX1
## 125     MXRA5
## 126      MYL9
## 127      MYLK
## 128      NID2
## 129      NNMT
## 130    NOTCH2
## 131      NT5E
## 132       NTM
## 133      OXTR
## 134      P3H1
## 135    PCOLCE
## 136   PCOLCE2
## 137    PDGFRB
## 138    PDLIM4
## 139      PFN2
## 140     PLAUR
## 141     PLOD1
## 142     PLOD2
## 143     PLOD3
## 144    PMEPA1
## 145     PMP22
## 146     POSTN
## 147      PPIB
## 148     PRRX1
## 149     PRSS2
## 150     PTHLH
## 151      PTX3
## 152       PVR
## 153     QSOX1
## 154      RGS4
## 155      RHOB
## 156      SAT1
## 157      SCG2
## 158      SDC1
## 159      SDC4
## 160  SERPINE1
## 161  SERPINE2
## 162  SERPINH1
## 163     SFRP1
## 164     SFRP4
## 165      SGCB
## 166      SGCD
## 167      SGCG
## 168    SLC6A8
## 169     SLIT2
## 170     SLIT3
## 171     SNAI2
## 172     SNTB1
## 173     SPARC
## 174    SPOCK1
## 175      SPP1
## 176     TAGLN
## 177     TFPI2
## 178     TGFB1
## 179     TGFBI
## 180    TGFBR3
## 181      TGM2
## 182     THBS1
## 183     THBS2
## 184      THY1
## 185     TIMP1
## 186     TIMP3
## 187       TNC
## 188   TNFAIP3
## 189 TNFRSF11B
## 190 TNFRSF12A
## 191      TPM1
## 192      TPM2
## 193      TPM4
## 194     VCAM1
## 195      VCAN
## 196     VEGFA
## 197     VEGFC
## 198       VIM
## 199     WIPF1
## 200     WNT5A
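
The helper above pools all gene sets into one list, which is fine for a single-set file such as this one. For a GMT file with several gene sets it can be useful to keep them apart. The following is a minimal sketch of that variant; the function name `read_gmt_list`, the toy file, and its set names and genes are made up for illustration.

```r
# Write a two-set toy GMT file (set names and gene lists are illustrative only)
toy_gmt <- tempfile(fileext = ".gmt")
writeLines(c(
  "SET_A\tdescription A\tVIM\tFN1\tCDH2",
  "SET_B\tdescription B\tCDH1\tEPCAM\tVIM"
), toy_gmt)

# Hypothetical variant: return a named list with one character vector per set
read_gmt_list <- function(fname) {
  fields <- strsplit(readLines(fname), "\t", fixed = TRUE)
  fields <- fields[lengths(fields) >= 3]                 # skip empty or malformed lines
  sets <- lapply(fields, function(x) unique(x[-(1:2)]))  # genes start at field 3
  names(sets) <- vapply(fields, `[`, character(1), 1)    # field 1 is the set name
  sets
}

gene_sets <- read_gmt_list(toy_gmt)
gene_sets$SET_B

# Pooling the sets afterwards reproduces the "all unique genes" behaviour
all_genes <- unique(unlist(gene_sets, use.names = FALSE))
all_genes
```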

4.2 Calculate EMT scores for datasets stored in EMTscoreData

Calculated EMT scores can then be plotted against pseudotime to examine dynamic changes in epithelial–mesenchymal states.

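Because this package is a data package, we keep the computation simple and robust for vignettes. Here we compute a demo EMT score with Seurat's AddModuleScore(), which, for each cell, averages the expression of the EMT genes and subtracts the average expression of randomly chosen control genes with similar overall expression: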

Input: a named list of SingleCellExperiment or Seurat objects, plus the path to a GMT file containing the EMT genes.

Output: a list of Seurat objects, each containing a new metadata column with one score per cell.
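
The scoring in this section relies on Seurat's AddModuleScore. To give a feel for what a module score measures, the sketch below computes a simplified version in base R: the mean expression of the signature genes minus the mean expression of a few randomly drawn control genes. This is not Seurat's implementation. It skips the expression-binning step that Seurat uses to choose matched controls, and the toy data, gene names, and the function name `simple_module_score` are all assumptions made for illustration.

```r
set.seed(1)
# Toy log-expression matrix: 20 genes x 6 cells (simulated)
expr <- matrix(rnorm(20 * 6), nrow = 20,
               dimnames = list(paste0("g", 1:20), paste0("cell", 1:6)))

# Make the signature genes g1..g4 strongly expressed in the last three cells
expr[1:4, 4:6] <- expr[1:4, 4:6] + 5

# Simplified module score (illustration only, no expression binning)
simple_module_score <- function(expr, genes, n_ctrl = 5) {
  genes <- intersect(genes, rownames(expr))        # ignore genes absent from the data
  pool  <- setdiff(rownames(expr), genes)          # candidate control genes
  ctrl  <- sample(pool, min(n_ctrl, length(pool))) # random controls
  colMeans(expr[genes, , drop = FALSE]) - colMeans(expr[ctrl, , drop = FALSE])
}

# "not_present" is silently dropped, much like the missing-feature warning
# that AddModuleScore prints for genes it cannot find
scores <- simple_module_score(expr, c("g1", "g2", "g3", "g4", "not_present"))
round(scores, 2)
```

Subtracting a control average makes the score relative rather than absolute: a cell with globally high expression does not automatically receive a high signature score.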

## ------------------------------------------------------------------
## Prepare input objects
## ------------------------------------------------------------------
## We first organize the datasets into a named list. Each element
## corresponds to one experimental condition and contains a Seurat
## object (or a SingleCellExperiment object that will be converted
## internally). This unified structure allows us to apply the same
## EMT scoring procedure to multiple datasets in a consistent way.

objects <- list(
  A549_TGFB1 = A549_TGFB1_small,
  A549_EGF   = A549_EGF_small,
  A549_TNF   = A549_TNF_small
)

## EMT gene list extracted from the GMT file. These genes define
## the epithelial–mesenchymal transition (EMT) signature used
## for score calculation.
emt_genes <- Genesets$gene


## ------------------------------------------------------------------
## Function: add_EMT_score
## ------------------------------------------------------------------
## This function computes an EMT score for each cell in a collection
## of Seurat (or SingleCellExperiment) objects using a predefined
## EMT gene set.
##
## Input:
##   - objects: a named list of Seurat objects or SingleCellExperiment
##              objects. Each object represents one dataset or condition.
##   - gmt_file: path to a GMT file containing EMT-related genes.
##   - emt_name: name of the metadata column that will store the EMT score.
##
## Output:
##   - A list of Seurat objects, each containing a new metadata column
##     with one EMT score per cell.
##
## Note:
##   For demonstration purposes in this vignette, the EMT score is
##   calculated using Seurat's AddModuleScore function. More advanced
##   or alternative EMT scoring methods are implemented in the
##   companion EMTscore package.

add_EMT_score <- function(objects,
                          gmt_file,
                          emt_name = "EMT_Score") {
  
  ## Read the EMT gene set from the GMT file
  Genesets <- read_gmt(gmt_file)
  emt_genes <- Genesets$gene
  
  ## Apply EMT scoring to each object in the list
  obj_list <- lapply(names(objects), function(name) {
    
    obj <- objects[[name]]
    
    # Convert SCE → Seurat
    if (inherits(obj, "SingleCellExperiment")) {
      message("Converting SCE to Seurat: ", name)
      obj <- as.Seurat(obj, data = "logcounts")
    }
    if (!inherits(obj, "Seurat")) {
      stop("Object ", name, " is not a Seurat or SCE object.")
    }
    
    ## Update the Seurat object to the current structure if needed.
    obj <- UpdateSeuratObject(obj)
    
    # get gene expression matrix
    geneExp <- GetAssayData(obj, assay = "RNA", layer = "data")
    
    ## Compute the EMT score using the EMT gene set.
    ## AddModuleScore returns a column named '<emt_name>1', which we
    ## subsequently rename for clarity.
    obj <- AddModuleScore(obj, features = list(emt_genes), name = emt_name, ctrl = 5)
    old <- paste0(emt_name, "1")
    obj <- SeuratObject::AddMetaData(
      object   = obj,
      metadata = obj[[old]][, 1, drop = TRUE],  # extract the score column as a plain vector
      col.name = emt_name
    )
    obj[[old]] <- NULL
    return(obj)
  })
  
  
  ## Preserve original dataset names
  names(obj_list) <- names(objects)
  return(obj_list)
}


## ------------------------------------------------------------------
## Function: plot_EMT_from_objects
## ------------------------------------------------------------------
## This function visualizes EMT scores as a function of pseudotime
## across multiple datasets.
##
## Input:
##   - obj_list: a list of Seurat objects produced by add_EMT_score().
##   - col_name: name of the metadata column representing pseudotime.
##   - emt_score_col: name of the metadata column storing EMT scores.
##
## Output:
##   - A ggplot object showing smoothed EMT score trends along pseudotime
##     for each experimental condition.

plot_EMT_from_objects <- function(obj_list, col_name,
                                  emt_score_col) {
  
  ## Merge metadata from all Seurat objects into a single data frame
  ## for visualization.
  plot_df <- do.call(rbind, lapply(names(obj_list), function(name) {
    obj <- obj_list[[name]]
    df <- obj[[]][, c(col_name, emt_score_col), drop = FALSE]
    colnames(df) <- c(col_name, emt_score_col)
    ## Track the experimental condition
    df$Condition <- name
    ## Order cells by pseudotime for smoother visualization
    df <- df[order(df[[col_name]]), ]  
    df
  }))
  
  # Draw smooth curves
  p <- ggplot(plot_df, aes_string(x = col_name, y = emt_score_col, 
                                  color = "Condition")) +
    geom_smooth(method = "loess", se = FALSE, linewidth = 1.2) +
    theme_classic(base_size = 14) +
    labs(x = col_name, y = emt_score_col, color = "Condition")
  
  return(p)
}

## ------------------------------------------------------------------
## Run EMT scoring and visualization
## ------------------------------------------------------------------
## Compute EMT scores for all datasets and visualize their dynamics
## along pseudotime.

seurat_objs <- add_EMT_score(objects, 
                             gmt_file = gmt_file,
                             emt_name = "EMT_score")

## Converting SCE to Seurat: A549_TGFB1

## Validating object structure

## Updating object slots

## Ensuring keys are in the proper structure

## Updating matrix keys for DimReduc 'PCA'

## Updating matrix keys for DimReduc 'UMAP'

## Updating matrix keys for DimReduc 'UMAP_PSEUDO'

## Ensuring keys are in the proper structure

## Ensuring feature names don't have underscores or pipes

## Updating slots in RNA

## Updating slots in PCA

## Updating slots in UMAP

## Setting UMAP DimReduc to global

## Updating slots in UMAP_PSEUDO

## Setting UMAP_PSEUDO DimReduc to global

## Validating object structure for Assay 'RNA'

## Validating object structure for DimReduc 'PCA'

## Validating object structure for DimReduc 'UMAP'

## Validating object structure for DimReduc 'UMAP_PSEUDO'

## Object representation is consistent with the most current Seurat version

## Warning: The following features are not present in the object: ABI3BP, BGN,
## CCN1, CCN2, CDH11, CDH6, COL1A2, COL3A1, COL5A3, COL6A3, COL8A2, COMP, CXCL12,
## CXCL6, DCN, ECM2, EDIL3, EFEMP2, ELN, FAP, FBLN2, FMOD, GAS1, GPX7, IL6, ITGB3,
## LAMA1, LAMA2, LOX, LRRC15, MFAP5, MMP3, MXRA5, NID2, NTM, PDGFRB, PDLIM4,
## POSTN, PRRX1, PTX3, RGS4, SGCG, SLIT2, SNTB1, THBS2, TNC, TNFRSF11B, VCAM1,
## WNT5A, not searching for symbol synonyms

## Converting SCE to Seurat: A549_EGF

## Validating object structure

## Updating object slots

## Ensuring keys are in the proper structure

## Updating matrix keys for DimReduc 'PCA'

## Updating matrix keys for DimReduc 'UMAP'

## Updating matrix keys for DimReduc 'UMAP_PSEUDO'

## Ensuring keys are in the proper structure

## Ensuring feature names don't have underscores or pipes

## Updating slots in RNA

## Updating slots in PCA

## Updating slots in UMAP

## Setting UMAP DimReduc to global

## Updating slots in UMAP_PSEUDO

## Setting UMAP_PSEUDO DimReduc to global

## Validating object structure for Assay 'RNA'

## Validating object structure for DimReduc 'PCA'

## Validating object structure for DimReduc 'UMAP'

## Validating object structure for DimReduc 'UMAP_PSEUDO'

## Object representation is consistent with the most current Seurat version

## Warning: The following features are not present in the object: ABI3BP, ADAM12,
## BGN, CCN1, CCN2, CDH11, CDH6, COL1A2, COL3A1, COL5A3, COL6A3, COL8A2, COMP,
## CRLF1, CXCL12, CXCL6, DCN, ECM2, EDIL3, EFEMP2, ELN, FAP, FBLN2, FMOD, FOXC2,
## GAS1, GPX7, IL6, INHBA, ITGB3, LAMA1, LAMA2, LOX, LRRC15, LUM, MFAP5, MMP1,
## MMP14, MMP3, MXRA5, NID2, NTM, PDGFRB, PDLIM4, POSTN, PRRX1, PTX3, RGS4, SGCG,
## SLIT2, SNAI2, SNTB1, THBS2, TIMP3, TNC, TNFRSF11B, VCAM1, WNT5A, not searching
## for symbol synonyms

## Converting SCE to Seurat: A549_TNF

## Validating object structure

## Updating object slots

## Ensuring keys are in the proper structure

## Updating matrix keys for DimReduc 'PCA'

## Updating matrix keys for DimReduc 'UMAP'

## Updating matrix keys for DimReduc 'UMAP_PSEUDO'

## Ensuring keys are in the proper structure

## Ensuring feature names don't have underscores or pipes

## Updating slots in RNA

## Updating slots in PCA

## Updating slots in UMAP

## Setting UMAP DimReduc to global

## Updating slots in UMAP_PSEUDO

## Setting UMAP_PSEUDO DimReduc to global

## Validating object structure for Assay 'RNA'

## Validating object structure for DimReduc 'PCA'

## Validating object structure for DimReduc 'UMAP'

## Validating object structure for DimReduc 'UMAP_PSEUDO'

## Object representation is consistent with the most current Seurat version

## Warning: The following features are not present in the object: ABI3BP, ADAM12,
## BGN, CCN1, CCN2, CDH11, CDH6, COL1A2, COL3A1, COL5A3, COL6A3, COL8A2, COMP,
## CRLF1, CXCL12, CXCL6, DCN, ECM2, EDIL3, EFEMP2, ELN, FAP, FBLN2, FMOD, GAS1,
## GPX7, IL6, INHBA, ITGB3, LAMA1, LAMA2, LOX, LRRC15, LUM, MFAP5, MMP1, MMP14,
## MMP3, MXRA5, NTM, PDGFRB, PDLIM4, POSTN, PRRX1, PTX3, RGS4, SCG2, SGCG, SLIT2,
## SNAI2, SNTB1, THBS2, TNC, TNFRSF11B, VCAM1, WNT5A, not searching for symbol
## synonyms

p_Seurat <- plot_EMT_from_objects(seurat_objs, 
                                  col_name = "Pseudotime", 
                                  emt_score_col = "EMT_score")

## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation idioms with `aes()`.
## ℹ See also `vignette("ggplot2-in-packages")` for more information.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
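
The deprecation warning comes from `aes_string()`. One way to avoid it is to map columns by name through the `.data` pronoun inside `aes()`. The sketch below shows that pattern on simulated data with the same column names as the plot above; the data frame `toy_df` and its values are made up, and this is a possible replacement rather than the code used to produce the vignette's figure.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for the merged metadata: two conditions, 30 cells each
toy_df <- data.frame(
  Pseudotime = rep(seq(0, 1, length.out = 30), times = 2),
  Condition  = rep(c("A", "B"), each = 30)
)
slope <- ifelse(toy_df$Condition == "A", 1, 0.5)
toy_df$EMT_score <- slope * toy_df$Pseudotime + rnorm(60, sd = 0.1)

# Column names held in variables, as in plot_EMT_from_objects()
col_name      <- "Pseudotime"
emt_score_col <- "EMT_score"

# .data[[name]] looks the column up by its string name inside aes()
p <- ggplot(toy_df, aes(x = .data[[col_name]],
                        y = .data[[emt_score_col]],
                        color = .data[["Condition"]])) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, linewidth = 1.2) +
  theme_classic(base_size = 14) +
  labs(x = col_name, y = emt_score_col, color = "Condition")
```

Passing `formula = y ~ x` explicitly also silences the informational message that `geom_smooth()` otherwise prints about the formula it chose.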

p_Seurat

## `geom_smooth()` using formula = 'y ~ x'

Session information:

sessionInfo()

## R Under development (unstable) (2026-01-15 r89304)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] SingleCellExperiment_1.33.0 ggplot2_4.0.1              
##  [3] Seurat_5.4.0                SeuratObject_5.3.0         
##  [5] sp_2.2-0                    SummarizedExperiment_1.41.0
##  [7] Biobase_2.71.0              GenomicRanges_1.63.1       
##  [9] Seqinfo_1.1.0               IRanges_2.45.0             
## [11] S4Vectors_0.49.0            MatrixGenerics_1.23.0      
## [13] matrixStats_1.5.0           ExperimentHub_3.1.0        
## [15] AnnotationHub_4.1.0         BiocFileCache_3.1.0        
## [17] dbplyr_2.5.1                BiocGenerics_0.57.0        
## [19] generics_0.1.4              EMTscoreData_0.99.10       
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3     jsonlite_2.0.0         magrittr_2.0.4        
##   [4] spatstat.utils_3.2-1   farver_2.1.2           rmarkdown_2.30        
##   [7] vctrs_0.7.1            ROCR_1.0-12            memoise_2.0.1         
##  [10] spatstat.explore_3.7-0 S4Arrays_1.11.1        htmltools_0.5.9       
##  [13] curl_7.0.0             SparseArray_1.11.10    sass_0.4.10           
##  [16] sctransform_0.4.3      parallelly_1.46.1      KernSmooth_2.23-26    
##  [19] bslib_0.10.0           htmlwidgets_1.6.4      ica_1.0-3             
##  [22] plyr_1.8.9             httr2_1.2.2            plotly_4.12.0         
##  [25] zoo_1.8-15             cachem_1.1.0           igraph_2.2.1          
##  [28] mime_0.13              lifecycle_1.0.5        pkgconfig_2.0.3       
##  [31] Matrix_1.7-4           R6_2.6.1               fastmap_1.2.0         
##  [34] fitdistrplus_1.2-6     future_1.69.0          shiny_1.12.1          
##  [37] digest_0.6.39          patchwork_1.3.2        AnnotationDbi_1.73.0  
##  [40] tensor_1.5.1           RSpectra_0.16-2        irlba_2.3.7           
##  [43] RSQLite_2.4.5          labeling_0.4.3         filelock_1.0.3        
##  [46] progressr_0.18.0       spatstat.sparse_3.1-0  mgcv_1.9-4            
##  [49] polyclip_1.10-7        abind_1.4-8            httr_1.4.7            
##  [52] compiler_4.6.0         withr_3.0.2            bit64_4.6.0-1         
##  [55] S7_0.2.1               DBI_1.2.3              fastDummies_1.7.5     
##  [58] MASS_7.3-65            DelayedArray_0.37.0    rappdirs_0.3.4        
##  [61] tools_4.6.0            lmtest_0.9-40          otel_0.2.0            
##  [64] httpuv_1.6.16          future.apply_1.20.1    goftest_1.2-3         
##  [67] glue_1.8.0             nlme_3.1-168           promises_1.5.0        
##  [70] grid_4.6.0             Rtsne_0.17             cluster_2.1.8.1       
##  [73] reshape2_1.4.5         spatstat.data_3.1-9    gtable_0.3.6          
##  [76] tidyr_1.3.2            data.table_1.18.2.1    XVector_0.51.0        
##  [79] spatstat.geom_3.7-0    RcppAnnoy_0.0.23       ggrepel_0.9.6         
##  [82] RANN_2.6.2             BiocVersion_3.23.1     pillar_1.11.1         
##  [85] stringr_1.6.0          spam_2.11-3            RcppHNSW_0.6.0        
##  [88] later_1.4.5            splines_4.6.0          dplyr_1.1.4           
##  [91] lattice_0.22-7         deldir_2.0-4           survival_3.8-6        
##  [94] bit_4.6.0              tidyselect_1.2.1       Biostrings_2.79.4     
##  [97] miniUI_0.1.2           pbapply_1.7-4          knitr_1.51            
## [100] gridExtra_2.3          scattermore_1.2        xfun_0.56             
## [103] stringi_1.8.7          lazyeval_0.2.2         yaml_2.3.12           
## [106] evaluate_1.0.5         codetools_0.2-20       tibble_3.3.1          
## [109] BiocManager_1.30.27    cli_3.6.5              uwot_0.2.4            
## [112] xtable_1.8-4           reticulate_1.44.1      jquerylib_0.1.4       
## [115] dichromat_2.0-0.1      Rcpp_1.1.1             spatstat.random_3.4-4 
## [118] globals_0.19.0         png_0.1-8              spatstat.univar_3.1-6 
## [121] parallel_4.6.0         blob_1.3.0             dotCall64_1.2         
## [124] listenv_0.10.0         viridisLite_0.4.2      scales_1.4.0          
## [127] ggridges_0.5.7         purrr_1.2.1            crayon_1.5.3          
## [130] rlang_1.1.7            cowplot_1.2.0          KEGGREST_1.51.1

EMTscoreData Package

haimei wen

2026-02-03

1. Introduction of EMTscoreData

2. Installation

Install the development version from GitHub

Install from Bioconductor (recommended once available):