importDemux {dittoSeq} R Documentation

Extracts Demuxlet information into a pre-made SingleCellExperiment or Seurat object

Description

Extracts Demuxlet information into a pre-made SingleCellExperiment or Seurat object

Usage

importDemux(
  object,
  raw.cell.names = NULL,
  lane.meta = NULL,
  lane.names = NA,
  demuxlet.best,
  trim.before_ = TRUE,
  bypass.check = FALSE,
  verbose = TRUE
)

Arguments

object

A pre-made Seurat(v3+) or SingleCellExperiment object to add demuxlet information to.

raw.cell.names

A string vector consisting of the raw cell barcodes of the object as they would have been output by cellranger aggr. Format per cell.name = NNN...NNN-# where NNN...NNN are the cell barcode nucleotides, and # is the lane number. This input should be used when additional information has been added directly into the cell names outside of Seurat's standard merge prefix: "user-text_".
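As a rough illustration of the expected format, the sketch below checks a few made-up cell names against the "NNN...NNN-#" pattern using base R. The names and the regular expression are assumptions chosen for illustration; this is not the check that importDemux itself performs.

```r
# Hypothetical cell names: only the first is in the plain "NNN...NNN-#" format
candidate.names <- c(
    "AAACCTGAGAAACCAT-1",          # nucleotides, then "-" and a lane number
    "sampleA_AAACCTGAGAAACCAT-1",  # carries a merge prefix
    "AAACCTGAGAAACCAT")            # lacks the "-#" lane suffix

# TRUE where a name is only nucleotides followed by "-" and a lane number
is.raw.format <- grepl("^[ACGTN]+-[0-9]+$", candidate.names)
is.raw.format
```

Names that fail such a check are the kind of case where supplying raw.cell.names (or relying on trim.before_) may be needed.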

lane.meta

A string which names a metadata slot that contains which cells came from which droplet-generation wells.

lane.names

String vector which sets how the lanes should be named (if you want to give them something different from the default = Lane1, Lane2, Lane3...)

demuxlet.best

String or string vector pointing to the location(s) of the .best output file(s) from running demuxlet.

Alternatively, a data.frame representing an already imported .best matrix.

trim.before_

Logical which sets whether any characters in front of an "_" should be deleted from the raw.cell.names before matching with demuxlet barcodes.
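The effect of this trimming can be sketched in base R as below. The cell names are made up, and the sketch assumes that everything up to and including the last "_" is removed; the internal implementation of importDemux may differ.

```r
# Hypothetical cell names, two of which carry a "user-text_" merge prefix
cell.names <- c(
    "sampleA_AAACCTGAGAAACCAT-1",
    "sampleB_TTTGTCATCTGCGGCA-2",
    "GGGCACTAGTGTTGAA-1")

# Drop everything up to and including the last "_";
# names without an "_" are left unchanged
trimmed <- sub("^.*_", "", cell.names)
trimmed
```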

bypass.check

Logical which sets whether the function should run even when meta.data slots would be over-written.

verbose

Logical which sets whether to print messages about the stage of the process that is currently being run, as well as the summary at the end.

Details

The function takes in a previously generated Seurat or SingleCellExperiment object. It also takes in demuxlet information either in the form of

1: the location of a single demuxlet.best output file,

2: the locations of multiple demuxlet.best output files,

or 3: a user-constructed data.frame created by reading in a demuxlet.best file.
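For option 3, one way to build such a data.frame is sketched below with base R. The file written here is a tiny stand-in with made-up values and only three columns (BARCODE, N.SNP, BEST); real .best files contain more columns, and the read.table settings shown are an assumption about a tab-delimited file with a header row.

```r
# Write a tiny stand-in for a demuxlet .best file (tab-delimited, made-up values)
best.file <- tempfile(fileext = ".best")
writeLines(c(
    "BARCODE\tN.SNP\tBEST",
    "AAACCTGAGAAACCAT-1\t412\tSNG-donorA",
    "TTTGTCATCTGCGGCA-1\t388\tDBL-donorA-donorB-0.500"),
    best.file)

# Read it into a data.frame, keeping text columns as character vectors
demux.df <- read.table(
    best.file, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
str(demux.df)
```

A data.frame read in this way is the kind of input that can be given to demuxlet.best in place of a file location.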

If a metadata slot name is provided to lane.meta, information in that metadata slot is copied into a metadata slot called "Lane". Alternatively, if lane.meta is left as NULL, separate lanes are assumed to be marked by distinct values of "-#", as is the typical output of the 10X cellranger count and aggr pipeline. In either situation, the lane.names input can be used to set specific names for each lane. "Lane1", "Lane2", "Lane3", etc., are used by default.
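How lane labels might be derived from the "-#" suffix can be sketched in base R as follows. The barcodes and the custom names ("wellA", etc.) are hypothetical, and the sketch only mirrors the behavior described above rather than reproducing the function's internals.

```r
# Hypothetical cell names ending in the "-#" lane suffix
cell.names <- c(
    "AAACCTGAGAAACCAT-1", "CCCTCCTAGGACAGAA-2",
    "TTTGTCATCTGCGGCA-3", "GGGCACTAGTGTTGAA-2")

# Lane number: the digits after the final "-"
lane.number <- as.integer(sub("^.*-", "", cell.names))

# Default-style labels: "Lane1", "Lane2", ...
default.lanes <- paste0("Lane", lane.number)

# Custom labels, indexed by lane number (hypothetical names)
lane.names <- c("wellA", "wellB", "wellC")
custom.lanes <- lane.names[lane.number]

data.frame(cell.names, default.lanes, custom.lanes)
```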

By default, colnames(object) are used as the cell barcodes for matching. If these have been modified from what would have been given to demuxlet, beyond a "-#" at the end or a "***_" prefix as can be added by common merge functions, you can provide raw.cell.names instead.

Barcodes in the demuxlet data are matched to barcodes in the object and then singlet/doublet/ambiguous calls and identities are parsed and carried into metadata. (When demuxlet information is provided as a set of separate files (recommended for use with cellranger aggr), the "-#" at the ends of barcodes in these files are incremented on read-in so that they can match the incrementation applied by cellranger aggr. See note on multi-well 10X data below for more.)
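The incrementation described above can be sketched in base R as follows. The barcodes are made up, and the sketch assumes each per-lane demuxlet run labels its own barcodes "-1" and that files are supplied in lane order; it illustrates the idea and is not the package's own code.

```r
# Hypothetical BARCODE columns from two per-lane demuxlet runs
best.barcodes <- list(
    c("AAACCTGAGAAACCAT-1", "TTTGTCATCTGCGGCA-1"),  # first file
    c("AAACCTGAGAAACCAT-1", "GGGCACTAGTGTTGAA-1"))  # second file

# Replace the trailing "-#" with the position of the file in the input order
adjusted <- unlist(lapply(seq_along(best.barcodes), function(i) {
    sub("-[0-9]+$", paste0("-", i), best.barcodes[[i]])
}))
adjusted

# 0 means no barcode is repeated after the adjustment
anyDuplicated(adjusted)
```

Note that the barcode shared by both files becomes distinguishable only after the suffix of the second file is changed to "-2".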

Finally, a summary of the results including mean number of SNPs and percentages of singlets and doublets is output unless verbose is set to FALSE.
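A summary of this kind could be computed from a .best-style data.frame roughly as sketched below. The values are made up, and the sketch assumes that the BEST column begins with "SNG", "DBL", or "AMB" for singlet, doublet, and ambiguous calls; the exact figures and wording that importDemux prints may differ.

```r
# Hypothetical, reduced .best-style table with made-up values
demux.df <- data.frame(
    BARCODE = c("AAACCTGAGAAACCAT-1", "CCCTCCTAGGACAGAA-1",
        "TTTGTCATCTGCGGCA-1", "GGGCACTAGTGTTGAA-1"),
    N.SNP = c(400, 380, 150, 420),
    BEST = c("SNG-donorA", "SNG-donorB",
        "AMB-donorA-donorB-donorA/donorB", "DBL-donorA-donorB-0.500"),
    stringsAsFactors = FALSE)

# Call type is taken from the first three characters of BEST (assumed format)
call <- substr(demux.df$BEST, 1, 3)

mean.snps <- mean(demux.df$N.SNP)
pct.singlet <- 100 * mean(call == "SNG")
pct.doublet <- 100 * mean(call == "DBL")

cat("Mean SNPs per cell:", mean.snps, "\n")
cat("Singlets:", pct.singlet, "%; doublets:", pct.doublet, "%\n")
```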

Lane information and demuxlet calls and statistics are imported into the object as metadata, including the "Lane" and "Sample" slots.

Note: "-#" information added by cellranger functions is not removed. Removing it would make cells from separate 10X wells that happen to share the same barcode indistinguishable. In demuxlet itself, ignoring lane information leads to artificial doublet calls. In importDemux, ignoring lane information can lead to the import of improper demuxlet annotations. For this reason, importDemux checks whether such artificial duplicates have likely occurred. See the recommended cellranger/demuxlet pipeline below for specific suggestions on how to use this function with multi-well 10X data.
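Why the "-#" suffix matters can be seen in a small base R sketch. The barcodes are made up, and the duplicate check shown is only an illustration of the problem, not the check that importDemux runs.

```r
# Hypothetical cell names: the same nucleotide barcode occurs in lanes 1 and 2
object.cells <- c(
    "AAACCTGAGAAACCAT-1", "AAACCTGAGAAACCAT-2", "TTTGTCATCTGCGGCA-2")

# Stripping the lane suffix makes the first two cells indistinguishable
without.lane <- sub("-[0-9]+$", "", object.cells)

# Flag every cell whose lane-free barcode is shared with another cell
is.dup <- without.lane %in% without.lane[duplicated(without.lane)]
data.frame(object.cells, without.lane, is.dup)
```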

Value

The Seurat or SingleCellExperiment object with metadata added for "Sample" calls and other relevant statistics.

For multi-well 10X data

10X recommends running cellranger count individually for each well/lane. This creates a separate genes x cells counts matrix for each lane. Demuxlet should also be run separately for each lane in order to minimize the informatic generation of artificial doublets. Afterwards, there are many common methods of importing/merging such multi-well 10X data into a single object in R.

Technical differences: All of these methods alter the cell barcode names in a way that makes them unique across lanes, but how they do so can differ.

Technical issue: None of these methods adjusts the barcode names that are embedded within the BAM files which a user must supply to Demuxlet, so the demuxlet data needs to be modified appropriately in order to make the object cellnames and demuxlet BARCODEs match.

importDemux is built for working with the cellranger aggr barcodes output, but can be used for demuxlet datasets processed differently as well.

The fix: importDemux ignores all information before a "_" in cellnames when trim.before_ is left as TRUE, but utilizes the "-#" information at the ends of Seurat cellnames.

Run in these ways, demuxlet information can be matched to proper cells, and lane assignments can be properly reported in the "Lane" metadata slot.

Author(s)

Daniel Bunis

See Also

Included QC visualizations:

demux.calls.summary for plotting the number of sample annotations assigned within each lane.

demux.SNP.summary for plotting the number of SNPs measured per cell.

Or, see Kang et al. Nature Biotechnology, 2018 https://www.nature.com/articles/nbt.4042 for more information about the demuxlet cell-sample deconvolution method.

Examples


library(dittoSeq)

# Prep: loading in an example dataset and sample demuxlet data
example("importDittoBulk", echo = FALSE)
demux <- demuxlet.example
colnames(myRNA) <- demux$BARCODE[seq_len(ncol(myRNA))]

###
### Method 1: Lane info stored in a metadata slot
###

# Notice there is a "groups" metadata slot in this object.
getMetas(myRNA)
# We will treat it as if it holds lane information

# Now, running importDemux:
myRNA <- importDemux(
    myRNA,
    lane.meta = "groups",
    demuxlet.best = demux)

# Note, importDemux can also take in the location of the .best file.
#   myRNA <- importDemux(
#       object = myRNA,
#       lane.meta = "groups",
#       demuxlet.best = "Location/filename.best")

# demux.SNP.summary() and demux.calls.summary() can now be used.
demux.SNP.summary(myRNA)
demux.calls.summary(myRNA)

###
### Method 2: cellranger aggr combined data (denoted with "-#" in barcodes)
###

# If cellranger aggr was used, lanes will be denoted by "-1", "-2", ... "-#"
#   at the ends of Seurat cellnames.
# Demuxlet should be run on each lane individually.
# When given the locations of each demuxlet.best output file, *in the same
#   order that lanes were provided to cellranger aggr*, this function will
#   adjust the "-#" within the .best BARCODEs automatically before matching
#
# myRNA <- importDemux(
#     object = myRNA,
#     demuxlet.best = c(
#         "Location/filename1.best",
#         "Location/filename2.best"),
#     lane.names = c("g1","g2"))


[Package dittoSeq version 1.0.2 Index]