importDemux {dittoSeq} | R Documentation |
Extracts Demuxlet information into a pre-made SingleCellExperiment or Seurat object
importDemux( object, raw.cell.names = NULL, lane.meta = NULL, lane.names = NA, demuxlet.best, trim.before_ = TRUE, bypass.check = FALSE, verbose = TRUE )
object |
A pre-made Seurat(v3+) or SingleCellExperiment object to add demuxlet information to. |
raw.cell.names |
A string vector consisting of the raw cell barcodes of the object as they would have been output by cellranger aggr. Format per cell.name = NNN...NNN-# where NNN...NNN are the cell barcode nucleotides, and # is the lane number. This input should be used when additional information has been added directly into the cell names outside of Seurat's standard merge prefix: "user-text_". |
lane.meta |
A string which names a metadata slot that contains which cells came from which droplet-generation wells. |
lane.names |
String vector which sets how the lanes should be named (if you want to give them something different from the default = Lane1, Lane2, Lane3...) |
demuxlet.best |
String or String vector pointing to the location(s) of the .best output file(s) produced by running demuxlet. Alternatively, a data.frame representing an already imported .best matrix. |
trim.before_ |
Logical which sets whether any characters in front of an "_" should be deleted from the cell names before matching with demuxlet barcodes. |
bypass.check |
Logical which sets whether the function should run even when meta.data slots would be over-written. |
verbose |
Logical which sets whether to print messages about the stage of the process that is currently being run, as well as the summary at the end. |
The function takes in a previously generated Seurat or SingleCellExperiment object. It also takes in demuxlet information either in the form of
1: the location of a single demuxlet.best output file,
2: the locations of multiple demuxlet.best output files,
or 3: a user-constructed data.frame created by reading in a demuxlet.best file.
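As a minimal sketch of option 3, a .best file can be read into a data.frame with base R before being handed to the demuxlet.best input. The toy file written here is only a stand-in: it holds a few of the columns named in this documentation, with invented values, and the tab-delimited layout is an assumption about the file format. Real demuxlet output contains more columns.

```r
# Write a toy, tab-delimited stand-in for a demuxlet .best file (hypothetical values).
best_file <- tempfile(fileext = ".best")
toy <- data.frame(
    BARCODE = c("AAACCTGAGAAACCAT-1", "AAACCTGAGAAACGAG-1"),
    RD.TOTL = c(1200L, 950L),
    RD.PASS = c(800L, 610L),
    RD.UNIQ = c(400L, 305L),
    N.SNP = c(350L, 270L),
    BEST = c("SNG-sampleA", "DBL-sampleA-sampleB-0.500"),
    PRB.DBL = c(0.01, 0.98),
    stringsAsFactors = FALSE)
write.table(toy, best_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back as a data.frame; check.names = FALSE keeps column names unchanged.
demux_df <- read.table(
    best_file, header = TRUE, sep = "\t",
    stringsAsFactors = FALSE, check.names = FALSE)

str(demux_df)
# The resulting data.frame could then be supplied to importDemux(), e.g.
#   object <- importDemux(object, demuxlet.best = demux_df)
```

Reading the file yourself can be useful when the table needs to be inspected or filtered before import.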
If a metadata slot name is provided to lane.meta, information in that metadata slot is copied into a metadata slot called "Lane".
Alternatively, if lane.meta is left as NULL, separate lanes are assumed to be marked by distinct values of "-#", as is the typical output of the 10X cellranger count & aggr pipeline.
In these situations, the lane.names input can be used to set specific names for each lane. "Lane1", "Lane2", "Lane3", etc., are used by default.
The colnames(object) are used as the cell barcodes by default, but if these have been modified from what would have been given to demuxlet, outside of "-#" at the end or "***_" as can be added in common merge functions, you can alternatively provide raw.cell.names.
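The barcode handling described above can be illustrated with a short base R sketch. This is not importDemux's internal code; the cell names, the regular expressions, and the custom lane names are all hypothetical choices made for illustration.

```r
# Hypothetical cell names: an optional merge prefix plus a cellranger-style "-#" suffix.
cell_names <- c(
    "runA_AAACCTGAGAAACCAT-1",
    "runA_AAACCTGAGAAACGAG-2",
    "AAACCTGAGAAACTGT-3")

# Drop everything up to and including an "_" (the idea behind trim.before_ = TRUE).
barcodes <- sub("^.*_", "", cell_names)

# Read the lane number from the trailing "-#".
lane_number <- as.integer(sub("^.*-", "", barcodes))

# Default-style lane labels, or custom labels indexed by lane number.
default_lanes <- paste0("Lane", lane_number)
custom_names <- c("wellA", "wellB", "wellC")  # hypothetical lane.names input
custom_lanes <- custom_names[lane_number]

data.frame(cell_names, barcodes, default_lanes, custom_lanes)
```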
Barcodes in the demuxlet data are matched to barcodes in the object
and then singlet/doublet/ambiguous calls and identities are parsed and carried into metadata.
(When demuxlet information is provided as a set of separate files (recommended for use with cellranger aggr),
the "-#" at the ends of barcodes in these files are incremented on read-in so that they can match the incrementation applied by cellranger aggr.
See note on multi-well 10X data below for more.)
Finally, a summary of the results including mean number of SNPs and percentages of singlets and doublets is output unless verbose is set to FALSE.
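The kind of summary mentioned above can be approximated by hand from the imported metadata. The sketch below uses a toy data.frame with invented values in place of an object's metadata; the exact wording and contents of the summary printed by importDemux may differ.

```r
# Toy per-cell values standing in for imported metadata (hypothetical numbers).
meta <- data.frame(
    demux.doublet.call = c("SNG", "SNG", "DBL", "AMB", "SNG"),
    demux.N.SNP = c(350, 270, 410, 90, 300),
    stringsAsFactors = FALSE)

# Mean number of SNPs per cell, and percentages of singlet and doublet calls.
mean_snps <- mean(meta$demux.N.SNP)
pct_singlets <- 100 * mean(meta$demux.doublet.call == "SNG")
pct_doublets <- 100 * mean(meta$demux.doublet.call == "DBL")

message("Mean SNPs per cell: ", round(mean_snps, 1))
message("Singlets: ", round(pct_singlets, 1), "%")
message("Doublets: ", round(pct_doublets, 1), "%")
```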
Lane information and demuxlet calls and statistics are imported into the object as these metadata:
Lane = guided by the lane.meta input or "-#"s in barcodes, represents the separate droplet-generation lanes.
Sample = The sample call, parsed from the BEST column
demux.doublet.call = whether the cell was called a singlet (SNG), doublet (DBL), or ambiguous (AMB), parsed from the BEST column
demux.RD.TOTL = RD.TOTL column
demux.RD.PASS = RD.PASS column
demux.RD.UNIQ = RD.UNIQ column
demux.N.SNP = N.SNP column
demux.PRB.DBL = PRB.DBL column
demux.barcode.dup = (Only generated when TRUEs will exist) whether a cell's barcode in the demuxlet.best referred to more than 1 cell in the object.
(When TRUE, indicates that cells from distinct lanes were interpreted together by demuxlet.
These will often be mistakenly called as doublets.)
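To make "parsed from the BEST column" concrete, here is a rough sketch of splitting BEST entries into a call type and a singlet sample name. The layout of the BEST strings shown ("SNG-<sample>", "DBL-...", "AMB-...") is an assumption, the values are invented, and this is not necessarily how importDemux itself parses them. How doublet and ambiguous entries should be labelled is deliberately left open.

```r
# Toy BEST values; the string layout is assumed, and the sample names are hypothetical.
best <- c(
    "SNG-sampleA",
    "DBL-sampleA-sampleB-0.500",
    "AMB-sampleA-sampleB-sampleA/sampleB")

# The first three characters give the call type.
doublet_call <- substr(best, 1, 3)

# For singlets, the remainder after "SNG-" is taken as the sample name;
# other call types are left as NA in this sketch.
sample_call <- ifelse(
    doublet_call == "SNG",
    sub("^SNG-", "", best),
    NA_character_)

data.frame(best, doublet_call, sample_call)
```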
Note: "-#" information added by cellranger functions is not removed.
Doing so would cause cells, from separate 10X wells, which ended up with similar barcodes to become indistinguishable.
In demuxlet itself, ignorance of lane information leads to artificial doublet calls.
In importDemux, ignorance of lane information can lead to import of improper demuxlet annotations.
For this reason, importDemux checks whether such artificial duplicates likely happened.
See the recommended cellranger/demuxlet pipeline below for specific suggestions for how to use this function with multi-well 10X data.
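The note above can be seen in a small base R sketch: once the "-#" suffix is stripped, cells from different wells that share a nucleotide barcode can no longer be told apart. The barcodes are hypothetical, and the duplicate check shown is only an illustration, not importDemux's own check.

```r
# Hypothetical barcodes: the same nucleotide sequence occurs in lanes 1 and 2.
barcodes <- c("AAACCTGAGAAACCAT-1", "AAACCTGAGAAACCAT-2", "AAACCTGAGAAACGAG-1")

# With the lane suffix kept, every barcode is unique.
unique_with_suffix <- anyDuplicated(barcodes) == 0

# With the suffix removed, two cells collapse onto one barcode.
stripped <- sub("-[0-9]+$", "", barcodes)
is_dup <- duplicated(stripped) | duplicated(stripped, fromLast = TRUE)

data.frame(barcodes, stripped, is_dup)
```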
The Seurat or SingleCellExperiment object with metadata added for "Sample" calls and other relevant statistics.
10X recommends running cellranger count individually for each well/lane.
This leads to creation of separate genes x cells counts matrices for each lane.
Demuxlet should also be run separately for each lane in order to minimize the informatic generation of artificial doublets.
Afterwards, there are many common methods of importing/merging such multi-well 10X data into a single object in R.
Technical differences: All options will alter the cell barcode names in a way that makes them unique across lanes, but how they do so can differ.
Technical issue: Neither method adjusts the barcode names that are embedded within the BAM files which a user must supply to Demuxlet, so that data needs to be modified appropriately in order to make the object cellnames and demuxlet BARCODEs match.
importDemux is built for working with the cellranger aggr barcodes output, but can be used for demuxlet datasets processed differently as well.
Option 1: Merging matrices of all lanes with cellranger aggr before R import. Barcode uniquification method: A "-1", "-2", "-3", ... "-#" is appended to the end of all barcode names. The number is incremented for each successive lane. (Note: lane-numbers depend on the order in which they were supplied to cellranger aggr.)
Option 2: Importing into Seurat or SingleCellExperiment, then merging these objects.
Barcode uniquification method: user-defined strings are prepended to the barcodes, followed by an "_", for Seurat merge, and importDemux will ignore these.
Alternatively, consistent barcodes can be supplied separately to the raw.cell.names input.
The fix:
importDemux ignores all information before a "_" in cellnames when trim.before_ is left as TRUE, but utilizes the "-#" information at the ends of Seurat cellnames.
Option 1: importDemux can adjust the "-#" in the Demuxlet BARCODEs automatically for users before performing the matching step.
In order to take advantage of the automatic barcodes adjustment, just supply a vector containing the locations of the separate .best outputs for each lane, in the same order that lanes were combined in cellranger aggr.
Option 2: To use this method, it's easiest to run importDemux on each lane's Seurat or SingleCellExperiment object separately & provide a unique name for each lane to the lane.names input, BEFORE merging into a single Seurat object.
Run in these ways, demuxlet information can be matched to proper cells, and lane assignments can be properly reported in the "Lane" metadata slot.
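The automatic "-#" adjustment described for Option 1 can be sketched in base R as follows. This illustrates the idea only, under the assumption that each per-lane demuxlet run produced barcodes ending in "-1"; it is not importDemux's actual implementation, and the tables below are toy data with only a BARCODE column.

```r
# Toy per-lane demuxlet tables; each lane was run alone, so all barcodes end in "-1".
lane_tables <- list(
    data.frame(
        BARCODE = c("AAACCTGAGAAACCAT-1", "AAACCTGAGAAACGAG-1"),
        stringsAsFactors = FALSE),
    data.frame(
        BARCODE = c("AAACCTGAGAAACCAT-1", "AAACCTGAGAAACTGT-1"),
        stringsAsFactors = FALSE))

# Replace the trailing "-#" with the position of each table in the list,
# mirroring the order in which lanes were given to cellranger aggr.
adjusted <- lapply(seq_along(lane_tables), function(i) {
    tab <- lane_tables[[i]]
    tab$BARCODE <- sub("-[0-9]+$", paste0("-", i), tab$BARCODE)
    tab
})

# Stack the adjusted tables; barcodes are now unique across lanes.
combined <- do.call(rbind, adjusted)
combined$BARCODE
```

The order of the list matters here in the same way the order of the .best file locations matters for importDemux: a table in the wrong position would receive the wrong lane suffix.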
Daniel Bunis
Included QC visualizations:
demux.calls.summary for plotting the number of sample annotations assigned within each lane.
demux.SNP.summary for plotting the number of SNPs measured per cell.
Or, see Kang et al. Nature Biotechnology, 2018 https://www.nature.com/articles/nbt.4042 for more information about the demuxlet cell-sample deconvolution method.
#Prep: loading in an example dataset and sample demuxlet data
example("importDittoBulk", echo = FALSE)
demux <- demuxlet.example
colnames(myRNA) <- demux$BARCODE[seq_len(ncol(myRNA))]

###
### Method 1: Lanes info stored in a metadata
###

# Notice there is a groups metadata in this Seurat object.
getMetas(myRNA)
# We will treat these as if that holds Lane information

# Now, running importDemux:
myRNA <- importDemux(
    myRNA,
    lane.meta = "groups",
    demuxlet.best = demux)

# Note, importDemux can also take in the location of the .best file.
#   myRNA <- importDemux(
#       object = myRNA,
#       lane.meta = "groups",
#       demuxlet.best = "Location/filename.best")

# demux.SNP.summary() and demux.calls.summary() can now be used.
demux.SNP.summary(myRNA)
demux.calls.summary(myRNA)

###
### Method 2: cellranger aggr combined data (denoted with "-#" in barcodes)
###

# If cellranger aggr was used, lanes will be denoted by "-1", "-2", ... "-#"
#   at the ends of Seurat cellnames.
# Demuxlet should be run on each lane individually.
# Provided locations of each demuxlet.best output file, *in the same order
#   that lanes were provided to cellranger aggr* this function will then
#   adjust the "-#" within the .best BARCODEs automatically before matching
#
# myRNA <- importDemux(
#     object = myRNA,
#     demuxlet.best = c(
#         "Location/filename1.best",
#         "Location/filename2.best"),
#     lane.names = c("g1","g2"))