brainflowprobes 1.0.0
R
is an open-source statistical environment which can be easily modified to enhance its functionality via packages. brainflowprobes is an R
package available via the Bioconductor repository for packages. R
can be installed on any operating system from CRAN after which you can install brainflowprobes by using the following commands in your R
session:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("brainflowprobes")
## Check that you have a valid Bioconductor installation
BiocManager::valid()
## If you want to force the installation of the development version, you can
## do so by running. However, we suggest that you wait for Bioconductor to
## run checks and build the latest release.
BiocManager::install("LieberInstitute/brainflowprobes")
brainflowprobes is based on many other packages, particularly GenomicRanges, derfinder, and derfinderPlot. A brainflowprobes user is not expected to deal with those packages directly, but may find their manuals useful.
If you are asking yourself the question “How do I start using Bioconductor?” you might be interested in this blog post.
The blog post quoted above mentions some options, but we would like to highlight the Bioconductor support site as the main resource for getting help: remember to use the brainflowprobes
tag and check the older posts. Other alternatives are available such as creating GitHub issues and tweeting. However, please note that if you want to receive help you should adhere to the posting guidelines. It is particularly critical that you provide a small reproducible example and your session information so package developers can track down the source of the error.
We hope that brainflowprobes will be useful for your research. Please use the following information to cite the package and the overall approach. Thank you!
## Citation info
citation('brainflowprobes')
##
## Price AJ (2019). _Plots and annotation for choosing BrainFlow target
## probe sequence_. doi: 10.18129/B9.bioc.brainflowprobes (URL:
## https://doi.org/10.18129/B9.bioc.brainflowprobes),
## https://github.com/LieberInstitute/brainflowprobes - R package
## version 1.0.0, <URL:
## http://www.bioconductor.org/packages/brainflowprobes>.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {Plots and annotation for choosing BrainFlow target probe sequence},
## author = {Amanda J Price},
## year = {2019},
## url = {http://www.bioconductor.org/packages/brainflowprobes},
## note = {https://github.com/LieberInstitute/brainflowprobes - R package version 1.0.0},
## doi = {10.18129/B9.bioc.brainflowprobes},
## }
brainflowprobes is an R package that contains four functions to aid in designing probes to target RNA sequences in nuclei isolated from human postmortem brain using flow cytometry. brainflowprobes was made to support the method described in the BrainFlow publication, which is based on the Invitrogen PrimeFlow® RNA kit.
This package is currently only compatible with hg19 sequences. Also, because of the type of data used, the plotting functions plot_coverage()
and four_panels()
do not work on Windows machines. To visualize the data, please use R
installed on either Mac or Linux operating systems. The region_info()
function can still be used on Windows machines for creating an annotated .csv
file to get the information required for custom probe synthesis.
Which genes you choose to target will ultimately depend on the purpose of your experiment. BrainFlow can be used to isolate specific cell populations for downstream sequencing, for instance, or to assess the coexpression of up to four transcripts at a time at single-nucleus resolution. For the highest probability of success, however, several parameters should be considered when choosing a sequence to target, no matter the purpose. Target sequences are more likely to be successful if they:
More details about probe design and each of these considerations can be found in the BrainFlow manuscript. One strategy for choosing a sequence could be to choose the 3’UTR of a transcript of interest. Another strategy (and how many of the probes already validated in the BrainFlow manuscript were designed) is to identify expressed regions using the derfinder package. However you choose which sequences to test, the brainflowprobes package will help narrow down the best sequences for which to synthesize target probes.
Let’s say you want to design a probe to target deep layer pyramidal neurons in the prefrontal cortex. You choose TBR1 because of its role in neuronal identity specification in these cells, and you want to see if the last exon of this gene, a ~2.5 Kb sequence, would make for a good probe target. You find using the UCSC Genome browser (for instance) that the hg19 coordinates for this exon are chr2:162279880-162282378:+
, where chr2
is the chromosome, 162279880-162282378
is the start and end of the exon, and +
means that it is on the plus strand.
Before any of the brainflowprobes functions can be used, the package must be loaded and attached:
## Load brainflowprobes R package
library('brainflowprobes')
The next step is to use the region_info()
function to annotate the sequence in a .csv
file that can be used to custom synthesize a target probe:
region_info("chr2:162279880-162282378:+", CSV = FALSE, SEQ = TRUE, OUTDIR = ".")
## snapshotDate(): 2019-10-29
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## Completed! If CSV=TRUE, check for region_info.csv in the temporary
## directory (i.e. tempdir()) unless otherwise specified in OUTDIR.
## seqnames start end width strand name annotation
## 1 chr2 162279880 162282378 2499 + TBR1 NM_006593 NP_006584
## description region distance subregion insideDistance exonnumber nexons
## 1 inside exon inside 7072 inside exon 0 3 3
## UTR geneL codingL Geneid
## 1 overlaps 3'UTR 9573 7816 ENSG00000136535
## Sequence
## 1 GATCTACACCGGCTGTGACATGGACCGCCTGACCCCCTCGCCCAACGACTCGCCGCGCTCGCAGATCGTGCCCGGGGCCCGCTACGCCATGGCCGGCTCTTTCCTGCAGGACCAGTTCGTGAGCAACTACGCCAAGGCCCGCTTCCACCCGGGCGCGGGCGCGGGCCCCGGGCCGGGTACGGACCGCAGCGTGCCGCACACCAACGGGCTGCTGTCGCCGCAGCAGGCCGAGGACCCGGGCGCGCCCTCGCCGCAACGCTGGTTTGTGACGCCGGCCAACAACCGGCTGGACTTCGCGGCCTCGGCCTATGACACGGCCACGGACTTCGCGGGCAACGCGGCCACGCTGCTCTCTTACGCGGCGGCGGGCGTGAAGGCGCTGCCGCTGCAGGCTGCAGGCTGCACTGGCCGCCCGCTCGGCTACTACGCCGACCCGTCGGGCTGGGGCGCCCGCAGTCCCCCGCAGTACTGCGGCACCAAGTCGGGCTCGGTGCTGCCCTGCTGGCCCAACAGCGCCGCGGCCGCCGCGCGCATGGCCGGCGCCAATCCCTACCTGGGCGAGGAGGCCGAGGGCCTGGCCGCCGAGCGCTCGCCGCTGCCGCCCGGCGCCGCCGAGGACGCCAAGCCCAAGGACCTGTCCGATTCCAGCTGGATCGAGACGCCCTCCTCGATCAAGTCCATCGACTCCAGCGACTCGGGGATTTACGAGCAGGCCAAGCGGAGGCGGATCTCGCCGGCCGACACGCCCGTGTCCGAGAGTTCGTCCCCGCTCAAGAGCGAGGTGCTGGCCCAGCGGGACTGCGAGAAGAACTGCGCCAAGGACATTAGCGGCTACTATGGCTTCTACTCGCACAGCTAGGCCGCCCCTGCCCGCCCGGCCCCGCCGCGGCCCGGACCCCCAGCCAGCCCCTCACAGCTCTTCCCCAGCTCCGCCTCCCCACACTCCTCCTTGCGCACCCACTCATTTTATTTGACCCTCGATGGCCGTCTGCAGCGAATAAGTGCAGGTCTCCGAGCGTGATTTTAACCTTTTTTGCACAGCAGTCTCTGCAATTAGCTCACCGACCTTCAACTTTGCTGTAAACCTTTTGGTTTTCCTACTTACTCTTCTTCTGTGGAGTTATCCTCCTACAATTCCCCTCCCCCTCGTCTTTCTCTTACCTCCTACTTCTCTTTCTTGTAATGAAACTCTTCACCTTTAGGAGACCTGGGCAGTCCTGTCAGGCAGCAGCGATTCCGACCCGCCAAGTCTCGGCCTCCACATTAACCATAGGATGTTGACTCTAGAACCTGGACCCACCCAGCGCGTCCTTTCTTATCCCCGAGTGGATGGATGGATGGATGGATGGTAGGGATGTTAATAATTTTAGTGGAACAAAGCCTGTGAAATGATTGTACATAGTGTTAATTTATTGTAACGAATGGCTAGTTTTTATTCTCGTCAAGGCACAAAACCAGTTCATGCTTAACCTTTTTTTCCTTTCCTTTCTTTGCTTTTCTTTCTCTCCTCTCATACTTTCTCTTCTCTCTCTTTTAATTTTCTTGTGAGATAATATTCTAAGAGGCTCTAGAAACATGAAATACTCAGTAGTGATGGGTTTCCCACTTCTCCTCAATCCGTTGCATGAAATAATTACTATGTGCCCTAATGCACACAAATAGCTAAGGAGAATCCACCCAAACACCTTTAAAGGATAGGTGTCTGTTCATAGGCAAGTCGATTAAGTGGCATGATGCCTGCAAAGCAAAGTCAACTGGAGTTGTATGTTCCCCCCACCTTCTAAATAGAATAGCTCGACATCAGCAATATTATTTTGCCTTATTTGTTTTTCCCCAAAGTGCCAAATCCATTACTGGTCTGTGCAGGTGCCAAATATGCTGACAAACTGTTTCTGAATATCTTTCAGTACCCCTTCACCTTTATATGCTGTAAATCTTTGTAATGAATACTCTATTAATGATATAGATGACTGAATTGTTGGTAACTATAGTGTAGTCTAGTGAAGATGAATTGTGTGAGTTGTATATTTTACTGCATTTTAGTTTTGAAAATGACTTCCCCACCACCTAGAAACAGCTGAAATTTGACTTCCTTGGGAGAACACTAGCATTAATGCAAGTAAGACTGATTTTCCCCTAAGTCTTGTTATATTTGATAAGGAGCATTAATCCCCCTGGAAATAGATTAGTAGGATTTCTAATGTTGTGTAGCAAACCTATACTTTTTTGTATTTAAAAATTAATGTGAAATATGCATCATACACAATATTCAATCTAGATTCCAGTCCATGGGGGGATTTTTCCTAATAGGAATTCAGGGTCTAAACGTGTGTATATTTTGGCTCTTCTGTAAATCTAATGTTGTGATTTTTATATTTGTTTCGTTTTGTCTGTGAACTGAATAATTTATACAAGAACACACTCCATTGAGAAACGTTTTGTTTTTTGCTCGTTTGTATCGTCTGTGTATAACAAGTAAAATAAACCTGGTAAAAACGC
This function will annotate the specified genomic regions (sequences) by calling the matchGenes()
function from the bumphunter package. The output is a table where the first six columns include the coordinates (chromosome, start, end, and strand), region width, and the name of the nearest gene. The proceeding columns list further information about the nearest gene and where the region is in relation to that gene. These columns are described in the documentation for matchGenes()
, reprinted below:
Column | Description |
---|---|
annotation | RefSeq ID |
description | a factor with levels c(“upstream”, “promoter”, “overlaps 5’”, “inside intron”, “inside exon”, “covers exon(s)”, “overlaps exon upstream”, “overlaps exon downstream”, “overlaps two exons”, “overlaps 3’”, “close to 3’”, “downstream”, “covers”) |
region | a factor with levels c(“upstream”, “promoter”, “overlaps 5’”, “inside”, “overlaps 3’”, “close to 3’”, “downstream”, “covers”) |
distance | distance before 5’ end of gene |
subregion | a factor with levels c(“inside intron”, “inside exon”, “covers exon(s)”, “overlaps exon upstream”, “overlaps exon downstream”, “overlaps two exons”) |
insideDistance | distance past 5’ end of gene |
exonnumber | which exon |
nexons | number of exons |
UTR | a factor with levels c(“inside transcription region”, “5’ UTR”, “overlaps 5’ UTR”, “3’UTR”, “overlaps 3’UTR”, “covers transcription region”) |
geneL | the gene length |
codingL | the coding length |
Entrez | Entrez ID of closest gene |
If CSV = TRUE
, a file called region_info.csv
will be printed in your working directory, unless another location and file name are listed in the OUTDIR
option. It is important that chr
preceeds the chromosome number, that the sequence is hg19, and that the candidate coordinates are encased in quotation marks (this tells R
that it should read the coordinates as a character class). The output of this function is what can be sent to Invitrogen to specify the sequences you would like to be targeted.
brainflowprobes includes two functions that visualize expression information about candidate target regions based on four external datasets. Coverage data for these external datasets are stored online as BigWig tracks that unfortunately cannot be called using R
on a Windows operating system. For users with Mac or Linux machines, expression of candidate regions in these datasets can be loaded and visualized using plot_coverage()
and four_panels()
.
plot_coverage()
uses the getRegionCoverage()
function from the derfinder package to cut the coverage values for region(s) of interest (specified in the REGION
option) in a set of nuclear (N) and cytoplasmic (C) RNA-seq samples derived from adult (A) and fetal (F) human cortex. In this dataset, two different RNA-seq libraries based on either polyA-enrichment (P) or rRNA depletion (R) were generated and sequenced for each fraction-age pair, resulting in eight groups of samples. Optimal candidate target probe sequences will be highly and uniformly expressed across the region.
plot_coverage("chr2:162279880-162282378:+",
PDF = "regionCoverage_fractionedData.pdf",
OUTDIR = '.',
COVERAGE = NULL, VERBOSE = FALSE)