1 Introduction

ISAnalytics is an R package for analyzing gene therapy vector insertion site data, identified from next-generation sequencing reads, in clonal tracking studies.

2 Installation and options

ISAnalytics can be installed in two ways:

  • You can install it via Bioconductor
  • You can install it via GitHub using the package devtools

Two versions of the package are always active:

  • RELEASE is the latest stable version
  • DEVEL is the development version: the most up-to-date version, where all new features are introduced

2.1 Installation from Bioconductor

RELEASE version:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ISAnalytics")

DEVEL version:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# The following initializes usage of Bioc devel
BiocManager::install(version='devel')

BiocManager::install("ISAnalytics")

2.2 Installation from GitHub

RELEASE:

# Install devtools only if it is missing, without attaching it
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("calabrialab/ISAnalytics",
                         ref = "RELEASE_3_15",
                         dependencies = TRUE,
                         build_vignettes = TRUE)

DEVEL:

# Install devtools only if it is missing, without attaching it
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("calabrialab/ISAnalytics",
                         ref = "master",
                         dependencies = TRUE,
                         build_vignettes = TRUE)
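
After either installation route, a quick check can confirm that the package is visible to the current R session. The sketch below uses only base R and the utils package. It is not part of ISAnalytics itself, and the variable name isa_installed is purely illustrative.

```r
# TRUE if the package can be loaded from the current library paths
isa_installed <- requireNamespace("ISAnalytics", quietly = TRUE)

if (isa_installed) {
  # Report the installed version
  message("ISAnalytics version: ",
          as.character(utils::packageVersion("ISAnalytics")))
} else {
  message("ISAnalytics is not installed in this library")
}
```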

2.3 Setting options

ISAnalytics has a verbose option that allows some functions to print additional information to the console while they’re executing. To disable or enable this feature:

# DISABLE
options("ISAnalytics.verbose" = FALSE)

# ENABLE
options("ISAnalytics.verbose" = TRUE)

Some functions also produce reports in a user-friendly HTML format. To disable or enable this feature:

# DISABLE HTML REPORTS
options("ISAnalytics.reports" = FALSE)

# ENABLE HTML REPORTS
options("ISAnalytics.reports" = TRUE)
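
Because these are ordinary R options, they can be inspected and restored with base R tools. The following is a minimal sketch using only options() and getOption(). The fallback value TRUE passed to getOption() is an assumption made for illustration and is not a statement about the package defaults.

```r
# Set both options and keep the previous values so they can be restored
old_opts <- options(
  "ISAnalytics.verbose" = FALSE,
  "ISAnalytics.reports" = FALSE
)

# Query the current values; `default` is only used when an option is unset
# (assumed fallback, not necessarily the package default)
verbose_now <- getOption("ISAnalytics.verbose", default = TRUE)
reports_now <- getOption("ISAnalytics.reports", default = TRUE)
print(c(verbose = verbose_now, reports = reports_now))

# Restore whatever was set before
options(old_opts)
```

Saving the return value of options() is useful in scripts that should silence output temporarily without changing the settings of the rest of the session.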

3 Setting up the workflow

The newer version of ISAnalytics introduces a “dynamic variables system” that allows more flexibility in input formats. Before starting the analysis workflow, you can specify how your inputs are structured so that the package can process them. For more information on how to do this, see vignette("workflow_start", package = "ISAnalytics").

4 The first steps

The first steps of the analysis workflow involve the import and parsing of data and metadata files from disk.

  • Import metadata with import_association_file() and/or import_Vispa2_stats()
  • Import data with import_single_Vispa2Matrix() or import_parallel_Vispa2Matrices()

Refer to the vignette vignette("workflow_start", package = "ISAnalytics") for more details.
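
As a rough sketch of this step, the snippet below calls two of the import functions listed above. The file paths are hypothetical placeholders, and passing the path as the first and only argument is an assumption; check ?import_association_file and ?import_single_Vispa2Matrix for the actual arguments. The calls are guarded so that the snippet does nothing when the package or the files are missing.

```r
# Hypothetical paths: replace with the location of your own files
af_path <- "path/to/association_file.tsv"
matrix_path <- "path/to/integration_matrix.tsv"

# Only run the import when the package and both files are available
can_import <- requireNamespace("ISAnalytics", quietly = TRUE) &&
  file.exists(af_path) && file.exists(matrix_path)

if (can_import) {
  # Assumption: each function takes the file path as its first argument
  association_file <- ISAnalytics::import_association_file(af_path)
  is_matrix <- ISAnalytics::import_single_Vispa2Matrix(matrix_path)
  str(association_file)
  str(is_matrix)
} else {
  message("Skipping import: package or input files not found")
}
```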

5 Data cleaning and pre-processing

ISAnalytics offers several different functions for cleaning and pre-processing your data.

  • Recalibration: identifies integration events that are near each other and condenses them into a single event whenever appropriate - compute_near_integrations()
  • Outliers identification and removal: identifies samples that are considered outliers according to user-defined logic and filters them out - outlier_filter()
  • Collision removal: identifies collision events between independent samples - remove_collisions(), see also the dedicated vignette vignette("workflow_start", package = "ISAnalytics")
  • Filter based on cell lineage purity: identifies and removes contamination between different cell types - purity_filter()
  • Data and metadata aggregation: allows the union of biological samples from single PCR replicates or other arbitrary aggregations - aggregate_values_by_key(), aggregate_metadata(), see also the dedicated vignette vignette("workflow_start", package = "ISAnalytics")
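
To make the idea of aggregation by key concrete, here is a conceptual sketch in base R. It does not use or reproduce aggregate_values_by_key(). The data frame, its column names, and the choice of SubjectID as the key are all hypothetical and chosen only for illustration.

```r
# Toy data: three PCR replicates for subject S1, one for subject S2
# (hypothetical columns and values)
replicates <- data.frame(
  SubjectID = c("S1", "S1", "S1", "S2"),
  chr = c("1", "1", "2", "1"),
  integration_locus = c(1000, 1000, 5000, 1000),
  strand = c("+", "+", "-", "+"),
  Value = c(10, 5, 7, 3)
)

# Sum the values of each integration site within each subject,
# collapsing replicates that share the same key and genomic coordinates
aggregated <- aggregate(
  Value ~ SubjectID + chr + integration_locus + strand,
  data = replicates,
  FUN = sum
)
print(aggregated)
```

The two S1 replicates observed at chr 1, locus 1000 collapse into a single row whose value is their sum, while sites seen in only one replicate are carried over unchanged.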

6 Answering biological questions

You can answer a variety of biological questions by using the provided functions with appropriate inputs.

  • Descriptive statistics: sample_statistics()
  • IS relative abundance: compute_abundance(), integration_alluvial_plot()
  • Top abundant IS: top_integrations()
  • Top targeted genes: top_targeted_genes()
  • Grubbs test for common insertion sites (CIS): CIS_grubbs(), CIS_volcano_plot()
  • Fisher’s exact test for gene frequency and IS distribution on target genome: gene_frequency_fisher(), fisher_scatterplot(), circos_genomic_density()
  • Clonal sharing analyses: is_sharing(), iss_source(), sharing_heatmap(), sharing_venn()
  • Estimate HSPCs population size: HSC_population_size_estimate(), HSC_population_plot()
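
As an illustration of what relative abundance means, the base R sketch below computes each integration site's share of the total within its sample. This is a conceptual example under that assumed definition and is not the implementation of compute_abundance(). The data and column names are hypothetical.

```r
# Toy quantification values for integration sites in two subjects
# (hypothetical columns and values)
counts <- data.frame(
  SubjectID = c("S1", "S1", "S1", "S2", "S2"),
  is_id = c("IS1", "IS2", "IS3", "IS1", "IS4"),
  Value = c(50, 30, 20, 10, 30)
)

# Per-subject totals, repeated on every row of the corresponding subject
subject_total <- ave(counts$Value, counts$SubjectID, FUN = sum)

# Relative abundance: fraction of the subject total contributed by each site
counts$abundance <- counts$Value / subject_total
print(counts)
```

Within each subject the abundances sum to 1, which makes sites comparable across samples with different sequencing depth.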

For more, please refer to the full function reference.

7 Ensuring reproducibility of results

Several implemented functions produce static HTML reports or tabular files that can be saved to disk. Reports contain the relevant information on how the function was called, input and output statistics, and session info for reproducibility.
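
For analysis steps that do not produce a report, session information can also be recorded manually. The sketch below uses only base R and the utils package and is independent of the ISAnalytics reports. The output file name is a hypothetical choice.

```r
# Write the session information to a text file in a temporary directory
# (hypothetical file name; choose a project folder in real use)
session_file <- file.path(tempdir(), "session_info.txt")
writeLines(utils::capture.output(utils::sessionInfo()), session_file)

message("Session info saved to: ", session_file)
```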

8 Browse documentation online and keep updated

ISAnalytics has its own dedicated package website where you can browse the documentation and vignettes easily and keep up to date with all relevant updates. Visit the website at https://calabrialab.github.io/ISAnalytics/

9 Problems?

If you have any issues the documentation can’t solve, get in touch by opening an issue on GitHub or contacting the maintainers.