ISAnalytics 1.8.1
ISAnalytics is an R package developed to analyze gene therapy vector insertion site data identified from genomics next-generation sequencing reads for clonal tracking studies.
ISAnalytics can be installed in two ways: from Bioconductor with BiocManager, or from GitHub with devtools.
There are always 2 versions of the package active:
RELEASE is the latest stable version.
DEVEL is the development version: the most up-to-date version, where all new features are introduced.
RELEASE version (from Bioconductor):
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("ISAnalytics")
DEVEL version (from Bioconductor):
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
# The following initializes usage of Bioc devel
BiocManager::install(version='devel')
BiocManager::install("ISAnalytics")
RELEASE version (from GitHub):
if (!require(devtools)) {
    install.packages("devtools")
}
devtools::install_github("calabrialab/ISAnalytics",
    ref = "RELEASE_3_16",
    dependencies = TRUE,
    build_vignettes = TRUE
)
DEVEL version (from GitHub):
if (!require(devtools)) {
    install.packages("devtools")
}
devtools::install_github("calabrialab/ISAnalytics",
    ref = "master",
    dependencies = TRUE,
    build_vignettes = TRUE
)
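After installing, it can be useful to confirm which version is actually available in the current library. The following is a minimal sketch using only base R; the variable names `isa_installed` and `isa_version` are illustrative and not part of the package.

```r
# Check whether ISAnalytics is installed without attaching it
isa_installed <- requireNamespace("ISAnalytics", quietly = TRUE)

# Read the installed version, or NA if the package is missing
isa_version <- if (isa_installed) {
    as.character(utils::packageVersion("ISAnalytics"))
} else {
    NA_character_
}

message(
    "ISAnalytics installed: ", isa_installed,
    " (version: ", isa_version, ")"
)
```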
ISAnalytics has a verbose option that allows some functions to print additional information to the console while they’re executing. To disable or re-enable this feature:
# DISABLE
options("ISAnalytics.verbose" = FALSE)
# ENABLE
options("ISAnalytics.verbose" = TRUE)
Some functions also produce reports in a user-friendly HTML format. To disable or enable this feature:
# DISABLE HTML REPORTS
options("ISAnalytics.reports" = FALSE)
# ENABLE HTML REPORTS
options("ISAnalytics.reports" = TRUE)
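Because these are ordinary R options, they can also be changed temporarily and restored afterwards. The sketch below relies only on the base R behaviour that `options()` returns the previous values of the options it sets; the helper name `with_quiet_isa` is hypothetical and is not provided by ISAnalytics.

```r
# Hypothetical helper: evaluate `code` with verbose output and HTML reports
# switched off, then restore whatever the user had set before.
with_quiet_isa <- function(code) {
    old_opts <- options(
        ISAnalytics.verbose = FALSE,
        ISAnalytics.reports = FALSE
    )
    # Restore the previous values even if `code` raises an error
    on.exit(options(old_opts), add = TRUE)
    # `code` is a lazily evaluated argument, so it runs here,
    # after the options have been changed
    force(code)
}

# Usage: inside the call the option is FALSE; afterwards it is restored
options(ISAnalytics.verbose = TRUE)
inside <- with_quiet_isa(getOption("ISAnalytics.verbose"))
after <- getOption("ISAnalytics.verbose")
```

Wrapping the change in a function keeps the global state untouched for the rest of the session, which matters when the same script mixes quiet batch steps with interactive exploration.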
In the newer version of ISAnalytics, we introduced a “dynamic variables system” to allow more flexibility in terms of input formats. Before starting the analysis workflow, you can specify how your inputs are structured so that the package can process them. For more information on how to do this, take a look at vignette("workflow_start", package = "ISAnalytics").
The first steps of the analysis workflow involve the import and parsing of data and metadata files from disk.
Metadata: import_association_file() and/or import_Vispa2_stats()
Data: import_single_Vispa2Matrix() or import_parallel_Vispa2Matrices()
Refer to vignette("workflow_start", package = "ISAnalytics") for more details.
ISAnalytics offers several different functions for cleaning and pre-processing your data.
compute_near_integrations()
outlier_filter()
remove_collisions(), see also the dedicated vignette vignette("workflow_start", package = "ISAnalytics")
purity_filter()
aggregate_values_by_key(), aggregate_metadata(), see also the dedicated vignette vignette("workflow_start", package = "ISAnalytics")
You can answer very different biological questions by using the provided functions with appropriate inputs.
sample_statistics()
compute_abundance(), integration_alluvial_plot()
top_integrations()
top_targeted_genes()
CIS_grubbs(), CIS_volcano_plot()
gene_frequency_fisher(), fisher_scatterplot(), circos_genomic_density()
is_sharing(), iss_source(), sharing_heatmap(), sharing_venn()
HSC_population_size_estimate(), HSC_population_plot()
For more, please refer to the full function reference.
ISAnalytics is designed to be flexible concerning input formats, thus it is suited to process various kinds of data provided the correct dynamic configuration is set.
We demonstrate this with an example that uses barcode data. The matrix is publicly available on GEO under accession GSE144340 (Ferrari, Jacob, et al., 2020) [1]; the metadata was provided to us by the authors and is available in the package's additional files.
library(ISAnalytics)
#> Loading required package: magrittr
# Set appropriate data and metadata specs ----
metadata_specs <- tibble::tribble(
    ~names, ~types, ~transform, ~flag, ~tag,
    "ProjectID", "char", NULL, "required", "project_id",
    "SubjectID", "char", NULL, "required", "subject",
    "Tissue", "char", NULL, "required", "tissue",
    "TimePoint", "int", NULL, "required", "tp_days",
    "CellMarker", "char", NULL, "required", "cell_marker",
    "ID", "char", NULL, "required", "pcr_repl_id",
    "SourceFileName", "char", NULL, "optional", NA_character_,
    "Link", "char", NULL, "optional", NA_character_
)
set_af_columns_def(metadata_specs)
#> Warning: Warning: important tags missing
#> ℹ Some tags are required for proper execution of some functions. If these tags are not provided, execution of dependent functions might fail. Review your inputs carefully.
#> ℹ Missing tags: pool_id, fusion_id, tag_seq, vector_id, tag_id, pcr_replicate, vispa_concatenate, proj_folder
#> ℹ To see where these are involved type `inspect_tags(c('pool_id','fusion_id','tag_seq','vector_id','tag_id','pcr_replicate','vispa_concatenate','proj_folder'))`
#> Association file columns specs successfully changed
mandatory_specs <- tibble::tribble(
    ~names, ~types, ~transform, ~flag, ~tag,
    "BarcodeSeq", "char", NULL, "required", NA_character_
)
set_mandatory_IS_vars(mandatory_specs)
#> Warning: Warning: important tags missing
#> ℹ Some tags are required for proper execution of some functions. If these tags are not provided, execution of dependent functions might fail. Review your inputs carefully.
#> ℹ Missing tags: chromosome, locus, is_strand
#> ℹ To see where these are involved type `inspect_tags(c('chromosome','locus','is_strand'))`
#> Mandatory IS vars successfully changed
# Files ----
data_folder <- system.file("testdata", package = "ISAnalytics")
meta_file <- "barcodes_example_af.tsv.xz"
matrix_file <- "GSE144340_Matrix_542.tsv.xz"
# Data import ----
af <- import_association_file(fs::path(data_folder, meta_file),
    report_path = NULL
)
af
#> ProjectID SubjectID Tissue TimePoint CellMarker ID
#> 1: PMID32601433 A0 BM 21 Whole BM_A0
#> 2: PMID32601433 A0 PB 21 Whole PB21_A0
#> 3: PMID32601433 A1 BM 21 Whole BM_A1
#> 4: PMID32601433 A1 PB 21 Whole PB21_A1
#> 5: PMID32601433 A2 PB 21 Whole PB21_A2
#> 6: PMID32601433 A3 PB 21 Whole PB21_A3
#> 7: PMID32601433 A4 BM 21 Whole BM_A4
#> 8: PMID32601433 A4 PB 21 Whole PB21_A4
#> 9: PMID32601433 C0 PB 21 Whole PB21_C0
#> 10: PMID32601433 C1 BM 21 Whole BM_C1
#> 11: PMID32601433 C1 PB 21 Whole PB21_C1
#> 12: PMID32601433 C2 BM 21 Whole BM_C2
#> 13: PMID32601433 C2 PB 21 Whole PB21_C2
#> 14: PMID32601433 C3 BM 21 Whole BM_C3
#> 15: PMID32601433 C3 PB 21 Whole PB21_C3
#> SourceFileName
#> 1: GSE144340_Matrix_542.tsv.gz
#> 2: GSE144340_Matrix_542.tsv.gz
#> 3: GSE144340_Matrix_542.tsv.gz
#> 4: GSE144340_Matrix_542.tsv.gz
#> 5: GSE144340_Matrix_542.tsv.gz
#> 6: GSE144340_Matrix_542.tsv.gz
#> 7: GSE144340_Matrix_542.tsv.gz
#> 8: GSE144340_Matrix_542.tsv.gz
#> 9: GSE144340_Matrix_542.tsv.gz
#> 10: GSE144340_Matrix_542.tsv.gz
#> 11: GSE144340_Matrix_542.tsv.gz
#> 12: GSE144340_Matrix_542.tsv.gz
#> 13: GSE144340_Matrix_542.tsv.gz
#> 14: GSE144340_Matrix_542.tsv.gz
#> 15: GSE144340_Matrix_542.tsv.gz
#> Link
#> 1: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 2: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 3: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 4: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 5: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 6: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 7: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 8: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 9: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 10: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 11: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 12: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 13: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 14: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 15: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> TimepointMonths TimepointYears
#> 1: 01 01
#> 2: 01 01
#> 3: 01 01
#> 4: 01 01
#> 5: 01 01
#> 6: 01 01
#> 7: 01 01
#> 8: 01 01
#> 9: 01 01
#> 10: 01 01
#> 11: 01 01
#> 12: 01 01
#> 13: 01 01
#> 14: 01 01
#> 15: 01 01
matrix <- import_single_Vispa2Matrix(fs::path(data_folder, matrix_file),
    sample_names_to = "ID"
)
#> Warning: compression format not supported by fread
#> ℹ File will be read using readr
#> Reading file...
#> ℹ Mode: classic
#> Reshaping...
#> *** File info ***
#> • --- Annotated: FALSE
#> • --- Dimensions: 31757 x 16
#> • --- Read mode: classic
#> • --- Sample count: 15
matrix
#> BarcodeSeq ID Value
#> 1: AAAAAAAATTTTTAAACGTACC BM_A0 1
#> 2: AAAAAACATATCTATAGTTACC BM_A0 1
#> 3: AAAAAATATATAAATAGATACC BM_A0 1
#> 4: AAAAACAACAAGGAAATTCAAT BM_A0 1
#> 5: AAAAACAACGAGGATAGTGAAT BM_A0 1
#> ---
#> 33772: TTTTGAGACCTTCACACCTACT PB21_C3 1
#> 33773: TTTTGCCACCTTCATACCCAAC PB21_C3 1
#> 33774: TTTTTAAACCGTTAGACCCGCA PB21_C3 1
#> 33775: TTTTTTCACGACAATAGCCAAT PB21_C3 1
#> 33776: TTTTTTCACTTGCACATCCGGC PB21_C3 1
# Descriptive stats ----
desc_stats <- sample_statistics(matrix, af,
    sample_key = pcr_id_column(),
    value_columns = "Value"
)$metadata %>%
    dplyr::rename(distinct_barcodes = "nIS")
desc_stats
#> ProjectID SubjectID Tissue TimePoint CellMarker ID
#> 1: PMID32601433 A0 BM 21 Whole BM_A0
#> 2: PMID32601433 A0 PB 21 Whole PB21_A0
#> 3: PMID32601433 A1 BM 21 Whole BM_A1
#> 4: PMID32601433 A1 PB 21 Whole PB21_A1
#> 5: PMID32601433 A2 PB 21 Whole PB21_A2
#> 6: PMID32601433 A3 PB 21 Whole PB21_A3
#> 7: PMID32601433 A4 BM 21 Whole BM_A4
#> 8: PMID32601433 A4 PB 21 Whole PB21_A4
#> 9: PMID32601433 C0 PB 21 Whole PB21_C0
#> 10: PMID32601433 C1 BM 21 Whole BM_C1
#> 11: PMID32601433 C1 PB 21 Whole PB21_C1
#> 12: PMID32601433 C2 BM 21 Whole BM_C2
#> 13: PMID32601433 C2 PB 21 Whole PB21_C2
#> 14: PMID32601433 C3 BM 21 Whole BM_C3
#> 15: PMID32601433 C3 PB 21 Whole PB21_C3
#> SourceFileName
#> 1: GSE144340_Matrix_542.tsv.gz
#> 2: GSE144340_Matrix_542.tsv.gz
#> 3: GSE144340_Matrix_542.tsv.gz
#> 4: GSE144340_Matrix_542.tsv.gz
#> 5: GSE144340_Matrix_542.tsv.gz
#> 6: GSE144340_Matrix_542.tsv.gz
#> 7: GSE144340_Matrix_542.tsv.gz
#> 8: GSE144340_Matrix_542.tsv.gz
#> 9: GSE144340_Matrix_542.tsv.gz
#> 10: GSE144340_Matrix_542.tsv.gz
#> 11: GSE144340_Matrix_542.tsv.gz
#> 12: GSE144340_Matrix_542.tsv.gz
#> 13: GSE144340_Matrix_542.tsv.gz
#> 14: GSE144340_Matrix_542.tsv.gz
#> 15: GSE144340_Matrix_542.tsv.gz
#> Link
#> 1: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 2: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 3: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 4: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 5: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 6: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 7: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 8: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 9: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 10: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 11: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 12: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 13: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 14: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> 15: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE144nnn/GSE144340/suppl/GSE144340%5FMatrix%5F542%2Etsv%2Egz
#> TimepointMonths TimepointYears Value_shannon Value_simpson Value_invsimpson
#> 1: 01 01 2.952237 0.9113968 11.286277
#> 2: 01 01 3.459660 0.9488818 19.562506
#> 3: 01 01 3.006200 0.8870954 8.857038
#> 4: 01 01 3.774526 0.9453922 18.312391
#> 5: 01 01 3.181671 0.9222549 12.862544
#> 6: 01 01 3.389893 0.9401115 16.697684
#> 7: 01 01 2.820483 0.8857002 8.748925
#> 8: 01 01 3.345492 0.9344783 15.262125
#> 9: 01 01 3.843335 0.9588449 24.298324
#> 10: 01 01 3.201240 0.8979416 9.798313
#> 11: 01 01 3.194270 0.8805337 8.370564
#> 12: 01 01 2.615324 0.8670521 7.521742
#> 13: 01 01 3.515454 0.9382745 16.200749
#> 14: 01 01 2.557280 0.8453968 6.468172
#> 15: 01 01 3.929186 0.9397792 16.605561
#> Value_sum Value_count Value_describe_vars Value_describe_n
#> 1: 244879 2284 1 2284
#> 2: 81588 1080 1 1080
#> 3: 274792 2477 1 2477
#> 4: 104195 2269 1 2269
#> 5: 124676 1465 1 1465
#> 6: 180497 1786 1 1786
#> 7: 296246 2255 1 2255
#> 8: 177010 1538 1 1538
#> 9: 59966 2644 1 2644
#> 10: 303345 2993 1 2993
#> 11: 95971 2636 1 2636
#> 12: 343490 2223 1 2223
#> 13: 149100 2386 1 2386
#> 14: 277048 1817 1 1817
#> 15: 64118 3923 1 3923
#> Value_describe_mean Value_describe_sd Value_describe_median
#> 1: 107.21497 1521.7649 1
#> 2: 75.54444 556.4600 1
#> 3: 110.93742 1852.2801 1
#> 4: 45.92111 509.2056 1
#> 5: 85.10307 904.5528 1
#> 6: 101.06215 1040.5982 1
#> 7: 131.37295 2105.4943 1
#> 8: 115.09103 1149.9733 1
#> 9: 22.68003 235.5394 1
#> 10: 101.35149 1768.7577 1
#> 11: 36.40781 645.1812 1
#> 12: 154.51642 2652.4479 1
#> 13: 62.48952 755.9385 1
#> 14: 152.47551 2551.7136 1
#> 15: 16.34412 250.7138 1
#> Value_describe_trimmed Value_describe_mad Value_describe_min
#> 1: 1.160284 0 1
#> 2: 1.082176 0 1
#> 3: 1.234493 0 1
#> 4: 1.068795 0 1
#> 5: 1.083546 0 1
#> 6: 1.080420 0 1
#> 7: 1.248199 0 1
#> 8: 1.065747 0 1
#> 9: 1.086484 0 1
#> 10: 1.208768 0 1
#> 11: 1.136019 0 1
#> 12: 1.295110 0 1
#> 13: 1.115707 0 1
#> 14: 1.252234 0 1
#> 15: 1.083147 0 1
#> Value_describe_max Value_describe_range Value_describe_skew
#> 1: 48411 48410 21.84543
#> 2: 10411 10410 11.49921
#> 3: 68661 68660 29.32449
#> 4: 18252 18251 24.92426
#> 5: 19865 19864 17.48608
#> 6: 26055 26054 16.62791
#> 7: 70946 70945 25.02978
#> 8: 32536 32535 19.00613
#> 9: 5379 5378 15.44781
#> 10: 80197 80196 35.45585
#> 11: 28170 28169 35.63449
#> 12: 81381 81380 23.99526
#> 13: 27428 27427 24.99102
#> 14: 83741 83740 26.01411
#> 15: 10463 10462 29.78536
#> Value_describe_kurtosis Value_describe_se distinct_barcodes
#> 1: 564.8589 31.841939 2284
#> 2: 159.6777 16.932540 1080
#> 3: 969.0394 37.217196 2477
#> 4: 788.0556 10.689956 2269
#> 5: 342.6372 23.632797 1465
#> 6: 330.3889 24.623077 1786
#> 7: 716.3839 44.338480 2255
#> 8: 457.8748 29.323081 1538
#> 9: 276.7000 4.580710 2644
#> 10: 1478.2344 32.330691 2993
#> 11: 1447.9758 12.566345 2636
#> 12: 625.3701 56.257072 2223
#> 13: 793.6929 15.475733 2386
#> 14: 757.2388 59.862447 1817
#> 15: 1045.4960 4.002848 3923
# Aggregation and new stats ----
agg_key <- c("SubjectID")
agg <- aggregate_values_by_key(matrix, af,
    key = agg_key,
    group = "BarcodeSeq",
    join_af_by = pcr_id_column()
)
agg
#> # A tibble: 33,267 × 3
#> BarcodeSeq SubjectID Value_sum
#> <chr> <chr> <dbl>
#> 1 AAAAAAAACACGGAGAACGACG C3 2
#> 2 AAAAAAAACGCGAACAACTACG C3 1
#> 3 AAAAAAAACTCAAAAAAGAAAT C3 1
#> 4 AAAAAAAATTTACACAAAGAAA A4 1
#> 5 AAAAAAAATTTTTAAACGTACC A0 1
#> 6 AAAAAACATATCTATAGTTACC A0 1
#> 7 AAAAAAGACGACGATAGGCACG C1 1
#> 8 AAAAAAGACGTTTATAGGTGTA A2 1
#> 9 AAAAAAGACTGCGACAAAAGGG A4 1
#> 10 AAAAAAGACTTTGATAACCACG C3 1
#> # … with 33,257 more rows
agg_meta_functions <- tibble::tribble(
    ~Column, ~Function, ~Args, ~Output_colname,
    "TimePoint", ~ mean(.x, na.rm = TRUE), NA, "{.col}_avg",
    "CellMarker", ~ length(unique(.x)), NA, "distinct_cell_marker_count",
    "ID", ~ length(unique(.x)), NA, "distinct_id_count"
)
agg_meta <- aggregate_metadata(
    af,
    aggregating_functions = agg_meta_functions,
    grouping_keys = agg_key
)
agg_meta
#> # A tibble: 9 × 4
#> SubjectID TimePoint_avg distinct_cell_marker_count distinct_id_count
#> <chr> <dbl> <int> <int>
#> 1 A0 21 1 2
#> 2 A1 21 1 2
#> 3 A2 21 1 1
#> 4 A3 21 1 1
#> 5 A4 21 1 2
#> 6 C0 21 1 1
#> 7 C1 21 1 2
#> 8 C2 21 1 2
#> 9 C3 21 1 2
agg_stats <- sample_statistics(agg, agg_meta,
    sample_key = agg_key,
    value_columns = "Value_sum"
)$metadata %>%
    dplyr::rename(distinct_barcodes = "nIS")
agg_stats
#> # A tibble: 9 × 23
#> SubjectID TimePoint_…¹ disti…² disti…³ Value…⁴ Value…⁵ Value…⁶ Value…⁷ Value…⁸
#> <chr> <dbl> <int> <int> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 A0 21 1 2 3.24 0.929 14.1 326467 3304
#> 2 A1 21 1 2 3.47 0.927 13.7 378987 4631
#> 3 A2 21 1 1 3.18 0.922 12.9 124676 1465
#> 4 A3 21 1 1 3.39 0.940 16.7 180497 1786
#> 5 A4 21 1 2 3.29 0.930 14.3 473256 3718
#> 6 C0 21 1 1 3.84 0.959 24.3 59966 2644
#> 7 C1 21 1 2 3.39 0.920 12.6 399316 5538
#> 8 C2 21 1 2 3.05 0.903 10.3 492590 4526
#> 9 C3 21 1 2 3.00 0.886 8.77 341166 5655
#> # … with 14 more variables: Value_sum_describe_vars <dbl>,
#> # Value_sum_describe_n <dbl>, Value_sum_describe_mean <dbl>,
#> # Value_sum_describe_sd <dbl>, Value_sum_describe_median <dbl>,
#> # Value_sum_describe_trimmed <dbl>, Value_sum_describe_mad <dbl>,
#> # Value_sum_describe_min <dbl>, Value_sum_describe_max <dbl>,
#> # Value_sum_describe_range <dbl>, Value_sum_describe_skew <dbl>,
#> # Value_sum_describe_kurtosis <dbl>, Value_sum_describe_se <dbl>, …
# Abundance ----
abundance <- compute_abundance(agg, columns = "Value_sum", key = agg_key)
abundance
#> # A tibble: 33,267 × 5
#> BarcodeSeq SubjectID Value_sum Value_sum_RelAbundance Value_sum…¹
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 AAAAAAAACACGGAGAACGACG C3 2 0.00000586 0.000586
#> 2 AAAAAAAACGCGAACAACTACG C3 1 0.00000293 0.000293
#> 3 AAAAAAAACTCAAAAAAGAAAT C3 1 0.00000293 0.000293
#> 4 AAAAAAAATTTACACAAAGAAA A4 1 0.00000211 0.000211
#> 5 AAAAAAAATTTTTAAACGTACC A0 1 0.00000306 0.000306
#> 6 AAAAAACATATCTATAGTTACC A0 1 0.00000306 0.000306
#> 7 AAAAAAGACGACGATAGGCACG C1 1 0.00000250 0.000250
#> 8 AAAAAAGACGTTTATAGGTGTA A2 1 0.00000802 0.000802
#> 9 AAAAAAGACTGCGACAAAAGGG A4 1 0.00000211 0.000211
#> 10 AAAAAAGACTTTGATAACCACG C3 1 0.00000293 0.000293
#> # … with 33,257 more rows, and abbreviated variable name
#> # ¹Value_sum_PercAbundance
reset_dyn_vars_config()
#> Mandatory IS vars reset to default
#> Annotation IS vars reset to default
#> Association file columns specs reset to default
#> ISS stats specs reset to default
#> Matrix suffixes specs reset to default
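The abundance columns printed above can be checked by hand. The figures are consistent with relative abundance being each value divided by the total of that column within its group (here, the subject), and percentage abundance being that ratio times 100. The sketch below reproduces this arithmetic in base R on a toy table; the barcodes and counts in `toy` are made up for illustration, and this is not presented as the internal implementation of compute_abundance().

```r
# Toy aggregated matrix (made-up values, not taken from the dataset above)
toy <- data.frame(
    BarcodeSeq = c("AAA", "CCC", "GGG", "AAA", "TTT"),
    SubjectID = c("S1", "S1", "S1", "S2", "S2"),
    Value_sum = c(2, 6, 2, 1, 3)
)

# Total of Value_sum within each subject, repeated on every row
totals <- ave(toy$Value_sum, toy$SubjectID, FUN = sum)

# Relative and percentage abundance, named like the columns shown above
toy$Value_sum_RelAbundance <- toy$Value_sum / totals
toy$Value_sum_PercAbundance <- toy$Value_sum_RelAbundance * 100
toy

# Cross-check with the printed output: subject C3 has a total Value_sum of
# 341166 in agg_stats, and its first barcode has Value_sum = 2
c3_check <- signif(2 / 341166, 3)
c3_check
```

The cross-check value matches the 0.00000586 shown for the first C3 barcode in the abundance table.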
The package provides a simple Shiny interface for data exploration and plotting. To start the interface use:
NGSdataExplorer()
The application's main page shows a file loading screen. Files can also be loaded from the R environment: for example, before opening the app, we can load the included association file:
data("association_file")
Once in the application, we can choose "association_file" from the R environment loading option screen and click on “Import data”. Once the data is imported, we can click on the “Explore” tab in the upper navbar, where we will see 2 tabs: one allows interactive exploration of data in tabular form, the other lets us plot data. Several plot parameters can be customized, and the plot can be saved to file with the dedicated button at the end of the page.
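The two steps can be combined in a short script. This is a minimal sketch, assuming the package is installed; the guard variable `can_launch` is illustrative, and the app is only started in an interactive session because a Shiny app blocks the R process while it runs.

```r
# Only attempt to launch when the package is available and a user is present
can_launch <- interactive() &&
    requireNamespace("ISAnalytics", quietly = TRUE)

if (can_launch) {
    # Load the bundled association file so it can be selected in the app
    data("association_file", package = "ISAnalytics")
    # Start the Shiny interface
    ISAnalytics::NGSdataExplorer()
}
```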
The Shiny interface is under active development, and new features will be added in the near future.
Several implemented functions produce static HTML reports or tabular files that can be saved on disk. Reports contain the relevant information on how the function was called, input and output statistics, and session info for reproducibility.
ISAnalytics has its own dedicated package website, where you can browse the documentation and vignettes easily and keep up to date with all relevant updates. Visit the website at https://calabrialab.github.io/ISAnalytics/
If you have any issues the documentation can’t solve, get in touch by opening an issue on GitHub or contacting the maintainers.
[1] S. Ferrari, A. Jacob, S. Beretta, et al. “Efficient gene editing of human long-term hematopoietic stem cells validated by clonal tracking”. In: Nat Biotechnol 38 (Nov. 2020), pp. 1298–1308. DOI: https://doi.org/10.1038/s41587-020-0551-y.