1 Introduction

In this vignette we explain in more detail how to perform sharing analyses with ISAnalytics and its dedicated sharing functions.

2 Installation and options

ISAnalytics can be installed quickly in different ways:

  • You can install it via Bioconductor
  • You can install it via GitHub using the package devtools

There are always 2 versions of the package active:

  • RELEASE is the latest stable version
  • DEVEL is the development version: the most up-to-date version, where all new features are introduced

2.1 Installation from Bioconductor

RELEASE version:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ISAnalytics")

DEVEL version:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# The following initializes usage of Bioc devel
BiocManager::install(version='devel')

BiocManager::install("ISAnalytics")

2.2 Installation from GitHub

RELEASE:

if (!require(devtools)) {
  install.packages("devtools")
}
devtools::install_github("calabrialab/ISAnalytics",
                         ref = "RELEASE_3_14",
                         dependencies = TRUE,
                         build_vignettes = TRUE)

DEVEL:

if (!require(devtools)) {
  install.packages("devtools")
}
devtools::install_github("calabrialab/ISAnalytics",
                         ref = "master",
                         dependencies = TRUE,
                         build_vignettes = TRUE)

2.3 Setting options

ISAnalytics has a verbose option that allows some functions to print additional information to the console while they’re executing. To disable or re-enable this feature:

# DISABLE
options("ISAnalytics.verbose" = FALSE)

# ENABLE
options("ISAnalytics.verbose" = TRUE)

Some functions also produce reports in a user-friendly HTML format. To disable or enable this feature:

# DISABLE HTML REPORTS
options("ISAnalytics.reports" = FALSE)

# ENABLE HTML REPORTS
options("ISAnalytics.reports" = TRUE)
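
Both settings are ordinary R options, so they can be read back with getOption() and restored later. The following is a minimal sketch using only base R; the option names are the ones shown above, while the variable names old and verbose_now are illustrative.

```r
# Switch both features off and keep the previous values.
# options() returns the old values (invisibly) as a named list.
old <- options("ISAnalytics.verbose" = FALSE,
               "ISAnalytics.reports" = FALSE)

# Read the current setting back
verbose_now <- getOption("ISAnalytics.verbose")
verbose_now

# Restore whatever was set before (unset options come back as unset)
options(old)
```

Saving the return value of options() is handy inside scripts that should not permanently change the settings of the calling session.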

3 Shared integration sites

An integration site is always characterized by a triple of values: (chr, integration_locus, strand), hence these attributes are always present in integration matrices.

library(ISAnalytics)
data("integration_matrices")
data("association_file")
#>    chr integration_locus strand     GeneName GeneStrand
#> 1:  16          68164148      +       NFATC3          +
#> 2:   4         129390130      + LOC100507487          +
#> 3:   5          84009671      -        EDIL3          -
#> 4:  12          54635693      -         CBX5          -
#> 5:   5          84009671      -        EDIL3          -
#> 6:  12          54635693      -         CBX5          -
#>                                                 CompleteAmplificationID seqCount fragmentEstimate
#> 1: PJ01_POOL01_LTR75LC38_PT001_PT001-103_lenti_GLOBE_PB_1_SLiM_0060_MNC      182        102.94572
#> 2:  PJ01_POOL01_LTR53LC32_PT001_PT001-81_lenti_GLOBE_BM_1_SLiM_0180_MNC    23219         68.73747
#> 3:  PJ01_POOL01_LTR53LC32_PT001_PT001-81_lenti_GLOBE_BM_1_SLiM_0180_MNC    20205         67.12349
#> 4:  PJ01_POOL01_LTR83LC66_PT001_PT001-81_lenti_GLOBE_BM_1_SLiM_0180_MNC    13269         65.15760
#> 5:  PJ01_POOL01_LTR83LC66_PT001_PT001-81_lenti_GLOBE_BM_1_SLiM_0180_MNC    14748         61.46981
#> 6:  PJ01_POOL01_LTR27LC94_PT001_PT001-81_lenti_GLOBE_BM_1_SLiM_0180_MNC    12588         60.84781

We can aggregate our data in different ways according to our needs, thereby obtaining different groups (to know more about this topic, see vignette("aggregate_function_usage", package = "ISAnalytics")). Each group has an associated set of integration sites.

## Aggregation by standard key
agg <- aggregate_values_by_key(integration_matrices,
                               association_file,
                               value_cols = c("seqCount", "fragmentEstimate"))
agg <- agg %>% dplyr::filter(TimePoint %in% c("0030", "0060"))
#> # A tibble: 419 × 11
#>    chr   integration_locus strand GeneName GeneStrand SubjectID CellMarker Tissue TimePoint seqCount_sum
#>    <chr>             <dbl> <chr>  <chr>    <chr>      <chr>     <chr>      <chr>  <chr>            <dbl>
#>  1 1               8464757 -      RERE     -          PT001     MNC        BM     0030               542
#>  2 1               8464757 -      RERE     -          PT001     MNC        BM     0060                 1
#>  3 1               8607357 +      RERE     -          PT001     MNC        BM     0060                 1
#>  4 1              11339120 +      UBIAD1   +          PT001     MNC        BM     0060              1605
#>  5 1              11339120 +      UBIAD1   +          PT001     MNC        PB     0060                 1
#>  6 1              16186297 -      SPEN     +          PT001     MNC        BM     0030                 1
#>  7 1              16186297 -      SPEN     +          PT001     MNC        PB     0060                 1
#>  8 1              16602483 +      FBXO42   -          PT001     MNC        BM     0060              2947
#>  9 1              25337264 -      MIR4425  +          PT002     MNC        BM     0030                23
#> 10 1              25337264 -      MIR4425  +          PT002     MNC        PB     0060                36
#>    fragmentEstimate_sum
#>                   <dbl>
#>  1                 3.01
#>  2                 1.00
#>  3                 1.00
#>  4                 8.03
#>  5                 1.00
#>  6                 1.00
#>  7                 1.00
#>  8                 9.04
#>  9                 9.14
#> 10                 7.07
#> # … with 409 more rows

An integration site is shared between two or more groups if the same triple is observed in all the groups considered.
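
To make the definition concrete, here is a minimal base R sketch, independent of ISAnalytics. The two toy groups are invented for illustration (the coordinates are borrowed from the output above, the group membership is not real). It shows that all three values of the triple must match for a site to count as shared.

```r
# Two toy groups of integration sites (illustrative data only)
group_a <- data.frame(
  chr = c("1", "1", "5"),
  integration_locus = c(8464757, 11339120, 84009671),
  strand = c("-", "+", "-")
)
group_b <- data.frame(
  chr = c("1", "5", "12"),
  integration_locus = c(8464757, 84009671, 54635693),
  strand = c("-", "+", "-")
)

# merge() without `by` joins on all common columns,
# i.e. on the full (chr, integration_locus, strand) triple
shared_sites <- merge(unique(group_a), unique(group_b))
shared_sites
nrow(shared_sites)
```

Only the site on chromosome 1 is shared: the site on chromosome 5 has the same locus in both groups but a different strand, so it is treated as two distinct integration sites.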

4 Automated sharing counts

ISAnalytics provides the function is_sharing() for computing automated sharing counts. The function has several arguments that can be tuned according to user needs.

4.1 SCENARIO 1: single input data frame and single grouping key

sharing_1 <- is_sharing(agg, 
                        group_key = c("SubjectID", "CellMarker", 
                                      "Tissue", "TimePoint"),
                        n_comp = 2,
                        is_count = TRUE,
                        relative_is_sharing = TRUE,
                        minimal = TRUE,
                        include_self_comp = FALSE, 
                        keep_genomic_coord = TRUE)
#> Calculating combinations...
#> Done!
sharing_1
#>                    g1                g2 shared           is_coord count_g1 count_g2 count_union     on_g1     on_g2
#>  1: PT001_MNC_BM_0030 PT001_MNC_BM_0060     21 <data.table[21x3]>       54      114         147 38.888889 18.421053
#>  2: PT001_MNC_BM_0030 PT001_MNC_PB_0060      8  <data.table[8x3]>       54       59         105 14.814815 13.559322
#>  3: PT001_MNC_BM_0060 PT001_MNC_PB_0060     29 <data.table[29x3]>      114       59         144 25.438596 49.152542
#>  4: PT001_MNC_PB_0030 PT001_MNC_PB_0060     10 <data.table[10x3]>       28       59          77 35.714286 16.949153
#>  5: PT001_MNC_BM_0030 PT002_MNC_BM_0030      0  <data.table[0x3]>       54       98         152  0.000000  0.000000
#>  6: PT001_MNC_BM_0060 PT002_MNC_BM_0030      1  <data.table[1x3]>      114       98         211  0.877193  1.020408
#>  7: PT001_MNC_PB_0060 PT002_MNC_BM_0030      1  <data.table[1x3]>       59       98         156  1.694915  1.020408
#>  8: PT001_MNC_PB_0030 PT002_MNC_BM_0030      0  <data.table[0x3]>       28       98         126  0.000000  0.000000
#>  9: PT001_MNC_BM_0030 PT002_MNC_PB_0060      0  <data.table[0x3]>       54       18          72  0.000000  0.000000
#> 10: PT001_MNC_BM_0060 PT002_MNC_PB_0060      0  <data.table[0x3]>      114       18         132  0.000000  0.000000
#> 11: PT001_MNC_PB_0060 PT002_MNC_PB_0060      0  <data.table[0x3]>       59       18          77  0.000000  0.000000
#> 12: PT002_MNC_BM_0030 PT002_MNC_PB_0060      8  <data.table[8x3]>       98       18         108  8.163265 44.444444
#> 13: PT002_MNC_PB_0030 PT002_MNC_PB_0060      7  <data.table[7x3]>       15       18          26 46.666667 38.888889
#> 14: PT001_MNC_PB_0030 PT002_MNC_PB_0060      0  <data.table[0x3]>       28       18          46  0.000000  0.000000
#> 15: PT002_MNC_BM_0060 PT002_MNC_PB_0060      5  <data.table[5x3]>       33       18          46 15.151515 27.777778
#> 16: PT001_MNC_BM_0030 PT002_MNC_PB_0030      0  <data.table[0x3]>       54       15          69  0.000000  0.000000
#> 17: PT001_MNC_BM_0060 PT002_MNC_PB_0030      1  <data.table[1x3]>      114       15         128  0.877193  6.666667
#> 18: PT001_MNC_PB_0060 PT002_MNC_PB_0030      0  <data.table[0x3]>       59       15          74  0.000000  0.000000
#> 19: PT002_MNC_BM_0030 PT002_MNC_PB_0030      3  <data.table[3x3]>       98       15         110  3.061224 20.000000
#> 20: PT001_MNC_PB_0030 PT002_MNC_PB_0030      0  <data.table[0x3]>       28       15          43  0.000000  0.000000
#> 21: PT002_MNC_BM_0060 PT002_MNC_PB_0030      2  <data.table[2x3]>       33       15          46  6.060606 13.333333
#> 22: PT001_MNC_BM_0030 PT001_MNC_PB_0030      7  <data.table[7x3]>       54       28          75 12.962963 25.000000
#> 23: PT001_MNC_BM_0060 PT001_MNC_PB_0030      7  <data.table[7x3]>      114       28         135  6.140351 25.000000
#> 24: PT001_MNC_BM_0030 PT002_MNC_BM_0060      1  <data.table[1x3]>       54       33          86  1.851852  3.030303
#> 25: PT001_MNC_BM_0060 PT002_MNC_BM_0060      0  <data.table[0x3]>      114       33         147  0.000000  0.000000
#> 26: PT001_MNC_PB_0060 PT002_MNC_BM_0060      0  <data.table[0x3]>       59       33          92  0.000000  0.000000
#> 27: PT002_MNC_BM_0030 PT002_MNC_BM_0060      5  <data.table[5x3]>       98       33         126  5.102041 15.151515
#> 28: PT001_MNC_PB_0030 PT002_MNC_BM_0060      0  <data.table[0x3]>       28       33          61  0.000000  0.000000
#>                    g1                g2 shared           is_coord count_g1 count_g2 count_union     on_g1     on_g2
#>       on_union
#>  1: 14.2857143
#>  2:  7.6190476
#>  3: 20.1388889
#>  4: 12.9870130
#>  5:  0.0000000
#>  6:  0.4739336
#>  7:  0.6410256
#>  8:  0.0000000
#>  9:  0.0000000
#> 10:  0.0000000
#> 11:  0.0000000
#> 12:  7.4074074
#> 13: 26.9230769
#> 14:  0.0000000
#> 15: 10.8695652
#> 16:  0.0000000
#> 17:  0.7812500
#> 18:  0.0000000
#> 19:  2.7272727
#> 20:  0.0000000
#> 21:  4.3478261
#> 22:  9.3333333
#> 23:  5.1851852
#> 24:  1.1627907
#> 25:  0.0000000
#> 26:  0.0000000
#> 27:  3.9682540
#> 28:  0.0000000
#>       on_union

In this configuration we set:

  • A single input data frame: agg
  • A single grouping key by setting the argument group_key. In this specific case, our groups will be identified by a unique combination of SubjectID, CellMarker, Tissue and TimePoint
  • n_comp represents the number of groups involved in each comparison: 2 means we’re interested in knowing the sharing for PAIRS of distinct groups
  • We want to keep the counts of distinct integration sites for each group by setting is_count to TRUE
  • relative_is_sharing, if set to TRUE, adds sharing expressed as a percentage. More precisely, it adds a column on_g1, calculated as the absolute number of shared integrations divided by the cardinality of the first group; on_g2 is analogous but is computed on the cardinality of the second group; finally, on_union is computed on the cardinality of the union of the two groups
  • By setting the argument minimal to TRUE we tell the function to avoid redundant comparisons: in this way only combinations and not permutations are included in the output table
  • include_self_comp adds rows to the table in which a group is compared with itself: these rows always show 100% sharing. There are few scenarios where this is useful, so for now we set it to FALSE since we don’t need it
  • keep_genomic_coord allows us to keep the genomic coordinates of the shared integration sites as a separate table
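
The relative sharing columns can be checked by hand. The sketch below uses only base R and the values from row 1 of sharing_1 above (PT001_MNC_BM_0030 vs PT001_MNC_BM_0060); it is a plain re-computation from the description of the columns, not the package's internal code.

```r
# Values copied from row 1 of sharing_1
shared   <- 21
count_g1 <- 54
count_g2 <- 114

# Size of the union of the two sets (inclusion-exclusion)
count_union <- count_g1 + count_g2 - shared

# Percentages as described for relative_is_sharing
on_g1    <- shared / count_g1 * 100
on_g2    <- shared / count_g2 * 100
on_union <- shared / count_union * 100

round(c(count_union = count_union, on_g1 = on_g1,
        on_g2 = on_g2, on_union = on_union), 4)
```

The results (147, 38.8889, 18.4211 and 14.2857) agree with the count_union, on_g1, on_g2 and on_union values reported in the table.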

4.1.1 Changing the number of comparisons

sharing_1_a <- is_sharing(agg, 
                        group_key = c("SubjectID", "CellMarker", 
                                      "Tissue", "TimePoint"),
                        n_comp = 3,
                        is_count = TRUE,
                        relative_is_sharing = TRUE,
                        minimal = TRUE,
                        include_self_comp = FALSE, 
                        keep_genomic_coord = TRUE)
#> Calculating combinations...
#> Done!
sharing_1_a
#>                    g1                g2                g3 shared          is_coord count_g1 count_g2 count_g3
#>  1: PT001_MNC_BM_0030 PT001_MNC_BM_0060 PT001_MNC_PB_0060      6 <data.table[6x3]>       54      114       59
#>  2: PT001_MNC_BM_0030 PT001_MNC_PB_0030 PT001_MNC_PB_0060      1 <data.table[1x3]>       54       28       59
#>  3: PT001_MNC_BM_0060 PT001_MNC_PB_0030 PT001_MNC_PB_0060      2 <data.table[2x3]>      114       28       59
#>  4: PT001_MNC_BM_0030 PT001_MNC_BM_0060 PT002_MNC_BM_0030      0 <data.table[0x3]>       54      114       98
#>  5: PT001_MNC_BM_0030 PT001_MNC_PB_0060 PT002_MNC_BM_0030      0 <data.table[0x3]>       54       59       98
#>  6: PT001_MNC_BM_0060 PT001_MNC_PB_0060 PT002_MNC_BM_0030      1 <data.table[1x3]>      114       59       98
#>  7: PT001_MNC_PB_0030 PT001_MNC_PB_0060 PT002_MNC_BM_0030      0 <data.table[0x3]>       28       59       98
#>  8: PT001_MNC_BM_0030 PT001_MNC_PB_0030 PT002_MNC_BM_0030      0 <data.table[0x3]>       54       28       98
#>  9: PT001_MNC_BM_0060 PT001_MNC_PB_0030 PT002_MNC_BM_0030      0 <data.table[0x3]>      114       28       98
#> 10: PT001_MNC_BM_0030 PT001_MNC_BM_0060 PT002_MNC_PB_0060      0 <data.table[0x3]>       54      114       18
#> 11: PT001_MNC_BM_0030 PT001_MNC_PB_0060 PT002_MNC_PB_0060      0 <data.table[0x3]>       54       59       18
#> 12: PT001_MNC_BM_0060 PT001_MNC_PB_0060 PT002_MNC_PB_0060      0 <data.table[0x3]>      114       59       18
#> 13: PT001_MNC_PB_0030 PT001_MNC_PB_0060 PT002_MNC_PB_0060      0 <data.table[0x3]>       28       59       18
#> 14: PT001_MNC_BM_0030 PT002_MNC_BM_0030 PT002_MNC_PB_0060      0 <data.table[0x3]>       54       98       18
#> 15: PT001_MNC_BM_0060 PT002_MNC_BM_0030 PT002_MNC_PB_0060      0 <data.table[0x3]>      114       98       18
#> 16: PT001_MNC_PB_0060 PT002_MNC_BM_0030 PT002_MNC_PB_0060      0 <data.table[0x3]>       59       98       18
#> 17: PT001_MNC_PB_0030 PT002_MNC_BM_0030 PT002_MNC_PB_0060      0 <data.table[0x3]>       28       98       18
#> 18: PT001_MNC_BM_0030 PT002_MNC_PB_0030 PT002_MNC_PB_0060      0 <data.table[0x3]>       54       15       18
#> 19: PT001_MNC_BM_0060 PT002_MNC_PB_0030 PT002_MNC_PB_0060      0 <data.table[0x3]>      114       15       18
#> 20: PT001_MNC_PB_0060 PT002_MNC_PB_0030 PT002_MNC_PB_0060      0 <data.table[0x3]>       59       15       18
#> 21: PT002_MNC_BM_0030 PT002_MNC_PB_0030 PT002_MNC_PB_0060      1 <data.table[1x3]>       98       15       18
#> 22: PT001_MNC_PB_0030 PT002_MNC_PB_0030 PT002_MNC_PB_0060      0 <data.table[0x3]>       28       15       18
#> 23: PT002_MNC_BM_0060 PT002_MNC_PB_0030 PT002_MNC_PB_0060      1 <data.table[1x3]>       33       15       18
#> 24: PT001_MNC_BM_0030 PT001_MNC_PB_0030 PT002_MNC_PB_0060      0 <data.table[0x3]>       54       28       18
#> 25: PT001_MNC_BM_0060 PT001_MNC_PB_0030 PT002_MNC_PB_0060      0 <data.table[0x3]>      114       28       18
#> 26: PT001_MNC_BM_0030 PT002_MNC_BM_0060 PT002_MNC_PB_0060      0 <data.table[0x3]>       54       33       18
#> 27: PT001_MNC_BM_0060 PT002_MNC_BM_0060 PT002_MNC_PB_0060      0 <data.table[0x3]>      114       33       18
#> 28: PT001_MNC_PB_0060 PT002_MNC_BM_0060 PT002_MNC_PB_0060      0 <data.table[0x3]>       59       33       18
#> 29: PT002_MNC_BM_0030 PT002_MNC_BM_0060 PT002_MNC_PB_0060      1 <data.table[1x3]>       98       33       18
#> 30: PT001_MNC_PB_0030 PT002_MNC_BM_0060 PT002_MNC_PB_0060      0 <data.table[0x3]>       28       33       18
#> 31: PT001_MNC_BM_0030 PT001_MNC_BM_0060 PT002_MNC_PB_0030      0 <data.table[0x3]>       54      114       15
#> 32: PT001_MNC_BM_0030 PT001_MNC_PB_0060 PT002_MNC_PB_0030      0 <data.table[0x3]>       54       59       15
#> 33: PT001_MNC_BM_0060 PT001_MNC_PB_0060 PT002_MNC_PB_0030      0 <data.table[0x3]>      114       59       15
#> 34: PT001_MNC_PB_0030 PT001_MNC_PB_0060 PT002_MNC_PB_0030      0 <data.table[0x3]>       28       59       15
#> 35: PT001_MNC_BM_0030 PT002_MNC_BM_0030 PT002_MNC_PB_0030      0 <data.table[0x3]>       54       98       15
#> 36: PT001_MNC_BM_0060 PT002_MNC_BM_0030 PT002_MNC_PB_0030      0 <data.table[0x3]>      114       98       15
#> 37: PT001_MNC_PB_0060 PT002_MNC_BM_0030 PT002_MNC_PB_0030      0 <data.table[0x3]>       59       98       15
#> 38: PT001_MNC_PB_0030 PT002_MNC_BM_0030 PT002_MNC_PB_0030      0 <data.table[0x3]>       28       98       15
#> 39: PT001_MNC_BM_0030 PT001_MNC_PB_0030 PT002_MNC_PB_0030      0 <data.table[0x3]>       54       28       15
#> 40: PT001_MNC_BM_0060 PT001_MNC_PB_0030 PT002_MNC_PB_0030      0 <data.table[0x3]>      114       28       15
#> 41: PT001_MNC_BM_0030 PT002_MNC_BM_0060 PT002_MNC_PB_0030      0 <data.table[0x3]>       54       33       15
#> 42: PT001_MNC_BM_0060 PT002_MNC_BM_0060 PT002_MNC_PB_0030      0 <data.table[0x3]>      114       33       15
#> 43: PT001_MNC_PB_0060 PT002_MNC_BM_0060 PT002_MNC_PB_0030      0 <data.table[0x3]>       59       33       15
#> 44: PT002_MNC_BM_0030 PT002_MNC_BM_0060 PT002_MNC_PB_0030      1 <data.table[1x3]>       98       33       15
#> 45: PT001_MNC_PB_0030 PT002_MNC_BM_0060 PT002_MNC_PB_0030      0 <data.table[0x3]>       28       33       15
#> 46: PT001_MNC_BM_0030 PT001_MNC_BM_0060 PT001_MNC_PB_0030      5 <data.table[5x3]>       54      114       28
#> 47: PT001_MNC_BM_0030 PT001_MNC_BM_0060 PT002_MNC_BM_0060      0 <data.table[0x3]>       54      114       33
#> 48: PT001_MNC_BM_0030 PT001_MNC_PB_0060 PT002_MNC_BM_0060      0 <data.table[0x3]>       54       59       33
#> 49: PT001_MNC_BM_0060 PT001_MNC_PB_0060 PT002_MNC_BM_0060      0 <data.table[0x3]>      114       59       33
#> 50: PT001_MNC_PB_0030 PT001_MNC_PB_0060 PT002_MNC_BM_0060      0 <data.table[0x3]>       28       59       33
#> 51: PT001_MNC_BM_0030 PT002_MNC_BM_0030 PT002_MNC_BM_0060      0 <data.table[0x3]>       54       98       33
#> 52: PT001_MNC_BM_0060 PT002_MNC_BM_0030 PT002_MNC_BM_0060      0 <data.table[0x3]>      114       98       33
#> 53: PT001_MNC_PB_0060 PT002_MNC_BM_0030 PT002_MNC_BM_0060      0 <data.table[0x3]>       59       98       33
#> 54: PT001_MNC_PB_0030 PT002_MNC_BM_0030 PT002_MNC_BM_0060      0 <data.table[0x3]>       28       98       33
#> 55: PT001_MNC_BM_0030 PT001_MNC_PB_0030 PT002_MNC_BM_0060      0 <data.table[0x3]>       54       28       33
#> 56: PT001_MNC_BM_0060 PT001_MNC_PB_0030 PT002_MNC_BM_0060      0 <data.table[0x3]>      114       28       33
#>                    g1                g2                g3 shared          is_coord count_g1 count_g2 count_g3
#>     count_union     on_g1    on_g2     on_g3  on_union
#>  1:         175 11.111111 5.263158 10.169492 3.4285714
#>  2:         117  1.851852 3.571429  1.694915 0.8547009
#>  3:         157  1.754386 7.142857  3.389831 1.2738854
#>  4:         244  0.000000 0.000000  0.000000 0.0000000
#>  5:         202  0.000000 0.000000  0.000000 0.0000000
#>  6:         241  0.877193 1.694915  1.020408 0.4149378
#>  7:         174  0.000000 0.000000  0.000000 0.0000000
#>  8:         173  0.000000 0.000000  0.000000 0.0000000
#>  9:         232  0.000000 0.000000  0.000000 0.0000000
#> 10:         165  0.000000 0.000000  0.000000 0.0000000
#> 11:         123  0.000000 0.000000  0.000000 0.0000000
#> 12:         162  0.000000 0.000000  0.000000 0.0000000
#> 13:          95  0.000000 0.000000  0.000000 0.0000000
#> 14:         162  0.000000 0.000000  0.000000 0.0000000
#> 15:         221  0.000000 0.000000  0.000000 0.0000000
#> 16:         166  0.000000 0.000000  0.000000 0.0000000
#> 17:         136  0.000000 0.000000  0.000000 0.0000000
#> 18:          80  0.000000 0.000000  0.000000 0.0000000
#> 19:         139  0.000000 0.000000  0.000000 0.0000000
#> 20:          85  0.000000 0.000000  0.000000 0.0000000
#> 21:         114  1.020408 6.666667  5.555556 0.8771930
#> 22:          54  0.000000 0.000000  0.000000 0.0000000
#> 23:          53  3.030303 6.666667  5.555556 1.8867925
#> 24:          93  0.000000 0.000000  0.000000 0.0000000
#> 25:         153  0.000000 0.000000  0.000000 0.0000000
#> 26:          99  0.000000 0.000000  0.000000 0.0000000
#> 27:         160  0.000000 0.000000  0.000000 0.0000000
#> 28:         105  0.000000 0.000000  0.000000 0.0000000
#> 29:         132  1.020408 3.030303  5.555556 0.7575758
#> 30:          74  0.000000 0.000000  0.000000 0.0000000
#> 31:         161  0.000000 0.000000  0.000000 0.0000000
#> 32:         120  0.000000 0.000000  0.000000 0.0000000
#> 33:         158  0.000000 0.000000  0.000000 0.0000000
#> 34:          92  0.000000 0.000000  0.000000 0.0000000
#> 35:         164  0.000000 0.000000  0.000000 0.0000000
#> 36:         222  0.000000 0.000000  0.000000 0.0000000
#> 37:         168  0.000000 0.000000  0.000000 0.0000000
#> 38:         138  0.000000 0.000000  0.000000 0.0000000
#> 39:          90  0.000000 0.000000  0.000000 0.0000000
#> 40:         149  0.000000 0.000000  0.000000 0.0000000
#> 41:          99  0.000000 0.000000  0.000000 0.0000000
#> 42:         159  0.000000 0.000000  0.000000 0.0000000
#> 43:         105  0.000000 0.000000  0.000000 0.0000000
#> 44:         137  1.020408 3.030303  6.666667 0.7299270
#> 45:          74  0.000000 0.000000  0.000000 0.0000000
#> 46:         166  9.259259 4.385965 17.857143 3.0120482
#> 47:         179  0.000000 0.000000  0.000000 0.0000000
#> 48:         137  0.000000 0.000000  0.000000 0.0000000
#> 49:         177  0.000000 0.000000  0.000000 0.0000000
#> 50:         110  0.000000 0.000000  0.000000 0.0000000
#> 51:         179  0.000000 0.000000  0.000000 0.0000000
#> 52:         239  0.000000 0.000000  0.000000 0.0000000
#> 53:         184  0.000000 0.000000  0.000000 0.0000000
#> 54:         154  0.000000 0.000000  0.000000 0.0000000
#> 55:         107  0.000000 0.000000  0.000000 0.0000000
#> 56:         168  0.000000 0.000000  0.000000 0.0000000
#>     count_union     on_g1    on_g2     on_g3  on_union

Changing n_comp to 3 means that we want to calculate the sharing between 3 different groups. Note that the shared column contains the counts of integrations that are shared by ALL groups, which is equivalent to a set intersection.

Beware that the more groups are compared at once, the longer the computation takes.
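
A short base R sketch of both points follows. The 8 groups are the distinct groups visible in the outputs above; the site labels in the sites list are a made-up encoding of the triple, used only to illustrate the intersection.

```r
# Number of distinct groups in `agg` under the 4-column key
n_groups <- 8

# Rows produced with minimal = TRUE and no self comparisons
n_pairs    <- choose(n_groups, 2)  # 28, as in sharing_1
n_triplets <- choose(n_groups, 3)  # 56, as in sharing_1_a

# Sharing among 3 groups is the intersection of all 3 sets
sites <- list(
  g1 = c("1:8464757:-", "1:11339120:+", "5:84009671:-"),
  g2 = c("1:8464757:-", "5:84009671:-"),
  g3 = c("5:84009671:-", "12:54635693:-")
)
shared_all <- Reduce(intersect, sites)
length(shared_all)
```

Here g1 and g2 share two sites, but only one of them is also present in g3, so the three-way sharing count is 1.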

4.1.2 A case when it is useful to set minimal = FALSE

Setting minimal = FALSE produces all possible permutations of the groups and the corresponding values. In combination with include_self_comp = TRUE, this is useful when we want to know the sharing between pairs of groups and plot results as a heatmap.

sharing_1_b <- is_sharing(agg,
                          group_key = c("SubjectID", "CellMarker", 
                                      "Tissue", "TimePoint"),
                          n_comp = 2,
                          is_count = TRUE,
                          relative_is_sharing = TRUE,
                          minimal = FALSE,
                          include_self_comp = TRUE)
#> Calculating combinations...
#> Calculating self groups (requested)...
#> Calculating permutations (requested)...
#> Done!
sharing_1_b
#>                    g1                g2 shared count_g1 count_g2 count_union      on_g1      on_g2    on_union
#>  1: PT001_MNC_BM_0030 PT001_MNC_BM_0030     54       54       54          54 100.000000 100.000000 100.0000000
#>  2: PT001_MNC_BM_0030 PT001_MNC_BM_0060     21       54      114         147  38.888889  18.421053  14.2857143
#>  3: PT001_MNC_BM_0060 PT001_MNC_BM_0030     21      114       54         147  18.421053  38.888889  14.2857143
#>  4: PT001_MNC_BM_0060 PT001_MNC_BM_0060    114      114      114         114 100.000000 100.000000 100.0000000
#>  5: PT001_MNC_BM_0030 PT001_MNC_PB_0060      8       54       59         105  14.814815  13.559322   7.6190476
#>  6: PT001_MNC_PB_0060 PT001_MNC_BM_0030      8       59       54         105  13.559322  14.814815   7.6190476
#>  7: PT001_MNC_BM_0060 PT001_MNC_PB_0060     29      114       59         144  25.438596  49.152542  20.1388889
#>  8: PT001_MNC_PB_0060 PT001_MNC_BM_0060     29       59      114         144  49.152542  25.438596  20.1388889
#>  9: PT001_MNC_PB_0060 PT001_MNC_PB_0060     59       59       59          59 100.000000 100.000000 100.0000000
#> 10: PT001_MNC_PB_0030 PT001_MNC_PB_0060     10       28       59          77  35.714286  16.949153  12.9870130
#> 11: PT001_MNC_PB_0060 PT001_MNC_PB_0030     10       59       28          77  16.949153  35.714286  12.9870130
#> 12: PT001_MNC_BM_0030 PT002_MNC_BM_0030      0       54       98         152   0.000000   0.000000   0.0000000
#> 13: PT002_MNC_BM_0030 PT001_MNC_BM_0030      0       98       54         152   0.000000   0.000000   0.0000000
#> 14: PT001_MNC_BM_0060 PT002_MNC_BM_0030      1      114       98         211   0.877193   1.020408   0.4739336
#> 15: PT002_MNC_BM_0030 PT001_MNC_BM_0060      1       98      114         211   1.020408   0.877193   0.4739336
#> 16: PT001_MNC_PB_0060 PT002_MNC_BM_0030      1       59       98         156   1.694915   1.020408   0.6410256
#> 17: PT002_MNC_BM_0030 PT001_MNC_PB_0060      1       98       59         156   1.020408   1.694915   0.6410256
#> 18: PT002_MNC_BM_0030 PT002_MNC_BM_0030     98       98       98          98 100.000000 100.000000 100.0000000
#> 19: PT001_MNC_PB_0030 PT002_MNC_BM_0030      0       28       98         126   0.000000   0.000000   0.0000000
#> 20: PT002_MNC_BM_0030 PT001_MNC_PB_0030      0       98       28         126   0.000000   0.000000   0.0000000
#> 21: PT001_MNC_BM_0030 PT002_MNC_PB_0060      0       54       18          72   0.000000   0.000000   0.0000000
#> 22: PT002_MNC_PB_0060 PT001_MNC_BM_0030      0       18       54          72   0.000000   0.000000   0.0000000
#> 23: PT001_MNC_BM_0060 PT002_MNC_PB_0060      0      114       18         132   0.000000   0.000000   0.0000000
#> 24: PT002_MNC_PB_0060 PT001_MNC_BM_0060      0       18      114         132   0.000000   0.000000   0.0000000
#> 25: PT001_MNC_PB_0060 PT002_MNC_PB_0060      0       59       18          77   0.000000   0.000000   0.0000000
#> 26: PT002_MNC_PB_0060 PT001_MNC_PB_0060      0       18       59          77   0.000000   0.000000   0.0000000
#> 27: PT002_MNC_BM_0030 PT002_MNC_PB_0060      8       98       18         108   8.163265  44.444444   7.4074074
#> 28: PT002_MNC_PB_0060 PT002_MNC_BM_0030      8       18       98         108  44.444444   8.163265   7.4074074
#> 29: PT002_MNC_PB_0060 PT002_MNC_PB_0060     18       18       18          18 100.000000 100.000000 100.0000000
#> 30: PT002_MNC_PB_0030 PT002_MNC_PB_0060      7       15       18          26  46.666667  38.888889  26.9230769
#> 31: PT002_MNC_PB_0060 PT002_MNC_PB_0030      7       18       15          26  38.888889  46.666667  26.9230769
#> 32: PT001_MNC_PB_0030 PT002_MNC_PB_0060      0       28       18          46   0.000000   0.000000   0.0000000
#> 33: PT002_MNC_PB_0060 PT001_MNC_PB_0030      0       18       28          46   0.000000   0.000000   0.0000000
#> 34: PT002_MNC_BM_0060 PT002_MNC_PB_0060      5       33       18          46  15.151515  27.777778  10.8695652
#> 35: PT002_MNC_PB_0060 PT002_MNC_BM_0060      5       18       33          46  27.777778  15.151515  10.8695652
#> 36: PT001_MNC_BM_0030 PT002_MNC_PB_0030      0       54       15          69   0.000000   0.000000   0.0000000
#> 37: PT002_MNC_PB_0030 PT001_MNC_BM_0030      0       15       54          69   0.000000   0.000000   0.0000000
#> 38: PT001_MNC_BM_0060 PT002_MNC_PB_0030      1      114       15         128   0.877193   6.666667   0.7812500
#> 39: PT002_MNC_PB_0030 PT001_MNC_BM_0060      1       15      114         128   6.666667   0.877193   0.7812500
#> 40: PT001_MNC_PB_0060 PT002_MNC_PB_0030      0       59       15          74   0.000000   0.000000   0.0000000
#> 41: PT002_MNC_PB_0030 PT001_MNC_PB_0060      0       15       59          74   0.000000   0.000000   0.0000000
#> 42: PT002_MNC_BM_0030 PT002_MNC_PB_0030      3       98       15         110   3.061224  20.000000   2.7272727
#> 43: PT002_MNC_PB_0030 PT002_MNC_BM_0030      3       15       98         110  20.000000   3.061224   2.7272727
#> 44: PT002_MNC_PB_0030 PT002_MNC_PB_0030     15       15       15          15 100.000000 100.000000 100.0000000
#> 45: PT001_MNC_PB_0030 PT002_MNC_PB_0030      0       28       15          43   0.000000   0.000000   0.0000000
#> 46: PT002_MNC_PB_0030 PT001_MNC_PB_0030      0       15       28          43   0.000000   0.000000   0.0000000
#> 47: PT002_MNC_BM_0060 PT002_MNC_PB_0030      2       33       15          46   6.060606  13.333333   4.3478261
#> 48: PT002_MNC_PB_0030 PT002_MNC_BM_0060      2       15       33          46  13.333333   6.060606   4.3478261
#> 49: PT001_MNC_BM_0030 PT001_MNC_PB_0030      7       54       28          75  12.962963  25.000000   9.3333333
#> 50: PT001_MNC_PB_0030 PT001_MNC_BM_0030      7       28       54          75  25.000000  12.962963   9.3333333
#> 51: PT001_MNC_BM_0060 PT001_MNC_PB_0030      7      114       28         135   6.140351  25.000000   5.1851852
#> 52: PT001_MNC_PB_0030 PT001_MNC_BM_0060      7       28      114         135  25.000000   6.140351   5.1851852
#> 53: PT001_MNC_PB_0030 PT001_MNC_PB_0030     28       28       28          28 100.000000 100.000000 100.0000000
#> 54: PT001_MNC_BM_0030 PT002_MNC_BM_0060      1       54       33          86   1.851852   3.030303   1.1627907
#> 55: PT002_MNC_BM_0060 PT001_MNC_BM_0030      1       33       54          86   3.030303   1.851852   1.1627907
#> 56: PT001_MNC_BM_0060 PT002_MNC_BM_0060      0      114       33         147   0.000000   0.000000   0.0000000
#> 57: PT002_MNC_BM_0060 PT001_MNC_BM_0060      0       33      114         147   0.000000   0.000000   0.0000000
#> 58: PT001_MNC_PB_0060 PT002_MNC_BM_0060      0       59       33          92   0.000000   0.000000   0.0000000
#> 59: PT002_MNC_BM_0060 PT001_MNC_PB_0060      0       33       59          92   0.000000   0.000000   0.0000000
#> 60: PT002_MNC_BM_0030 PT002_MNC_BM_0060      5       98       33         126   5.102041  15.151515   3.9682540
#> 61: PT002_MNC_BM_0060 PT002_MNC_BM_0030      5       33       98         126  15.151515   5.102041   3.9682540
#> 62: PT001_MNC_PB_0030 PT002_MNC_BM_0060      0       28       33          61   0.000000   0.000000   0.0000000
#> 63: PT002_MNC_BM_0060 PT001_MNC_PB_0030      0       33       28          61   0.000000   0.000000   0.0000000
#> 64: PT002_MNC_BM_0060 PT002_MNC_BM_0060     33       33       33          33 100.000000 100.000000 100.0000000
#>                    g1                g2 shared count_g1 count_g2 count_union      on_g1      on_g2    on_union
heatmaps <- sharing_heatmap(sharing_1_b)

The function sharing_heatmap() automatically plots sharing between 2 groups. There are several arguments to this function that allow us to obtain heatmaps for the absolute sharing values or the relative (percentage) values.

heatmaps$absolute

heatmaps$on_g1

heatmaps$on_union

Beware that calculating all permutations takes longer than calculating only distinct combinations. If you don’t need your results plotted, or you are comparing more than 2 groups at a time, you should stick with minimal = TRUE and include_self_comp = FALSE.
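
The difference in size is easy to quantify, and a toy example shows why a heatmap needs every ordered pair. This is a base R sketch under stated assumptions: the data frame long and the group labels A and B are invented, and the reshaping shown is only one way to fill a square matrix, not necessarily what sharing_heatmap() does internally.

```r
n_groups <- 8

# minimal = TRUE, include_self_comp = FALSE: unordered pairs only
rows_minimal <- choose(n_groups, 2)

# minimal = FALSE, include_self_comp = TRUE: every ordered pair
rows_full <- n_groups^2
c(rows_minimal = rows_minimal, rows_full = rows_full)

# A square matrix needs one value per (row, column) cell,
# including the diagonal and both orderings of each pair
long <- data.frame(
  g1 = c("A", "A", "B", "B"),
  g2 = c("A", "B", "A", "B"),
  shared = c(54, 21, 21, 114)
)
mat <- tapply(long$shared, list(long$g1, long$g2), sum)
mat
```

With 8 groups the full table has 64 rows (2 * 28 permuted pairs plus 8 self comparisons), which matches the number of rows of sharing_1_b.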

4.2 SCENARIO 2: single input data frame and multiple grouping keys

In the first scenario, groups were homogeneous, that is, they were all defined by the same key. In this second scenario we still start from data contained in a single data frame, but we want to compare sets of integrations that are grouped differently. To do this, we pass a list of keys through the argument group_keys.

sharing_2 <- is_sharing(agg,
                        group_keys = list(
                          g1 = c("SubjectID", "CellMarker", 
                                 "Tissue", "TimePoint"),
                          g2 = c("SubjectID", "CellMarker"),
                          g3 = c("CellMarker", "Tissue")
                          ))
#> Done!
sharing_2
#>                    g1        g2     g3 shared count_g1 count_g2 count_g3 count_union      on_g1      on_g2      on_g3
#>  1: PT001_MNC_BM_0030 PT001_MNC MNC_BM     54       54      186      271         310 100.000000 29.0322581 19.9261993
#>  2: PT001_MNC_BM_0060 PT001_MNC MNC_BM    114      114      186      271         310 100.000000 61.2903226 42.0664207
#>  3: PT001_MNC_PB_0060 PT001_MNC MNC_BM     31       59      186      271         310  52.542373 16.6666667 11.4391144
#>  4: PT002_MNC_BM_0030 PT001_MNC MNC_BM      1       98      186      271         310   1.020408  0.5376344  0.3690037
#>  5: PT002_MNC_PB_0060 PT001_MNC MNC_BM      0       18      186      271         316   0.000000  0.0000000  0.0000000
#>  6: PT002_MNC_PB_0030 PT001_MNC MNC_BM      1       15      186      271         320   6.666667  0.5376344  0.3690037
#>  7: PT001_MNC_PB_0030 PT001_MNC MNC_BM      9       28      186      271         310  32.142857  4.8387097  3.3210332
#>  8: PT002_MNC_BM_0060 PT001_MNC MNC_BM      1       33      186      271         310   3.030303  0.5376344  0.3690037
#>  9: PT001_MNC_BM_0030 PT002_MNC MNC_BM      1       54      137      271         281   1.851852  0.7299270  0.3690037
#> 10: PT001_MNC_BM_0060 PT002_MNC MNC_BM      2      114      137      271         281   1.754386  1.4598540  0.7380074
#> 11: PT001_MNC_PB_0060 PT002_MNC MNC_BM      1       59      137      271         309   1.694915  0.7299270  0.3690037
#> 12: PT002_MNC_BM_0030 PT002_MNC MNC_BM     98       98      137      271         281 100.000000 71.5328467 36.1623616
#> 13: PT002_MNC_PB_0060 PT002_MNC MNC_BM     12       18      137      271         281  66.666667  8.7591241  4.4280443
#> 14: PT002_MNC_PB_0030 PT002_MNC MNC_BM      5       15      137      271         281  33.333333  3.6496350  1.8450185
#> 15: PT001_MNC_PB_0030 PT002_MNC MNC_BM      0       28      137      271         300   0.000000  0.0000000  0.0000000
#> 16: PT002_MNC_BM_0060 PT002_MNC MNC_BM     33       33      137      271         281 100.000000 24.0875912 12.1771218
#> 17: PT001_MNC_BM_0030 PT001_MNC MNC_PB     14       54      186      103         211  25.925926  7.5268817 13.5922330
#> 18: PT001_MNC_BM_0060 PT001_MNC MNC_PB     35      114      186      103         211  30.701754 18.8172043 33.9805825
#> 19: PT001_MNC_PB_0060 PT001_MNC MNC_PB     59       59      186      103         211 100.000000 31.7204301 57.2815534
#> 20: PT002_MNC_BM_0030 PT001_MNC MNC_PB      1       98      186      103         298   1.020408  0.5376344  0.9708738
#> 21: PT002_MNC_PB_0060 PT001_MNC MNC_PB      0       18      186      103         211   0.000000  0.0000000  0.0000000
#> 22: PT002_MNC_PB_0030 PT001_MNC MNC_PB      1       15      186      103         211   6.666667  0.5376344  0.9708738
#> 23: PT001_MNC_PB_0030 PT001_MNC MNC_PB     28       28      186      103         211 100.000000 15.0537634 27.1844660
#> 24: PT002_MNC_BM_0060 PT001_MNC MNC_PB      0       33      186      103         237   0.000000  0.0000000  0.0000000
#> 25: PT001_MNC_BM_0030 PT002_MNC MNC_PB      0       54      137      103         252   0.000000  0.0000000  0.0000000
#> 26: PT001_MNC_BM_0060 PT002_MNC MNC_PB      2      114      137      103         292   1.754386  1.4598540  1.9417476
#> 27: PT001_MNC_PB_0060 PT002_MNC MNC_PB      1       59      137      103         213   1.694915  0.7299270  0.9708738
#> 28: PT002_MNC_BM_0030 PT002_MNC MNC_PB     11       98      137      103         213  11.224490  8.0291971 10.6796117
#> 29: PT002_MNC_PB_0060 PT002_MNC MNC_PB     18       18      137      103         213 100.000000 13.1386861 17.4757282
#> 30: PT002_MNC_PB_0030 PT002_MNC MNC_PB     15       15      137      103         213 100.000000 10.9489051 14.5631068
#> 31: PT001_MNC_PB_0030 PT002_MNC MNC_PB      0       28      137      103         213   0.000000  0.0000000  0.0000000
#> 32: PT002_MNC_BM_0060 PT002_MNC MNC_PB      6       33      137      103         213  18.181818  4.3795620  5.8252427
#>                    g1        g2     g3 shared count_g1 count_g2 count_g3 count_union      on_g1      on_g2      on_g3
#>       on_union
#>  1: 17.4193548
#>  2: 36.7741935
#>  3: 10.0000000
#>  4:  0.3225806
#>  5:  0.0000000
#>  6:  0.3125000
#>  7:  2.9032258
#>  8:  0.3225806
#>  9:  0.3558719
#> 10:  0.7117438
#> 11:  0.3236246
#> 12: 34.8754448
#> 13:  4.2704626
#> 14:  1.7793594
#> 15:  0.0000000
#> 16: 11.7437722
#> 17:  6.6350711
#> 18: 16.5876777
#> 19: 27.9620853
#> 20:  0.3355705
#> 21:  0.0000000
#> 22:  0.4739336
#> 23: 13.2701422
#> 24:  0.0000000
#> 25:  0.0000000
#> 26:  0.6849315
#> 27:  0.4694836
#> 28:  5.1643192
#> 29:  8.4507042
#> 30:  7.0422535
#> 31:  0.0000000
#> 32:  2.8169014
#>       on_union

There are a few things to keep in mind in this case:

  • The arguments group_key (notice the absence of plural), n_comp and include_self_comp are ignored: the number of comparisons is automatically detected by counting the provided keys and a self comparison doesn’t make sense since group labels are different
  • If you provide a list of identical keys or just one key you fall back to scenario 1
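The number of rows in the output above can be reasoned about as a cross product of the group labels produced by each key. The following base R sketch is only an illustration (it does not call ISAnalytics) and assumes that one comparison is performed for every combination of one label per key; the labels are copied from the printed table.

```r
# Group labels produced by each of the three keys, copied from the output
g1 <- c(
  "PT001_MNC_BM_0030", "PT001_MNC_BM_0060",
  "PT001_MNC_PB_0030", "PT001_MNC_PB_0060",
  "PT002_MNC_BM_0030", "PT002_MNC_BM_0060",
  "PT002_MNC_PB_0030", "PT002_MNC_PB_0060"
)
g2 <- c("PT001_MNC", "PT002_MNC")
g3 <- c("MNC_BM", "MNC_PB")

# One row per combination of one label from each key
comparisons <- expand.grid(
  g1 = g1, g2 = g2, g3 = g3,
  stringsAsFactors = FALSE
)

nrow(comparisons)
head(comparisons, 3)
```

The result has 8 * 2 * 2 = 32 rows, which matches the 32 rows printed for sharing_2.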

4.3 SCENARIO 3: multiple input data frames and single grouping key

Providing multiple input data frames and the same grouping key is an effective way to reduce the number of comparisons performed. For example, suppose we’re interested in comparing groups labelled by a unique combination of SubjectID, CellMarker, Tissue and TimePoint, but this time we want the first group to contain only integrations relative to PT001_MNC_BM_0030 and the second group to contain only integrations relative to PT001_MNC_BM_0060.

We’re going to filter the original data frame to obtain only the relevant data in 2 separate tables, and then call the function.

first_sample <- agg %>%
  dplyr::filter(SubjectID == "PT001", CellMarker == "MNC", Tissue == "BM", 
         TimePoint == "0030")
second_sample <- agg %>%
  dplyr::filter(SubjectID == "PT001", CellMarker == "MNC", Tissue == "BM", 
         TimePoint == "0060")
sharing_3 <- is_sharing(first_sample, second_sample,
                        group_key = c("SubjectID", "CellMarker", 
                                      "Tissue", "TimePoint"),
                        is_count = TRUE,
                        relative_is_sharing = TRUE,
                        minimal = TRUE)
#> Done!
sharing_3
#>                   g1                g2 shared count_g1 count_g2 count_union    on_g1    on_g2 on_union
#> 1: PT001_MNC_BM_0030 PT001_MNC_BM_0060     21       54      114         147 38.88889 18.42105 14.28571

Once again the arguments n_comp and include_self_comp are ignored: the number of comparisons is equal to the number of data frames in input.
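The columns of the result can be checked by hand with basic set operations. The helper below is a hypothetical base R function written for this illustration (it is not part of ISAnalytics), and the toy identifiers are invented; the final lines redo the arithmetic on the counts reported for sharing_3.

```r
# Hypothetical helper: sharing statistics for two sets of integration site
# identifiers, mirroring the column names of the sharing table
sharing_stats <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  shared <- length(intersect(a, b))
  count_union <- length(union(a, b))
  data.frame(
    shared = shared,
    count_g1 = length(a),
    count_g2 = length(b),
    count_union = count_union,
    on_g1 = 100 * shared / length(a),
    on_g2 = 100 * shared / length(b),
    on_union = 100 * shared / count_union
  )
}

# Toy example with invented identifiers
toy <- sharing_stats(
  c("chr1_100_+", "chr2_200_-", "chr3_300_+"),
  c("chr2_200_-", "chr3_300_+", "chr4_400_-", "chr5_500_+")
)
toy

# Same arithmetic on the counts reported in sharing_3
shared <- 21
count_g1 <- 54
count_g2 <- 114
count_union <- count_g1 + count_g2 - shared
round(100 * shared / c(count_g1, count_g2, count_union), 5)
```

The union of 54 and 114 sites with 21 in common contains 147 sites, and the three percentages agree with on_g1, on_g2 and on_union in the printed row.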

4.4 SCENARIO 4: multiple input data frames and multiple grouping keys

Finally, the most general scenario is when we have multiple data frames with multiple keys. In this case the number of data frames must be equal to the number of provided keys, and grouping keys are applied in order (data frame 1 is grouped with key 1, data frame 2 is grouped with key 2, and so on).

df1 <- agg %>%
  dplyr::filter(TimePoint == "0030")
df2 <- agg %>%
  dplyr::filter(TimePoint == "0060")
df3 <- agg %>%
  dplyr::filter(Tissue == "BM")

keys <- list(g1 = c("SubjectID", "CellMarker", "Tissue"),
             g2 = c("SubjectID", "Tissue"),
             g3 = c("SubjectID", "CellMarker", "Tissue"))

sharing_4 <- is_sharing(df1, df2, df3, group_keys = keys)
#> Done!
sharing_4
#>               g1       g2           g3 shared count_g1 count_g2 count_g3 count_union     on_g1     on_g2      on_g3
#>  1: PT001_MNC_BM PT001_BM PT001_MNC_BM     21       54      114      147         147 38.888889 18.421053 14.2857143
#>  2: PT002_MNC_BM PT001_BM PT001_MNC_BM      0       98      114      147         271  0.000000  0.000000  0.0000000
#>  3: PT002_MNC_PB PT001_BM PT001_MNC_BM      0       15      114      147         161  0.000000  0.000000  0.0000000
#>  4: PT001_MNC_PB PT001_BM PT001_MNC_BM      5       28      114      147         166 17.857143  4.385965  3.4013605
#>  5: PT001_MNC_BM PT001_PB PT001_MNC_BM      8       54       59      147         175 14.814815 13.559322  5.4421769
#>  6: PT002_MNC_BM PT001_PB PT001_MNC_BM      0       98       59      147         299  0.000000  0.000000  0.0000000
#>  7: PT002_MNC_PB PT001_PB PT001_MNC_BM      0       15       59      147         189  0.000000  0.000000  0.0000000
#>  8: PT001_MNC_PB PT001_PB PT001_MNC_BM      1       28       59      147         186  3.571429  1.694915  0.6802721
#>  9: PT001_MNC_BM PT002_PB PT001_MNC_BM      0       54       18      147         165  0.000000  0.000000  0.0000000
#> 10: PT002_MNC_BM PT002_PB PT001_MNC_BM      0       98       18      147         277  0.000000  0.000000  0.0000000
#> 11: PT002_MNC_PB PT002_PB PT001_MNC_BM      0       15       18      147         172  0.000000  0.000000  0.0000000
#> 12: PT001_MNC_PB PT002_PB PT001_MNC_BM      0       28       18      147         184  0.000000  0.000000  0.0000000
#> 13: PT001_MNC_BM PT002_BM PT001_MNC_BM      1       54       33      147         179  1.851852  3.030303  0.6802721
#> 14: PT002_MNC_BM PT002_BM PT001_MNC_BM      0       98       33      147         271  0.000000  0.000000  0.0000000
#> 15: PT002_MNC_PB PT002_BM PT001_MNC_BM      0       15       33      147         191  0.000000  0.000000  0.0000000
#> 16: PT001_MNC_PB PT002_BM PT001_MNC_BM      0       28       33      147         198  0.000000  0.000000  0.0000000
#> 17: PT001_MNC_BM PT001_BM PT002_MNC_BM      0       54      114      126         271  0.000000  0.000000  0.0000000
#> 18: PT002_MNC_BM PT001_BM PT002_MNC_BM      1       98      114      126         239  1.020408  0.877193  0.7936508
#> 19: PT002_MNC_PB PT001_BM PT002_MNC_BM      0       15      114      126         249  0.000000  0.000000  0.0000000
#> 20: PT001_MNC_PB PT001_BM PT002_MNC_BM      0       28      114      126         260  0.000000  0.000000  0.0000000
#> 21: PT001_MNC_BM PT001_PB PT002_MNC_BM      0       54       59      126         299  0.000000  0.000000  0.0000000
#> 22: PT002_MNC_BM PT001_PB PT002_MNC_BM      1       98       59      126         184  1.020408  1.694915  0.7936508
#> 23: PT002_MNC_PB PT001_PB PT002_MNC_BM      0       15       59      126         195  0.000000  0.000000  0.0000000
#> 24: PT001_MNC_PB PT001_PB PT002_MNC_BM      0       28       59      126         202  0.000000  0.000000  0.0000000
#> 25: PT001_MNC_BM PT002_PB PT002_MNC_BM      0       54       18      126         277  0.000000  0.000000  0.0000000
#> 26: PT002_MNC_BM PT002_PB PT002_MNC_BM      8       98       18      126         132  8.163265 44.444444  6.3492063
#> 27: PT002_MNC_PB PT002_PB PT002_MNC_BM      1       15       18      126         137  6.666667  5.555556  0.7936508
#> 28: PT001_MNC_PB PT002_PB PT002_MNC_BM      0       28       18      126         160  0.000000  0.000000  0.0000000
#> 29: PT001_MNC_BM PT002_BM PT002_MNC_BM      0       54       33      126         271  0.000000  0.000000  0.0000000
#> 30: PT002_MNC_BM PT002_BM PT002_MNC_BM      5       98       33      126         126  5.102041 15.151515  3.9682540
#> 31: PT002_MNC_PB PT002_BM PT002_MNC_BM      1       15       33      126         137  6.666667  3.030303  0.7936508
#> 32: PT001_MNC_PB PT002_BM PT002_MNC_BM      0       28       33      126         154  0.000000  0.000000  0.0000000
#>               g1       g2           g3 shared count_g1 count_g2 count_g3 count_union     on_g1     on_g2      on_g3
#>       on_union
#>  1: 14.2857143
#>  2:  0.0000000
#>  3:  0.0000000
#>  4:  3.0120482
#>  5:  4.5714286
#>  6:  0.0000000
#>  7:  0.0000000
#>  8:  0.5376344
#>  9:  0.0000000
#> 10:  0.0000000
#> 11:  0.0000000
#> 12:  0.0000000
#> 13:  0.5586592
#> 14:  0.0000000
#> 15:  0.0000000
#> 16:  0.0000000
#> 17:  0.0000000
#> 18:  0.4184100
#> 19:  0.0000000
#> 20:  0.0000000
#> 21:  0.0000000
#> 22:  0.5434783
#> 23:  0.0000000
#> 24:  0.0000000
#> 25:  0.0000000
#> 26:  6.0606061
#> 27:  0.7299270
#> 28:  0.0000000
#> 29:  0.0000000
#> 30:  3.9682540
#> 31:  0.7299270
#> 32:  0.0000000
#>       on_union

Notice that in this example the keys for g1 and g3 are the same, so the group labels are the same as well. Remember, however, that the groups contain different sets of integration sites, since they come from different data frames.
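A small toy example makes this point concrete. The sketch below uses base R only, with an invented data frame and a hypothetical helper group_sets() that labels rows by pasting the key columns together; it is meant to mimic the idea, not the internals of is_sharing().

```r
# Invented toy data: is_id is a made-up integration site identifier
toy <- data.frame(
  is_id      = c("a", "b", "b", "c", "d"),
  SubjectID  = "PT001",
  CellMarker = "MNC",
  Tissue     = c("BM", "BM", "BM", "BM", "PB"),
  TimePoint  = c("0030", "0030", "0060", "0060", "0060"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: distinct identifiers per group label
group_sets <- function(df, key) {
  labels <- do.call(paste, c(df[key], sep = "_"))
  lapply(split(df$is_id, labels), unique)
}

key <- c("SubjectID", "CellMarker", "Tissue")
df1 <- toy[toy$TimePoint == "0030", ]  # same filter as df1 above
df3 <- toy[toy$Tissue == "BM", ]       # same filter as df3 above

sets_g1 <- group_sets(df1, key)
sets_g3 <- group_sets(df3, key)

names(sets_g1)  # label from the first data frame
names(sets_g3)  # identical label from the third data frame
sets_g1[["PT001_MNC_BM"]]
sets_g3[["PT001_MNC_BM"]]
```

Both data frames produce the label PT001_MNC_BM, yet the first group holds only the sites seen at time point 0030 while the other holds every bone marrow site.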

5 Plotting sharing results

When we have more than 2 comparisons it is convenient to plot them as Venn or Euler diagrams. ISAnalytics offers a quick way to do that through the functions is_sharing() and sharing_venn().

sharing_5 <- is_sharing(agg,
                        group_keys = list(
                          g1 = c("SubjectID", "CellMarker", 
                                 "Tissue", "TimePoint"),
                          g2 = c("SubjectID", "CellMarker"),
                          g3 = c("CellMarker", "Tissue")
                          ), table_for_venn = TRUE)
#> Done!
sharing_5
#>                    g1        g2     g3 shared count_g1 count_g2 count_g3 count_union      on_g1      on_g2      on_g3
#>  1: PT001_MNC_BM_0030 PT001_MNC MNC_BM     54       54      186      271         310 100.000000 29.0322581 19.9261993
#>  2: PT001_MNC_BM_0060 PT001_MNC MNC_BM    114      114      186      271         310 100.000000 61.2903226 42.0664207
#>  3: PT001_MNC_PB_0060 PT001_MNC MNC_BM     31       59      186      271         310  52.542373 16.6666667 11.4391144
#>  4: PT002_MNC_BM_0030 PT001_MNC MNC_BM      1       98      186      271         310   1.020408  0.5376344  0.3690037
#>  5: PT002_MNC_PB_0060 PT001_MNC MNC_BM      0       18      186      271         316   0.000000  0.0000000  0.0000000
#>  6: PT002_MNC_PB_0030 PT001_MNC MNC_BM      1       15      186      271         320   6.666667  0.5376344  0.3690037
#>  7: PT001_MNC_PB_0030 PT001_MNC MNC_BM      9       28      186      271         310  32.142857  4.8387097  3.3210332
#>  8: PT002_MNC_BM_0060 PT001_MNC MNC_BM      1       33      186      271         310   3.030303  0.5376344  0.3690037
#>  9: PT001_MNC_BM_0030 PT002_MNC MNC_BM      1       54      137      271         281   1.851852  0.7299270  0.3690037
#> 10: PT001_MNC_BM_0060 PT002_MNC MNC_BM      2      114      137      271         281   1.754386  1.4598540  0.7380074
#> 11: PT001_MNC_PB_0060 PT002_MNC MNC_BM      1       59      137      271         309   1.694915  0.7299270  0.3690037
#> 12: PT002_MNC_BM_0030 PT002_MNC MNC_BM     98       98      137      271         281 100.000000 71.5328467 36.1623616
#> 13: PT002_MNC_PB_0060 PT002_MNC MNC_BM     12       18      137      271         281  66.666667  8.7591241  4.4280443
#> 14: PT002_MNC_PB_0030 PT002_MNC MNC_BM      5       15      137      271         281  33.333333  3.6496350  1.8450185
#> 15: PT001_MNC_PB_0030 PT002_MNC MNC_BM      0       28      137      271         300   0.000000  0.0000000  0.0000000
#> 16: PT002_MNC_BM_0060 PT002_MNC MNC_BM     33       33      137      271         281 100.000000 24.0875912 12.1771218
#> 17: PT001_MNC_BM_0030 PT001_MNC MNC_PB     14       54      186      103         211  25.925926  7.5268817 13.5922330
#> 18: PT001_MNC_BM_0060 PT001_MNC MNC_PB     35      114      186      103         211  30.701754 18.8172043 33.9805825
#> 19: PT001_MNC_PB_0060 PT001_MNC MNC_PB     59       59      186      103         211 100.000000 31.7204301 57.2815534
#> 20: PT002_MNC_BM_0030 PT001_MNC MNC_PB      1       98      186      103         298   1.020408  0.5376344  0.9708738
#> 21: PT002_MNC_PB_0060 PT001_MNC MNC_PB      0       18      186      103         211   0.000000  0.0000000  0.0000000
#> 22: PT002_MNC_PB_0030 PT001_MNC MNC_PB      1       15      186      103         211   6.666667  0.5376344  0.9708738
#> 23: PT001_MNC_PB_0030 PT001_MNC MNC_PB     28       28      186      103         211 100.000000 15.0537634 27.1844660
#> 24: PT002_MNC_BM_0060 PT001_MNC MNC_PB      0       33      186      103         237   0.000000  0.0000000  0.0000000
#> 25: PT001_MNC_BM_0030 PT002_MNC MNC_PB      0       54      137      103         252   0.000000  0.0000000  0.0000000
#> 26: PT001_MNC_BM_0060 PT002_MNC MNC_PB      2      114      137      103         292   1.754386  1.4598540  1.9417476
#> 27: PT001_MNC_PB_0060 PT002_MNC MNC_PB      1       59      137      103         213   1.694915  0.7299270  0.9708738
#> 28: PT002_MNC_BM_0030 PT002_MNC MNC_PB     11       98      137      103         213  11.224490  8.0291971 10.6796117
#> 29: PT002_MNC_PB_0060 PT002_MNC MNC_PB     18       18      137      103         213 100.000000 13.1386861 17.4757282
#> 30: PT002_MNC_PB_0030 PT002_MNC MNC_PB     15       15      137      103         213 100.000000 10.9489051 14.5631068
#> 31: PT001_MNC_PB_0030 PT002_MNC MNC_PB      0       28      137      103         213   0.000000  0.0000000  0.0000000
#> 32: PT002_MNC_BM_0060 PT002_MNC MNC_PB      6       33      137      103         213  18.181818  4.3795620  5.8252427
#>                    g1        g2     g3 shared count_g1 count_g2 count_g3 count_union      on_g1      on_g2      on_g3
#>       on_union      truth_tbl_venn
#>  1: 17.4193548 <data.table[310x4]>
#>  2: 36.7741935 <data.table[310x4]>
#>  3: 10.0000000 <data.table[310x4]>
#>  4:  0.3225806 <data.table[310x4]>
#>  5:  0.0000000 <data.table[316x4]>
#>  6:  0.3125000 <data.table[320x4]>
#>  7:  2.9032258 <data.table[310x4]>
#>  8:  0.3225806 <data.table[310x4]>
#>  9:  0.3558719 <data.table[281x4]>
#> 10:  0.7117438 <data.table[281x4]>
#> 11:  0.3236246 <data.table[309x4]>
#> 12: 34.8754448 <data.table[281x4]>
#> 13:  4.2704626 <data.table[281x4]>
#> 14:  1.7793594 <data.table[281x4]>
#> 15:  0.0000000 <data.table[300x4]>
#> 16: 11.7437722 <data.table[281x4]>
#> 17:  6.6350711 <data.table[211x4]>
#> 18: 16.5876777 <data.table[211x4]>
#> 19: 27.9620853 <data.table[211x4]>
#> 20:  0.3355705 <data.table[298x4]>
#> 21:  0.0000000 <data.table[211x4]>
#> 22:  0.4739336 <data.table[211x4]>
#> 23: 13.2701422 <data.table[211x4]>
#> 24:  0.0000000 <data.table[237x4]>
#> 25:  0.0000000 <data.table[252x4]>
#> 26:  0.6849315 <data.table[292x4]>
#> 27:  0.4694836 <data.table[213x4]>
#> 28:  5.1643192 <data.table[213x4]>
#> 29:  8.4507042 <data.table[213x4]>
#> 30:  7.0422535 <data.table[213x4]>
#> 31:  0.0000000 <data.table[213x4]>
#> 32:  2.8169014 <data.table[213x4]>
#>       on_union      truth_tbl_venn

The argument table_for_venn = TRUE will add a new column truth_tbl_venn that contains corresponding truth tables for each row.
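The exact layout of these truth tables is not shown in the output, only their dimensions: the number of rows equals count_union and there are 4 columns for 3 groups. One layout consistent with this, assumed here purely for illustration, is an identifier column plus one logical membership column per group. The base R sketch below builds such a table from invented sets.

```r
# Invented sets of integration site identifiers for three groups
g1 <- c("is1", "is2")
g2 <- c("is1", "is2", "is3")
g3 <- c("is2", "is3", "is4")

# Every identifier observed in at least one group
all_ids <- Reduce(union, list(g1, g2, g3))

# Assumed layout: one row per identifier, one logical column per group
truth_tbl <- data.frame(
  int_id = all_ids,
  g1 = all_ids %in% g1,
  g2 = all_ids %in% g2,
  g3 = all_ids %in% g3
)
truth_tbl

# Sites shared by all three groups
shared <- sum(truth_tbl$g1 & truth_tbl$g2 & truth_tbl$g3)
shared
```

In this toy case the table has 4 rows (the size of the union) and 4 columns, and only is2 belongs to all three groups.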

sharing_plots1 <- sharing_venn(sharing_5, row_range = 1, euler = TRUE)
sharing_plots2 <- sharing_venn(sharing_5, row_range = 1, euler = FALSE)

Say that we’re interested in plotting just the first row of our sharing data frame. We can call the function sharing_venn() and specify the index 1 in the row_range argument. Note that this function requires the package eulerr to work. The argument euler indicates whether the function should produce Euler diagrams (TRUE) or Venn diagrams (FALSE).

Once we have obtained the lists of euler/venn objects, we can plot them by simply calling the function plot():

plot(sharing_plots1[[1]])

plot(sharing_plots2[[1]])

Several plotting options can be set; for details, please refer to the eulerr documentation.
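As a rough idea of what such options look like, the sketch below fits a small Euler diagram directly with eulerr and customises the plot. The set sizes are invented, and the object is built by hand rather than taken from sharing_venn(), so treat this as an illustration of eulerr plotting arguments only; the code is skipped if eulerr is not installed.

```r
# Invented set sizes: sites only in g1, only in g2, and in both
sets <- c("g1" = 10, "g2" = 6, "g1&g2" = 4)

if (requireNamespace("eulerr", quietly = TRUE)) {
  fit <- eulerr::euler(sets)
  # quantities = TRUE prints the counts inside each region;
  # fills controls the colours and transparency of the shapes
  print(plot(
    fit,
    quantities = TRUE,
    fills = list(fill = c("steelblue", "orange"), alpha = 0.5)
  ))
}
```

The same arguments should be applicable when calling plot() on the objects returned by sharing_venn(), assuming those are standard eulerr objects.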

6 Reproducibility

R session information.

#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       Ubuntu 20.04.3 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  C
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2022-01-16
#>  pandoc   2.5 @ /usr/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  package      * version date (UTC) lib source
#>  assertthat     0.2.1   2019-03-21 [2] CRAN (R 4.1.2)
#>  BiocManager    1.30.16 2021-06-15 [2] CRAN (R 4.1.2)
#>  BiocParallel   1.28.3  2022-01-16 [2] Bioconductor
#>  BiocStyle    * 2.22.0  2022-01-16 [2] Bioconductor
#>  bit            4.0.4   2020-08-04 [2] CRAN (R 4.1.2)
#>  bit64          4.0.5   2020-08-30 [2] CRAN (R 4.1.2)
#>  bookdown       0.24    2021-09-02 [2] CRAN (R 4.1.2)
#>  bslib          0.3.1   2021-10-06 [2] CRAN (R 4.1.2)
#>  cli            3.1.0   2021-10-27 [2] CRAN (R 4.1.2)
#>  colorspace     2.0-2   2021-06-24 [2] CRAN (R 4.1.2)
#>  crayon         1.4.2   2021-10-29 [2] CRAN (R 4.1.2)
#>  data.table     1.14.2  2021-09-27 [2] CRAN (R 4.1.2)
#>  DBI            1.1.2   2021-12-20 [2] CRAN (R 4.1.2)
#>  digest         0.6.29  2021-12-01 [2] CRAN (R 4.1.2)
#>  dplyr          1.0.7   2021-06-18 [2] CRAN (R 4.1.2)
#>  ellipsis       0.3.2   2021-04-29 [2] CRAN (R 4.1.2)
#>  eulerr         6.1.1   2021-09-06 [2] CRAN (R 4.1.2)
#>  evaluate       0.14    2019-05-28 [2] CRAN (R 4.1.2)
#>  fansi          1.0.2   2022-01-14 [2] CRAN (R 4.1.2)
#>  farver         2.1.0   2021-02-28 [2] CRAN (R 4.1.2)
#>  fastmap        1.1.0   2021-01-25 [2] CRAN (R 4.1.2)
#>  fs             1.5.2   2021-12-08 [2] CRAN (R 4.1.2)
#>  generics       0.1.1   2021-10-25 [2] CRAN (R 4.1.2)
#>  ggplot2        3.3.5   2021-06-25 [2] CRAN (R 4.1.2)
#>  ggrepel        0.9.1   2021-01-15 [2] CRAN (R 4.1.2)
#>  glue           1.6.0   2021-12-17 [2] CRAN (R 4.1.2)
#>  gtable         0.3.0   2019-03-25 [2] CRAN (R 4.1.2)
#>  gtools         3.9.2   2021-06-06 [2] CRAN (R 4.1.2)
#>  highr          0.9     2021-04-16 [2] CRAN (R 4.1.2)
#>  hms            1.1.1   2021-09-26 [2] CRAN (R 4.1.2)
#>  htmltools      0.5.2   2021-08-25 [2] CRAN (R 4.1.2)
#>  httr           1.4.2   2020-07-20 [2] CRAN (R 4.1.2)
#>  ISAnalytics  * 1.4.3   2022-01-16 [1] Bioconductor
#>  jquerylib      0.1.4   2021-04-26 [2] CRAN (R 4.1.2)
#>  jsonlite       1.7.2   2020-12-09 [2] CRAN (R 4.1.2)
#>  knitr          1.37    2021-12-16 [2] CRAN (R 4.1.2)
#>  lattice        0.20-45 2021-09-22 [2] CRAN (R 4.1.2)
#>  lifecycle      1.0.1   2021-09-24 [2] CRAN (R 4.1.2)
#>  lubridate      1.8.0   2021-10-07 [2] CRAN (R 4.1.2)
#>  magrittr     * 2.0.1   2020-11-17 [2] CRAN (R 4.1.2)
#>  mnormt         2.0.2   2020-09-01 [2] CRAN (R 4.1.2)
#>  munsell        0.5.0   2018-06-12 [2] CRAN (R 4.1.2)
#>  nlme           3.1-155 2022-01-13 [2] CRAN (R 4.1.2)
#>  pillar         1.6.4   2021-10-18 [2] CRAN (R 4.1.2)
#>  pkgconfig      2.0.3   2019-09-22 [2] CRAN (R 4.1.2)
#>  plyr           1.8.6   2020-03-03 [2] CRAN (R 4.1.2)
#>  polyclip       1.10-0  2019-03-14 [2] CRAN (R 4.1.2)
#>  polylabelr     0.2.0   2020-04-19 [2] CRAN (R 4.1.2)
#>  psych          2.1.9   2021-09-22 [2] CRAN (R 4.1.2)
#>  purrr          0.3.4   2020-04-17 [2] CRAN (R 4.1.2)
#>  R.methodsS3    1.8.1   2020-08-26 [2] CRAN (R 4.1.2)
#>  R.oo           1.24.0  2020-08-26 [2] CRAN (R 4.1.2)
#>  R.utils        2.11.0  2021-09-26 [2] CRAN (R 4.1.2)
#>  R6             2.5.1   2021-08-19 [2] CRAN (R 4.1.2)
#>  Rcapture       1.4-3   2019-12-16 [2] CRAN (R 4.1.2)
#>  Rcpp           1.0.8   2022-01-13 [2] CRAN (R 4.1.2)
#>  readr          2.1.1   2021-11-30 [2] CRAN (R 4.1.2)
#>  RefManageR   * 1.3.0   2020-11-13 [2] CRAN (R 4.1.2)
#>  rlang          0.4.12  2021-10-18 [2] CRAN (R 4.1.2)
#>  rmarkdown      2.11    2021-09-14 [2] CRAN (R 4.1.2)
#>  sass           0.4.0   2021-05-12 [2] CRAN (R 4.1.2)
#>  scales         1.1.1   2020-05-11 [2] CRAN (R 4.1.2)
#>  sessioninfo  * 1.2.2   2021-12-06 [2] CRAN (R 4.1.2)
#>  stringi        1.7.6   2021-11-29 [2] CRAN (R 4.1.2)
#>  stringr        1.4.0   2019-02-10 [2] CRAN (R 4.1.2)
#>  tibble         3.1.6   2021-11-07 [2] CRAN (R 4.1.2)
#>  tidyr          1.1.4   2021-09-27 [2] CRAN (R 4.1.2)
#>  tidyselect     1.1.1   2021-04-30 [2] CRAN (R 4.1.2)
#>  tmvnsim        1.0-2   2016-12-15 [2] CRAN (R 4.1.2)
#>  tzdb           0.2.0   2021-10-27 [2] CRAN (R 4.1.2)
#>  utf8           1.2.2   2021-07-24 [2] CRAN (R 4.1.2)
#>  vctrs          0.3.8   2021-04-29 [2] CRAN (R 4.1.2)
#>  vroom          1.5.7   2021-11-30 [2] CRAN (R 4.1.2)
#>  withr          2.4.3   2021-11-30 [2] CRAN (R 4.1.2)
#>  xfun           0.29    2021-12-14 [2] CRAN (R 4.1.2)
#>  xml2           1.3.3   2021-11-30 [2] CRAN (R 4.1.2)
#>  yaml           2.2.1   2020-02-01 [2] CRAN (R 4.1.2)
#>  zip            2.2.0   2021-05-31 [2] CRAN (R 4.1.2)
#> 
#>  [1] /tmp/RtmplowxUB/Rinst3081361f546c34
#>  [2] /home/biocbuild/bbs-3.14-bioc/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

7 Bibliography

This vignette was generated using BiocStyle (Oleś, 2022) with knitr (Xie, 2021) and rmarkdown (Allaire, Xie, McPherson, et al., 2021) running behind the scenes.

Citations made with RefManageR (McLean, 2017).

[1] J. Allaire, Y. Xie, J. McPherson, et al. rmarkdown: Dynamic Documents for R. R package version 2.11. 2021. URL: https://github.com/rstudio/rmarkdown.

[2] M. W. McLean. “RefManageR: Import and Manage BibTeX and BibLaTeX References in R”. In: The Journal of Open Source Software (2017). DOI: 10.21105/joss.00338.

[3] A. Oleś. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.22.0. 2022. URL: https://github.com/Bioconductor/BiocStyle.

[4] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.37. 2021. URL: https://yihui.org/knitr/.