Processing math: 100%

Contents

1 Introduction

The evaluomeR package permits to evaluate the reliability of bioinformatic metrics by analysing the stability and goodness of the classifications of such metrics. The method takes the measurements of the metrics for the dataset and evaluates the reliability of the metrics according to the following analyses: Correlations, Stability and Goodness of classifications.

2 Installation

The installation of evaluomeR package is performed via Bioconductor:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("evaluomeR")

2.1 Prerequisites

The package evaluomeR depends on the following CRAN packages for the calculus: cluster (Maechler et al. 2018), corrplot (Wei and Simko 2017). Moreover, this package also depends on grDevices, graphics, stats and utils from R Core (R Core Team 2018) for plotting and on the Bioconductor packages SummarizedExperiment (Morgan et al. 2018), MultiAssayExperiment (Ramos et al. 2017) for input/output data.

3 Using evaluomeR

3.1 Creating an input SummarizedExperiment

The input is a SummarizedExperiment object. The assay contained in SummarizedExperiment must follow a certain structure, see Table 1: A valid header must be specified. The first column of the header is the ID or name of the instance of the dataset (e.g., ontology, pathway, etc.) on which the metrics are measured. The other columns of the header contains the names of the metrics. The rows contains the measurements of the metrics for each instance in the dataset.

Table 1: Example of an input assay from a SummarizedExperiment for
the evaluomeR package.
ID MetricNameA MetricNameB MetricNameC
instance1 1.2 6.4 0.5
instance2 2.4 5.4 0.8
instance3 1.9 8.9 1.1

3.2 Using input sample data from evaluomeR

In our package we provide three different sample input data:

  • ontMetrics: Structural ontology metrics, 19 metrics measuring structural aspects of bio-ontologies have been analysed on two different corpora of ontologies: OBO Foundry and AgroPortal (Franco et al. 2019).

  • rnaMetrics: RNA quality metrics for the assessment of gene expression differences, 2 quality metrics from 16 aliquots of a unique batch of RNA Samples. The metrics are Degradation Factor (DegFact) and RNA Integrity Number (RIN) (Imbeaud et al. 2005).

  • bioMetrics: Metrics for biological pathways, 2 metrics that quantitative characterizations of the importance of regulation in biochemical pathway systems, including systems designed for applications in synthetic biology or metabolic engineering. The metrics are reachability and efficiency (Davis and Voit 2018).

The user shall run the data built-in method to load evaluomeR sample input data. This requires to provide the descriptor of the desired dataset. The datasets availables can take the following values: “ontMetrics”, “rnaMetrics” or “bioMetrics”.

library(evaluomeR)
data("ontMetrics")
data("rnaMetrics")
data("bioMetrics")

3.3 Correlations

We provide the metricsCorrelations function to evaluate the correlations among the metrics defined in the SummarizedExperiment:

library(evaluomeR)
data("rnaMetrics")
correlationSE <- metricsCorrelations(rnaMetrics, margins =  c(4,4,12,10))
## 
## Data loaded.
## Number of rows: 16
## Number of columns: 3

# Access the correlation matrix via its first assay:
# assay(correlationSE,1)

3.4 Stability analysis

The calculation of the stability indices is performed by stability and stabilityRange functions.

3.4.1 Stability

The stability index analysis is performed by the stability function. For instance, running a stability analysis for the metrics of rnaMetrics with a number of 100 bootstrap replicates with a k-means cluster whose k is 2 (note that k must be inside [2,15] range):

stabilityData <- stability(rnaMetrics, k=2, bs = 100)

The stability function returns the stabilityData object, a ExperimentList that contains the several assays such as the stability mean or the mean, betweenss, totss, tot.swithinss and anova values from the kmeans clustering:

stabilityData
## ExperimentList class object of length 9:
##  [1] stability_mean: SummarizedExperiment with 2 rows and 2 columns
##  [2] cluster_partition: SummarizedExperiment with 2 rows and 2 columns
##  [3] cluster_mean: SummarizedExperiment with 2 rows and 2 columns
##  [4] cluster_centers: SummarizedExperiment with 2 rows and 2 columns
##  [5] cluster_size: SummarizedExperiment with 2 rows and 2 columns
##  [6] cluster_betweenss: SummarizedExperiment with 2 rows and 2 columns
##  [7] cluster_totss: SummarizedExperiment with 2 rows and 2 columns
##  [8] cluster_tot.withinss: SummarizedExperiment with 2 rows and 2 columns
##  [9] cluster_anova: SummarizedExperiment with 2 rows and 2 columns

The stability indices plots shown when getImages = TRUE are generated with the values of the stability mean:

assay(stabilityData, "stability_mean")
Metric Mean_stability_k_2
RIN 0.82540873015873
DegFact 0.873916305916306

The plot represents the stability mean from each metric for a given k value. This mean is calculated by performing the average of every stability index from kranges [1,k] for each metric.

3.4.2 Stability range

The stabilityRange function is an iterative method of stability function. It performs a stability analysis for a range of k values (k.range).

For instance, analyzing the stability of rnaMetrics in range [2,4], with bs=100:

stabilityRangeData = stabilityRange(rnaMetrics, k.range=c(2,4), bs = 100)

Two kind of graphs are plotted in stabilityRange function. The first type (titled as “St. Indices for k=X across metrics”) shows, for every k value, the stability indices across the metrics. The second kind (titled as St. Indices for metric ‘X’ in range [x,y]), shows a plot of the behaviour of each metric across the k range.

3.5 Goodness of classifications

There are two methods to calculate the goodness of classifications: quality and qualityRange.

3.5.1 Quality

This method plots how the metrics behave for the current k value, according to the average silhouette width. Also, it will plot how the clusters are grouped for each metric (one plot per metric). For instance, running a quality analysis for the two metrics of rnaMetrics dataset, being k=4:

qualityData = quality(rnaMetrics, k = 4)

The data of the first plot titled as “Qual. Indices for k=4 across metrics” according to Silhouette avg. width, is stored in Avg_Silhouette_Width column from the first assay of the SummarizedExperiment, qualityData. The other three plots titled by their metric name display the input rows grouped by colours for each cluster, along with their Silhouette width scores.

The variable qualityData contains information about the clusters of each metric: The average silhouette width per cluster, the overall average sihouette width (taking into account all the clusters) and the number of individuals per cluster:

assay(qualityData,1)
Metric Cluster_1_SilScore Cluster_2_SilScore Cluster_3_SilScore Cluster_4_SilScore Avg_Silhouette_Width Cluster_1_Size Cluster_2_Size Cluster_3_Size Cluster_4_Size
RIN 0.420502645502646 0.696597735248404 0.473928571428572 0.324731573607137 0.488264943810529 4 4 5 3
DegFact 0.759196481622952 0.59496499852177 0.600198799385732 0.521618857725795 0.634170498361632 5 3 5 3

3.5.2 Quality range

The qualityRange function is an iterative method that uses the same functionality of quality for a range of values (k.range), instead for one unique k value. This methods allows to analyse the goodness of the classifications of the metric for different values of the range.

In the next example we will keep using the rnaMetrics dataset, and a k.range set to [4,6].

k.range = c(4,6)
qualityRangeData = qualityRange(rnaMetrics, k.range)

The qualityRange function also returns two kind of plots, as seen in Stability range section. One for each k in the k.range, showing the quality indices (goodness of the classification) across the metrics, and a second type of plot to show each metric with its respective quality index in each k value.

The qualityRangeData object returned by qualityRange is a ExperimentList from MultiAssayExperiment, which is a list of SummarizedExperiment objects whose size is diff(k.range)+1. In the example shown above, the size of qualityRangeData is 3, since the array length would contain the dataframes from k=4 to k=6.

diff(k.range)+1
## [1] 3
length(qualityRangeData)
## [1] 3

The user can access a specific dataframe for a given k value in three different ways: by dollar notation, brackets notation or using our wrapper method getDataQualityRange. For instance, if the user wishes to retrieve the dataframe which contains information of k=5, being the k.range [4,6]:

k5Data = qualityRangeData$k_5
k5Data = qualityRangeData[["k_5"]]
k5Data = getDataQualityRange(qualityRangeData, 5)
assay(k5Data, 1)
Metric Cluster_1_SilScore Cluster_2_SilScore Cluster_3_SilScore Cluster_4_SilScore Cluster_5_SilScore Avg_Silhouette_Width Cluster_1_Size Cluster_2_Size Cluster_3_Size Cluster_4_Size Cluster_5_Size
RIN 0.420502645502646 0.674226581940152 0.251461988304093 0.523616734143049 0 0.451735613203479 4 4 3 4 1
DegFact 0.718287037037037 0.0375000000000007 0.904545454545454 0.585791823535685 0.521618857725795 0.578190921755929 4 2 2 5 3

Once the user believes to have found a proper k value, then the user can run the quality function to see further silhouette information on the plots.

3.6 General functionality

In this section we describe a series of parameters that are shared among our analysis functions: metricsCorrelations, stability, stabilityRange, quality and qualityRange.

3.6.1 Disabling plotting

The generation of the images can be disabled by setting to FALSE the parameter getImages:

stabilityData <- stability(rnaMetrics, k=5, bs = 50, getImages = FALSE)

This prevents from generating any graph, performing only the calculus. By default getImages is set to TRUE.

4 Selecting the optimal value of k

evaluomeR analyzes the behavior of the metrics in terms of stability and goodness of the clusters for a range of values of k. In case of wishing to select the optimal value for k for a metric in a given dataset we have implemented the getOptimalKValue function, which returns a table stating which is the optimal value of k for each metric.

The algorithm works as follows: The highest stability and the highest goodness are obtained for the same value of k. In such case, that value would be the optimal one. On the other hand, the highest stability and the highest goodness are obtained for different values of k. In this case, additional criteria are needed. does not currently aim at providing those criteria, but to provide the data that could permit the user to make decisions. In the use cases described in this paper, we will apply the following criteria for the latter case:

stabilityData <- stabilityRange(data=ontMetrics, k.range=c(2,4), 
                                bs=20, getImages = FALSE, seed=100)
qualityData <- qualityRange(data=ontMetrics, k.range=c(2,4),
                            getImages = FALSE, seed=100)

kOptTable <- getOptimalKValue(stabilityData, qualityData)
Metric Stability_max_k Stability_max_k_stab Stability_max_k_qual Quality_max_k Quality_max_k_stab Quality_max_k_qual Global_optimal_k
ANOnto 2 1.0000000 0.7548590 2 1.0000000 0.7548590 2
AROnto 4 0.9276802 0.8138336 4 0.9276802 0.8138336 4
CBOOnto 2 0.8786781 0.7193928 3 0.7897443 0.7231989 3
CBOOnto2 2 0.8786781 0.7193928 3 0.7897443 0.7231989 3
CROnto 4 0.8450463 0.8450265 2 0.7831180 0.9643551 2
DITOnto 4 0.8228266 0.5933836 2 0.8129876 0.5971269 2
INROnto 2 0.9251030 0.7251026 2 0.9251030 0.7251026 2
LCOMOnto 3 1.0000000 0.6529131 3 1.0000000 0.6529131 3
NACOnto 2 0.8807614 0.7391430 2 0.8807614 0.7391430 2
NOCOnto 2 0.9478757 0.9431037 2 0.9478757 0.9431037 2
NOMOnto 2 0.8912204 0.6529583 3 0.8335419 0.6689736 3
POnto 2 1.0000000 0.7023466 2 1.0000000 0.7023466 2
PROnto 2 0.9941906 0.7065895 2 0.9941906 0.7065895 2
RFCOnto 2 0.8630298 0.6442806 2 0.8630298 0.6442806 2
RROnto 2 0.9941906 0.7065895 2 0.9941906 0.7065895 2
TMOnto 2 0.9664951 0.7331493 2 0.9664951 0.7331493 2
TMOnto2 2 1.0000000 0.9493957 2 1.0000000 0.9493957 2
WMCOnto 2 0.9169334 0.9464120 2 0.9169334 0.9464120 2
WMCOnto2 2 0.9208949 0.9467042 2 0.9208949 0.9467042 2

Additionally, you can select another subset of k.range to delimit the range of the optimal k.

kOptTable <- getOptimalKValue(stabilityData, qualityData, k.range=c(3,4))
## Processing metric: ANOnto
##  Stability k '4' is stable but quality k '3' is not
##  Using '4' since it provides higher stability
## Processing metric: AROnto
##  Maximum stability and quality values matches the same K value: '4'
## Processing metric: CBOOnto
##  Both Ks have a stable classification: '4', '3'
##  Using '3' since it provides higher silhouette width
## Processing metric: CBOOnto2
##  Both Ks have a stable classification: '4', '3'
##  Using '3' since it provides higher silhouette width
## Processing metric: CROnto
##  Both Ks have a stable classification: '4', '3'
##  Using '3' since it provides higher silhouette width
## Processing metric: DITOnto
##  Maximum stability and quality values matches the same K value: '4'
## Processing metric: INROnto
##  Stability k '4' is stable but quality k '3' is not
##  Using '4' since it provides higher stability
## Processing metric: LCOMOnto
##  Maximum stability and quality values matches the same K value: '3'
## Processing metric: NACOnto
##  Both Ks do not have a stable classification: '4', '3'
##  Using '3' since it provides higher silhouette width
## Processing metric: NOCOnto
##  Maximum stability and quality values matches the same K value: '3'
## Processing metric: NOMOnto
##  Maximum stability and quality values matches the same K value: '3'
## Processing metric: POnto
##  Stability k '4' is stable but quality k '3' is not
##  Using '4' since it provides higher stability
## Processing metric: PROnto
##  Maximum stability and quality values matches the same K value: '3'
## Processing metric: RFCOnto
##  Maximum stability and quality values matches the same K value: '3'
## Processing metric: RROnto
##  Maximum stability and quality values matches the same K value: '3'
## Processing metric: TMOnto
##  Both Ks have a stable classification: '4', '3'
##  Using '3' since it provides higher silhouette width
## Processing metric: TMOnto2
##  Both Ks have a stable classification: '3', '4'
##  Using '4' since it provides higher silhouette width
## Processing metric: WMCOnto
##  Both Ks have a stable classification: '4', '3'
##  Using '3' since it provides higher silhouette width
## Processing metric: WMCOnto2
##  Both Ks have a stable classification: '4', '3'
##  Using '3' since it provides higher silhouette width
Metric Stability_max_k Stability_max_k_stab Stability_max_k_qual Quality_max_k Quality_max_k_stab Quality_max_k_qual Global_optimal_k
ANOnto 4 0.7858551 0.7033107 3 0.6537009 0.7367429 4
AROnto 4 0.9276802 0.8138336 4 0.9276802 0.8138336 4
CBOOnto 4 0.8009833 0.5843870 3 0.7897443 0.7231989 3
CBOOnto2 4 0.8009833 0.5843870 3 0.7897443 0.7231989 3
CROnto 4 0.8450463 0.8450265 3 0.8202259 0.8553226 3
DITOnto 4 0.8228266 0.5933836 4 0.8228266 0.5933836 4
INROnto 4 0.7889145 0.6095614 3 0.7404618 0.6909412 4
LCOMOnto 3 1.0000000 0.6529131 3 1.0000000 0.6529131 3
NACOnto 4 0.7076790 0.6271890 3 0.6647891 0.6613224 3
NOCOnto 3 0.8972253 0.8791838 3 0.8972253 0.8791838 3
NOMOnto 3 0.8335419 0.6689736 3 0.8335419 0.6689736 3
POnto 4 0.8464754 0.6763749 3 0.6300979 0.6766154 4
PROnto 3 0.9704167 0.6686449 3 0.9704167 0.6686449 3
RFCOnto 3 0.8192584 0.6352988 3 0.8192584 0.6352988 3
RROnto 3 0.9704167 0.6686449 3 0.9704167 0.6686449 3
TMOnto 4 0.9002443 0.6944084 3 0.8665560 0.7100906 3
TMOnto2 3 0.9375096 0.7246579 4 0.8971283 0.7254086 4
WMCOnto 4 0.8539321 0.7370700 3 0.8151443 0.8285148 3
WMCOnto2 4 0.8699698 0.7294024 3 0.7801632 0.8702324 3

5 Metric analysis

We provide a series of methods for a further analysis on the metrics. These methods are: plotMetricsMinMax, plotMetricsBoxplot, plotMetricsCluster and plotMetricsViolin.

The plotMetricsMinMax function plots the minimum, maximum and standard deviation of min/max points of the values of each metric:

plotMetricsMinMax(ontMetrics)
## Warning: Use of `dataStats.df.t$Min` is discouraged. Use `Min` instead.
## Warning: Use of `dataStats.df.t$Max` is discouraged. Use `Max` instead.
## Warning: Use of `dataStats.df.t$Metric` is discouraged. Use `Metric` instead.
## Warning: Use of `dataStats.df.t$Min` is discouraged. Use `Min` instead.
## Warning: Use of `dataStats.df.t$Metric` is discouraged. Use `Metric` instead.
## Warning: Use of `dataStats.df.t$Max` is discouraged. Use `Max` instead.
## Warning: Use of `dataStats.df.t$Metric` is discouraged. Use `Metric` instead.
## Warning: Use of `dataStats.df.t$Max` is discouraged. Use `Max` instead.
## Warning: Use of `dataStats.df.t$Sd` is discouraged. Use `Sd` instead.
## Warning: Use of `dataStats.df.t$Max` is discouraged. Use `Max` instead.
## Warning: Use of `dataStats.df.t$Sd` is discouraged. Use `Sd` instead.
## Warning: Use of `dataStats.df.t$Metric` is discouraged. Use `Metric` instead.
## Warning: Use of `dataStats.df.t$Min` is discouraged. Use `Min` instead.
## Warning: Use of `dataStats.df.t$Sd` is discouraged. Use `Sd` instead.
## Warning: Use of `dataStats.df.t$Min` is discouraged. Use `Min` instead.
## Warning: Use of `dataStats.df.t$Sd` is discouraged. Use `Sd` instead.
## Warning: Use of `dataStats.df.t$Metric` is discouraged. Use `Metric` instead.

The plotMetricsBoxplot method boxplots the value of each metric:

plotMetricsBoxplot(rnaMetrics)
## Warning: Use of `data.melt$variable` is discouraged. Use `variable` instead.
## Warning: Use of `data.melt$value` is discouraged. Use `value` instead.

Next, the plotMetricsCluster function clusters the values of the metrics by using the euclidean distance and the method ward.D2 from hclust:

plotMetricsCluster(ontMetrics)

And finally the plotMetricsViolin function:

plotMetricsViolin(rnaMetrics)
## Warning: Use of `data.melt$variable` is discouraged. Use `variable` instead.
## Warning: Use of `data.melt$value` is discouraged. Use `value` instead.
## Warning: Use of `data.melt$variable` is discouraged. Use `variable` instead.
## Warning: Use of `data.melt$value` is discouraged. Use `value` instead.

6 Information

6.1 Contact

The source code is available at github. For bug/error reports please refer to evaluomeR github issues https://github.com/neobernad/evaluomeR/issues.

6.2 License

The package ‘evaluomeR’ is licensed under GPL-3.

6.3 How to cite

Currently there is no literature for evaluomeR. Please cite the R package, the github or the website. This package will be updated as soon as a citation is available.

6.4 Additional information

The evaluomeR functionality can also be access through a web interface1 Evaluome web an API REST2 API documentation.

6.5 Session information

sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] evaluomeR_1.14.0            flexmix_2.3-18             
##  [3] lattice_0.20-45             randomForest_4.7-1.1       
##  [5] fpc_2.2-9                   cluster_2.1.4              
##  [7] MultiAssayExperiment_1.24.0 SummarizedExperiment_1.28.0
##  [9] Biobase_2.58.0              GenomicRanges_1.50.0       
## [11] GenomeInfoDb_1.34.0         IRanges_2.32.0             
## [13] S4Vectors_0.36.0            BiocGenerics_0.44.0        
## [15] MatrixGenerics_1.10.0       matrixStats_0.62.0         
## [17] magrittr_2.0.3              kableExtra_1.3.4           
## [19] BiocStyle_2.26.0           
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-7           webshot_0.5.4          httr_1.4.4            
##  [4] prabclus_2.3-2         tools_4.2.1            bslib_0.4.0           
##  [7] utf8_1.2.2             R6_2.5.1               DBI_1.1.3             
## [10] colorspace_2.0-3       nnet_7.3-18            tidyselect_1.2.0      
## [13] compiler_4.2.1         cli_3.4.1              rvest_1.0.3           
## [16] xml2_1.3.3             DelayedArray_0.24.0    ggdendro_0.1.23       
## [19] labeling_0.4.2         bookdown_0.29          sass_0.4.2            
## [22] diptest_0.76-0         scales_1.2.1           DEoptimR_1.0-11       
## [25] robustbase_0.95-0      systemfonts_1.0.4      stringr_1.4.1         
## [28] digest_0.6.30          rmarkdown_2.17         svglite_2.1.0         
## [31] XVector_0.38.0         pkgconfig_2.0.3        htmltools_0.5.3       
## [34] plotrix_3.8-2          fastmap_1.1.0          highr_0.9             
## [37] rlang_1.0.6            rstudioapi_0.14        farver_2.1.1          
## [40] jquerylib_0.1.4        generics_0.1.3         jsonlite_1.8.3        
## [43] mclust_6.0.0           dplyr_1.0.10           RCurl_1.98-1.9        
## [46] modeltools_0.2-23      GenomeInfoDbData_1.2.9 Matrix_1.5-1          
## [49] Rcpp_1.0.9             munsell_0.5.0          fansi_1.0.3           
## [52] lifecycle_1.0.3        stringi_1.7.8          yaml_2.3.6            
## [55] MASS_7.3-58.1          zlibbioc_1.44.0        plyr_1.8.7            
## [58] grid_4.2.1             parallel_4.2.1         magick_2.7.3          
## [61] knitr_1.40             pillar_1.8.1           reshape2_1.4.4        
## [64] glue_1.6.2             evaluate_0.17          BiocManager_1.30.19   
## [67] vctrs_0.5.0            Rdpack_2.4             gtable_0.3.1          
## [70] kernlab_0.9-31         assertthat_0.2.1       cachem_1.0.6          
## [73] ggplot2_3.3.6          xfun_0.34              rbibutils_2.2.9       
## [76] class_7.3-20           viridisLite_0.4.1      tibble_3.1.8          
## [79] corrplot_0.92

Bibliography

Davis, Jacob D, and Eberhard O Voit. 2018. “Metrics for Regulated Biochemical Pathway Systems.” Bioinformatics. https://doi.org/10.1093/bioinformatics/bty942.

Franco, Manuel, Juana María Vivo, Manuel Quesada-Martínez, Astrid Duque-Ramos, and Jesualdo Tomás Fernández-Breis. 2019. “Evaluation of ontology structural metrics based on public repository data.” Bioinformatics, February. https://doi.org/10.1093/bib/bbz009.

Imbeaud, Sandrine, Esther Graudens, Virginie Boulanger, Xavier Barlet, Patrick Zaborski, Eric Eveno, Odilo Mueller, Andreas Schroeder, and Charles Auffray. 2005. “Towards Standardization of Rna Quality Assessment Using User-Independent Classifiers of Microcapillary Electrophoresis Traces.” Nucleic Acids Research 33 (6): e56–e56.

Jaccard, Paul. 1901. “Distribution de La Flore Alpine Dans Le Bassin Des Dranses et Dans Quelques Regions Voisines.” Bull Soc Vaudoise Sci Nat 37: 241–72.

Kaufman, Leonard, and Peter J Rousseeuw. 2009. Finding Groups in Data: An Introduction to Cluster Analysis. Vol. 344. John Wiley & Sons.

Maechler, Martin, Peter Rousseeuw, Anja Struyf, Mia Hubert, Kurt Hornik, Matthias Studer, Pierre Roudier, Juan Gonzalez, and Kamil Kozlowski. 2018. R Package "Cluster": "Finding Groups in Data": Cluster Analysis Extended Rousseeuw et Al". https://cran.r-project.org/package=cluster.

Milligan, Glenn W, and Richard Cheng. 1996. “Measuring the Influence of Individual Data Points in a Cluster Analysis.” Journal of Classification 13 (2): 315–35.

Morgan, Martin, Valerie Obenchain, Jim Hester, and Hervé Pagès. 2018. SummarizedExperiment: SummarizedExperiment Container.

Ramos, Marcel, Lucas Schiffer, Angela Re, Rimsha Azhar, Azfar Basunia, Carmen Rodriguez Cabrera, Tiffany Chan, et al. 2017. “Software for the Integration of Multi-Omics Experiments in Bioconductor.” Cancer Research 77(21); e39-42.

R Core Team. 2018. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Wei, Taiyun, and Viliam Simko. 2017. R Package "Corrplot": Visualization of a Correlation Matrix. https://github.com/taiyun/corrplot.