1 Introduction

The sights package provides numerous normalization methods that correct the three types of bias that affect High-Throughput Screening (HTS) measurements: overall plate bias, within-plate spatial bias, and across-plate bias. Commonly used normalization methods such as Z-scores (or methods such as percent inhibition/activation, which use within-plate controls to normalize) correct only overall plate bias. Methods included in this package attempt to correct all three sources of bias and typically give better results.

Two statistical tests are also provided: the standard one-sample t-test and the recommended one-sample Random Variance Model (RVM) t-test, which has greater statistical power for the typically small number of replicates in HTS. Correction for the multiple statistical testing of the large number of constructs in HTS data is provided by False Discovery Rate (FDR) correction. The FDR can be described as the proportion of false positives among the statistical tests called significant.
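As a rough illustration of the testing and FDR steps described above, the following base R sketch runs a standard one-sample t-test per construct on made-up data and then applies a Benjamini-Hochberg adjustment. The toy data, the object names, and the use of `p.adjust()` are assumptions made for illustration only; the package itself relies on its own functions and on q-values (Storey, 2002), and the RVM t-test is not shown here.

```r
# Toy data: 200 constructs (rows) x 3 replicates (columns) of normalized scores.
set.seed(1)
scores <- matrix(rnorm(200 * 3), nrow = 200, ncol = 3)
scores[1:10, ] <- scores[1:10, ] + 4  # the first 10 constructs are true "hits"

# Standard one-sample t-test per construct (null hypothesis: mean = 0).
p_values <- apply(scores, 1, function(x) t.test(x, mu = 0)$p.value)

# Benjamini-Hochberg adjustment, used here as a simple stand-in for q-values.
fdr <- p.adjust(p_values, method = "BH")

# Constructs called significant when controlling the FDR at 5%.
hits <- which(fdr < 0.05)
```

With only three replicates per construct, the per-construct variance estimates are noisy, which is the situation in which the RVM t-test is recommended.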

Included graphical and statistical methods provide the means for evaluating data analysis choices for HTS assays on a screen-by-screen basis. These graphs can be used to check fundamental assumptions of both raw and normalized data at every step of the analysis process.

Citing Methods

Please cite the sights package and specific methods as appropriate.

References for the methods can be found in this vignette, on their specific help pages, and in the manual. They can also be accessed by help(sights_method_name) in R. For example:
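A minimal sketch, assuming the package is already installed, using the SPAWN normalization function named later in this vignette:

```r
# Assumes the sights package is installed.
library("sights")

# Open the help page (including references) for the SPAWN normalization method.
help("normSPAWN")
```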

The package citation can be accessed in R by:
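Assuming the package is installed, R's standard `citation()` function can be used:

```r
# Print the citation entry that the installed sights package provides.
citation("sights")
```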

2 Getting Started

2.1 Installation and loading

  1. Please install the package directly from Bioconductor and load it. Note that SIGHTS requires a minimum R version of 3.3.
  2. This should also install and load the packages that SIGHTS imports: ggplot2 (Wickham, 2009), reshape2 (Wickham, 2007), qvalue (Storey, 2015), MASS (Venables and Ripley, 2002), and lattice (Sarkar, 2008).
    Otherwise, you can install/update these packages manually.
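The steps above can be sketched as follows. This assumes a recent R and Bioconductor setup in which the BiocManager package is the installer; older R versions used a different Bioconductor installer, so treat this as one possible route rather than the only one.

```r
# Install BiocManager from CRAN if it is not already available.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

# Install sights from Bioconductor, then load it.
BiocManager::install("sights")
library("sights")

# Manual installation or update of the imported packages, if needed.
# ggplot2, reshape2, MASS, and lattice are on CRAN; qvalue is on Bioconductor.
install.packages(c("ggplot2", "reshape2", "MASS", "lattice"))
BiocManager::install("qvalue")
```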

2.2 Importing and exporting data

All SIGHTS normalization functions require that the data be arranged such that each plate is a column and each row is a well. The arrangement within each plate should be by-row first, then by-column. For more details and an example, see help("ex_dataMatrix"). This required arrangement can be done in Microsoft Excel before importing the data into R, although advanced users may prefer to do so in R as needed.
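For users who prefer to arrange the data in R, the following base R sketch shows one way to build such a matrix from plate-shaped layouts. It reads "by-row first" as listing all wells of plate row A, then all wells of row B, and so on; that reading, the toy values, and the helper name `flatten_by_row` are assumptions for illustration, and help("ex_dataMatrix") remains the reference.

```r
# Two toy plates with 2 rows (A, B) and 3 columns (1, 2, 3), laid out as on the bench.
plate1 <- matrix(c(11, 12, 13,
                   14, 15, 16), nrow = 2, byrow = TRUE)
plate2 <- matrix(c(21, 22, 23,
                   24, 25, 26), nrow = 2, byrow = TRUE)

# Hypothetical helper: flatten a plate row by row (A1, A2, A3, B1, B2, B3).
flatten_by_row <- function(plate) as.vector(t(plate))

# One column per plate, one row per well.
data_matrix <- cbind(plate1 = flatten_by_row(plate1),
                     plate2 = flatten_by_row(plate2))
rownames(data_matrix) <- c("A1", "A2", "A3", "B1", "B2", "B3")
data_matrix
```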

  1. The datasets within SIGHTS can be loaded by:
  2. Your own data can be imported by giving the path of your file:
  • If it is a .csv or .txt file, run
  • If it is a Microsoft Excel file, you can import it directly by installing another package:
  3. Similarly, any object saved in R (e.g. normalized results) can be exported as .csv or .xlsx files:
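A minimal sketch of the .csv round trip, using only base R. The data frame and the temporary file path are placeholders for your own object and file; Excel files need an additional package and are not shown here.

```r
# Stand-in for real screen data: 6 wells (rows) x 2 plates (columns).
my_data <- data.frame(plate1 = c(0.11, 0.12, 0.13, 0.14, 0.15, 0.16),
                      plate2 = c(0.21, 0.22, 0.23, 0.24, 0.25, 0.26))

# Export to .csv; a temporary file is used here in place of a real path.
csv_path <- tempfile(fileext = ".csv")
write.csv(my_data, csv_path, row.names = FALSE)

# Import from .csv, treating the first line as column names.
imported <- read.csv(csv_path, header = TRUE)

# A tab-delimited .txt file could be read with read.delim() instead.
```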

2.3 Information about data

  1. There are two datasets provided within SIGHTS:
  • CMBA data (Murie et al., 2015), see help("ex_dataMatrix")
  • Inglese et al. data (Inglese et al., 2006), see help("inglese")
  2. Some basic information about data (including your own data after importing) can be accessed by various functions. For example, information about the Inglese et al. data set can be obtained as follows:
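A sketch using standard R inspection functions, assuming the package is installed and that the data set is loaded under the name `inglese`, as its help page suggests:

```r
library("sights")
data("inglese")

dim(inglese)      # number of rows (wells) and columns
names(inglese)    # column names
head(inglese)     # first six rows
str(inglese)      # column types and a preview of the values
summary(inglese)  # per-column summary statistics
```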

2.4 Information about methods

  1. There are several methods provided within SIGHTS:
  • Normalization:
    • Z, Robust Z (see Malo et al. (2006)),
    • Loess (Baryshnikova et al., 2010),
    • Median Filter (Bushway et al., 2011),
    • R (Wu et al., 2008), and
    • SPAWN (Murie et al., 2015).
  • Statistical testing:
    • one-sample t-test,
    • one-sample RVM t-test (Malo et al., 2006; Wright and Simon, 2003), and
    • FDR correction (Storey, 2002).
  • Plotting:
    • 3d plot,
    • heatmap,
    • auto-correlation plot,
    • scatter plot,
    • boxplot,
    • inverse-gamma fit plot, and
    • histograms.

See help("normSights"), help("statSights"), help("plotSights"), and the help pages of individual methods for more information.

  2. Information about the package functions can be accessed by:
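One way to do this with standard R tools, assuming the package is installed and attached:

```r
library("sights")

# Names of all objects exported by the attached package.
ls("package:sights")

# Argument list of a single function, here plotSights().
args(plotSights)

# Index of all help pages in the package.
help(package = "sights")
```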

2.5 Quick reference

  1. Normalization - All normalization functions are accessible either via normSights() or their individual function names (e.g. normSPAWN()).

  2. Statistical tests - All statistical testing functions are accessible either via statSights() or their individual function names (e.g. statRVM()).

  3. Plots - All plotting functions are accessible either via plotSights() or their individual function names (e.g. plotAutoco()).

The results of these functions can be saved as objects and called by their assigned names. For example:
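The following sketch chains the three steps on the Inglese et al. data. The function names come from the list above; the argument names (`dataMatrix`, `plateRows`, `plateCols`, `dataCols`, `normMat`, `repIndex`, `plotMatrix`), the 32 x 40 plate layout, and the assumption that columns 3 to 5 hold three replicate plates are recalled from the package help pages and should be verified with `args()` and help("inglese") before use.

```r
library("sights")
data("inglese")

# Normalize three plates with SPAWN (assumed layout: 32 rows x 40 columns).
spawn_results <- normSPAWN(dataMatrix = inglese, plateRows = 32,
                           plateCols = 40, dataCols = 3:5)

# RVM t-test, treating the three normalized plates as replicates of one sample.
rvm_results <- statRVM(normMat = spawn_results, repIndex = c(1, 1, 1))

# Auto-correlation plots of the normalized plates.
autoco_results <- plotAutoco(plotMatrix = spawn_results, plateRows = 32,
                             plateCols = 40)

# The saved objects can then be called by their assigned names.
head(spawn_results)
head(rvm_results)
autoco_results
```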

4 Advanced Plotting

4.1 Basic modifications

All SIGHTS plotting functions that use the ggplot2 package (Wickham, 2009), i.e., all except plot3d, which uses lattice graphics, have an ellipsis argument (“…”) that passes additional parameters on to the specific ggplot geom used in that function. For example, the default plot title and the bar colors of the histogram can be modified as follows:
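The mechanism can be sketched with plain ggplot2. The wrapper `hist_sketch()` below is hypothetical and is not the package's code; it only illustrates how arguments supplied through an ellipsis reach `geom_histogram()`, using made-up values.

```r
library("ggplot2")

# Hypothetical wrapper, not part of SIGHTS: extra arguments in `...` are handed
# to geom_histogram(), which is how an ellipsis argument reaches a ggplot geom.
hist_sketch <- function(values, plotName = "Histogram", ...) {
    ggplot(data.frame(value = values), aes(x = value)) +
        geom_histogram(...) +
        labs(title = plotName)
}

set.seed(1)
toy_p_values <- runif(500)  # made-up p-values

# Override the default title and the bar colors through the wrapper's arguments.
p <- hist_sketch(toy_p_values, plotName = "Toy p-value histogram",
                 binwidth = 0.02, fill = "pink", color = "black")
p
```

Here `binwidth`, `fill`, and `color` are ordinary `geom_histogram()` parameters, so anything that geom accepts can be supplied in the same way.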

Ellipsis: Add parameters to ggplot geom (Wickham, 2009).


4.2 Extended modifications

All SIGHTS plotting functions that use ggplot2 produce ggplot objects, which can be modified after they are created.

Other packages which provide more plotting options can be installed as well: ggthemes (Arnold and Arnold, 2015), gridExtra (Auguie et al., 2015).

Below are some examples of the plotting modifications that can be achieved using ggplot2/ggthemes/gridExtra (Wickham, 2009; Arnold and Arnold, 2015; Auguie et al., 2015) functions:

  1. Layers can be added that override defaults.
    However, faceting is not possible owing to dataset formatting within SIGHTS functions.
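A sketch of adding layers, using a plain ggplot2 boxplot on made-up data as a stand-in for an object returned by a SIGHTS plotting function:

```r
library("ggplot2")

set.seed(1)
toy <- data.frame(plate = rep(c("P1", "P2", "P3"), each = 50),
                  score = rnorm(150))

# Stand-in for a ggplot object returned by a plotting function.
b <- ggplot(toy, aes(x = plate, y = score)) + geom_boxplot()

# Add layers: a boxplot layer drawn on top with a new fill, a different theme,
# and labels that replace the defaults.
b_modified <- b +
    geom_boxplot(fill = "thistle") +
    theme_bw() +
    labs(title = "Toy boxplot", x = "Plate", y = "Score")
b_modified
```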
Layers: Add layers like ggplot2 (Wickham, 2009).

Note: When plotSep = TRUE, a list of plot objects is produced, which can be called individually and modified, as in the example below.


  2. Outliers can be removed from plots by limiting the axes in one of two ways.
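In ggplot2 terms, the two ways correspond to setting scale limits and setting coordinate limits. The sketch below uses made-up replicate data and a plain ggplot2 scatter plot in place of a SIGHTS plot object; the limits of -5 to 5 are arbitrary.

```r
library("ggplot2")

set.seed(1)
toy <- data.frame(rep1 = rnorm(300))
toy$rep2 <- toy$rep1 + rnorm(300, sd = 0.5)
toy[1:5, ] <- toy[1:5, ] + 15  # a few extreme outliers

# Original plot: all points, no limits set.
original <- ggplot(toy, aes(x = rep1, y = rep2)) +
    geom_point(alpha = 0.2) +
    geom_smooth(method = "loess", formula = y ~ x, se = FALSE)

# Way 1, constrained limits: points outside the limits are dropped before the
# loess line is fitted, so the line can change (ggplot2 warns about removed rows).
constrained <- original + lims(x = c(-5, 5), y = c(-5, 5))

# Way 2, zoomed-in limits: all points are kept for the fit; only the view changes.
zoomed <- original + coord_cartesian(xlim = c(-5, 5), ylim = c(-5, 5))
```

Printing `original`, `constrained`, and `zoomed` in turn shows the three variants.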
Original plot: All points in the original data are plotted, without setting any data limits.

Constrained limits: Data are constrained before plotting, so that points outside of the limits are not considered when drawing aspects of the plot that are estimated from the data such as the loess regression line. Note that the line differs from the original plot above.

Zoomed-in limits: Original data are used but plot only shows the data within the specified limits. Note, however, that the line is the same within the restricted range as in the original plot above.


  3. Different plots can be arranged in the same window.
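A sketch with three plain ggplot2 plots on made-up data, arranged with `grid.arrange()` from gridExtra (assumed to be installed); any ggplot objects, including those returned by SIGHTS plotting functions, could be passed in the same way.

```r
library("ggplot2")

set.seed(1)
toy <- data.frame(x = rnorm(100), y = rnorm(100))
p1 <- ggplot(toy, aes(x = x)) + geom_histogram(bins = 20)
p2 <- ggplot(toy, aes(x = x, y = y)) + geom_point()
p3 <- ggplot(toy, aes(x = "toy", y = y)) + geom_boxplot()

# Arrange the three plots on a two-column grid in a single window.
gridExtra::grid.arrange(p1, p2, p3, ncol = 2)
```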
Arrangement: Multiple plots can be custom-arranged in one window by using gridExtra package (Auguie et al., 2015).


References

Arnold,J.B. and Arnold,M.J.B. (2015) Package ‘ggthemes’.

Auguie,B. et al. (2015) Package ‘gridExtra’.

Baryshnikova,A. et al. (2010) Quantitative analysis of fitness and genetic interactions in yeast on a genome scale. Nature methods, 7, 1017–1024.

Bushway,P.J. et al. (2011) Optimization and application of median filter corrections to relieve diverse spatial patterns in microtiter plate data. Journal of biomolecular screening, 16, 1068–1080.

Inglese,J. et al. (2006) Quantitative high-throughput screening: A titration-based approach that efficiently identifies biological activities in large chemical libraries. Proceedings of the National Academy of Sciences, 103, 11473–11478.

Malo,N. et al. (2006) Statistical practice in high-throughput screening data analysis. Nature biotechnology, 24, 167–175.

Murie,C. et al. (2015) Improving detection of rare biological events in high-throughput screens. Journal of biomolecular screening, 20, 230–241.

Murie,C. et al. (2013) Control-plate regression (CPR) normalization for high-throughput screens with many active features. Journal of biomolecular screening, 1087057113516003.

Murie,C. et al. (2009) Comparison of small n statistical tests of differential expression applied to microarrays. BMC bioinformatics, 10, 1.

Sarkar,D. (2008) Lattice: Multivariate data visualization with R. Springer, New York.

Storey,J. (2015) Qvalue: Q-value estimation for false discovery rate control.

Storey,J.D. (2002) A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64, 479–498.

Venables,W.N. and Ripley,B.D. (2002) Modern applied statistics with S. Fourth edition. Springer, New York.

Wickham,H. (2009) ggplot2: Elegant graphics for data analysis. Springer, New York.

Wickham,H. (2007) Reshaping data with the reshape package. Journal of Statistical Software, 21, 1–20.

Wright,G.W. and Simon,R.M. (2003) A random variance model for detection of differential gene expression in small microarray experiments. Bioinformatics, 19, 2448–2455.

Wu,Z. et al. (2008) Quantitative assessment of hit detection and confirmation in single and duplicate high-throughput screenings. Journal of biomolecular screening, 13, 159–167.