1 Getting Started

1.1 Accessing the Graphical User Interface

The public server for DiscoRhythm is available here; however, local usage is advised for improved performance (see 1.2).

See the guide in section 4 for details on usage of the application.

1.2 Installation

To run the application locally or use DiscoRhythm with R, the DiscoRhythm R package must be installed.

DiscoRhythm and its dependencies can be installed by executing the following in R:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("DiscoRhythm")

Note: Manual installation of pandoc is required in order to use the report generation features of DiscoRhythm.

Running the command library(DiscoRhythm); discoApp() will then launch the application. The application can be parallelized on multiple cores for improved performance using discoApp(ncores=number_of_available_cores).

1.2.1 Run the Web Application with Docker (Optional)

Alternatively, if Docker is installed, the DiscoRhythm container on Docker Hub can be pulled and used to run the web application locally, avoiding the need to install DiscoRhythm and its dependencies (see instructions on Docker Hub).

1.3 Using DiscoRhythm in R

The same computations performed in the web application can be executed directly in R. This may be necessary for:

  • Executing the workflow conveniently on multiple datasets
  • Analyzing large datasets
  • Adding custom R code to the workflow

Section 5 describes this usage in more detail.

2 Introduction

DiscoRhythm is a set of statistical tools for analyzing large-scale temporal biological experiments with a hypothesized periodicity (e.g. circadian transcriptomic experiments). The main goal of this package is to take a normalized data matrix and characterize the rhythmicity of its features. The entire workflow can be run interactively in the web application or run directly in R to perform:

  1. Import and cleaning of a normalized data matrix
  2. Inter-sample correlations and outlier detection
  3. Principal component analysis and outlier detection
  4. Analysis of experimental replicates (if applicable)
  5. Detection of dominant rhythmicities in the dataset
  6. Detection of feature-wise oscillation characteristics
    • Estimating cyclical characteristics such as: period, phase, amplitude, and statistical significance using four methods (Cosinor, JTK Cycle, ARSER, and Lomb-Scargle).

Section 3 describes the expected input to the DiscoRhythm workflow.

Next, section 4 will describe these steps and their use in the web application. Section 5 will then describe how to generate the same results using the DiscoRhythm R package directly.

3 Input Datasets

3.1 Example Dataset

Below is a small simulated circadian transcriptomic dataset, generated using the simphony package, which follows the expected input format for DiscoRhythm. The dataset was generated so that ~50% of transcripts are rhythmic, with a diversity of oscillation phases.

IDs CT0_1_A CT0_2_B CT0_3_C CT4_4_D CT4_5_E
nonRhythmGene_Id1 6.3717658 4.0909466 8.3010577 7.0240708 -4.6510406
nonRhythmGene_Id2 -2.4340798 -5.6695362 -3.0800470 -0.1800393 3.1282445
nonRhythmGene_Id3 -9.8373929 -5.8738313 -0.7793628 -2.7236067 2.6574806
nonRhythmGene_Id4 4.5951368 0.2952068 3.5660223 -1.2458999 6.1385621
nonRhythmGene_Id5 0.3980836 3.9928686 1.5626314 0.0613089 0.7013376
nonRhythmGene_Id6 -0.2139567 -2.4819291 -6.5602675 -1.9901769 -7.5526251

3.2 Row Naming

The first column should contain unique feature IDs (e.g. gene names in this case).

All subsequent columns contain experimental sample data.

3.3 Column Naming

Sample metadata is extracted from the column names of the matrix.

Names are expected to follow the pattern:

[Prefix][Time*]_[Unique Id]_[Replicate Id]

Descriptions of the naming convention used by DiscoRhythm (* mandatory field)

Field | Description | Examples
Prefix | A unit of time. The web application will use this as the unit of time by default. | hr, ZT, CT
Time* | Indicates the time of collection for the respective sample. Must be a non-negative number. | 20, 2.1, 0.3
Unique Id | A free field used to uniquely identify samples in visualizations and summaries. | GSM3186429, sample1, subjectA, AX
Replicate Id | Used to identify each biological sample uniquely when combined with Time. | 1, A, rep1

Time is the only required field (e.g. 32, CT32, CT32_AS_1 and 32_AS_1 are all valid naming styles); the other fields are used to obtain additional sample metadata where needed:

Biological vs Technical Replicates - Time + Replicate Id are used to identify independent samples collected at the same timepoint (biological replicates). Samples with the same Time and Replicate Id are assumed to be technical replicates from a single biological sample. If no Replicate Id is provided, all samples are assumed to be independent biological replicates.

Unique Sample Identity - Each column name should be unique such that all samples can be uniquely identified to the user. The Unique Id field is intended for this purpose. If column names are not unique, the Unique Id field will be generated to provide unique sample names during usage of DiscoRhythm.

Note: all fields should contain only alphanumeric characters (with the exception of ‘.’ in the Time field, which is allowed for decimal values).
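To make the naming convention concrete, the R sketch below splits column names into metadata fields and derives a biological-sample key from Time + Replicate Id. It is a simplified illustration, not DiscoRhythm's parser (see ?discoParseMeta for the pattern the package actually uses). The helper parse_sample_names and the bio_sample key are hypothetical, and the sketch assumes fields are separated by "_" and that the first field is letters followed by a number.

```r
# Simplified sketch, NOT DiscoRhythm's parser (see ?discoParseMeta).
# Assumes "_"-separated fields and a first field of letters followed by a number.
parse_sample_names <- function(col_names) {
  parts <- strsplit(col_names, "_", fixed = TRUE)
  first <- vapply(parts, `[`, character(1), 1)
  # Return field i of each name, or NA when a name has fewer fields
  field <- function(i) {
    vapply(parts, function(p) if (length(p) >= i) p[i] else NA_character_,
           character(1))
  }
  data.frame(
    ID          = col_names,
    Prefix      = sub("[0-9.]+$", "", first),               # e.g. "CT"
    Time        = as.numeric(sub("^[A-Za-z]*", "", first)), # e.g. 0, 4, 2.1
    UniqueId    = field(2),
    ReplicateId = field(3),
    stringsAsFactors = FALSE
  )
}

meta <- parse_sample_names(c("CT0_1_A", "CT0_2_B", "CT4_4_D", "32", "CT32_AS_1"))

# Hypothetical biological-sample key: Time + Replicate Id. Without a
# Replicate Id, each sample is its own independent biological replicate.
bio_sample <- ifelse(is.na(meta$ReplicateId),
                     meta$ID,
                     paste(meta$Time, meta$ReplicateId, sep = "_"))
```

Columns that end up with the same bio_sample value would be treated as technical replicates of one biological sample.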

3.3.1 Processed Metadata Table

The data extracted above is stored in the DiscoRhythm application as:

ID Time ReplicateID
CT0_1_A CT0_1_A 0 A
CT0_2_B CT0_2_B 0 B
CT0_3_C CT0_3_C 0 C
CT4_4_D CT4_4_D 4 D
CT4_5_E CT4_5_E 4 E
CT4_6_F CT4_6_F 4 F

Note: If there are more groupings in the dataset or if the dataset does not fit into this design, it may be appropriate to split the dataset into subgroups that fit this design (each subgroup as one input dataset to DiscoRhythm).

3.4 Time Type

DiscoRhythm defines time in an input dataset in one of two ways:

  1. Linear time
  2. Circular time

Linear time exists in systems where an experiment start time is meaningful (often setting t=0 to some specific event). Circular time exists in experiments where the start of experiment is not meaningful or left unobserved (e.g. time-of-day in a cross-sectional study). One of these two types must be specified for the input dataset and will influence how DiscoRhythm analysis is performed.

Example of linear time in hours: 1,2,3 ... 24,25,26
Example of circular time, time-of-day in hours: 1,2,3 ... 24,1,2
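The two examples above are related by wrapping: circular time is linear time taken modulo the base-cycle (24 h here). The short R sketch below only illustrates that relationship, using the 1-to-24 labelling of the example; in DiscoRhythm the interpretation is chosen through the Time Type setting (see 4.2).

```r
# Linear time: hours since a meaningful start event
t_linear <- c(1, 2, 3, 24, 25, 26)

# Circular time: the same collections as time-of-day on a 24 h clock,
# labelled 1..24 as in the example above
t_circular <- ((t_linear - 1) %% 24) + 1
t_circular
```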

For example, consider a dataset obtained from mice entrained on a 12-h light, 12-h dark schedule and then released into total darkness prior to collection. The presence of a specific event (release into total darkness) makes the dataset suitable for the “Linear time” setting of DiscoRhythm.

Note: If samples were collected during the entrainment to the light/dark base-cycle, “Circular time” would be appropriate as mice sampled at the same point in the cycle on different days may be treated as biological replicates.

4 DiscoRhythm Interface Walkthrough

This section walks the user through each part of the DiscoRhythm web application. It is recommended to keep this documentation open during first use of the application to read the details for each section.

4.1 The Interface

The DiscoRhythm web interface is a dashboard in which the sections of the analysis are accessed from the sidebar. The sections proceed sequentially, with data from each step fed into the next.

There are interactive controls to set parameters for each section’s analysis. When parameters relevant to a figure change, the corresponding figure will dynamically update to reflect the newly calculated results. Since DiscoRhythm is intended for use sequentially through the sections, returning to earlier sections of the analysis and modifying parameters may result in unexpected behaviour.

Various download buttons are available throughout the application for archival of both plot outputs and numerical results.

Figure 1: Screenshot of the initial DiscoRhythm landing page

4.2 Select Data

Purpose: To upload, clean, and summarize the experimental design for the input dataset.

An input dataset is expected to be in comma-separated values (CSV) format as specified in section 3. Upload the dataset using the “upload CSV” input method.

The simulated dataset (from section 3.1) is available to test the features of DiscoRhythm.

Messages or warnings may be seen at this point as DiscoRhythm imports the dataset and performs a few cleaning tasks:

  • Rows where max=min (constant values) will be removed
  • Rows with missing values will be removed
  • Column names will be checked for valid formatting
    • Duplicate column names will be deduplicated by replacing all Unique Id fields (see 3.3)
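The two row-wise cleaning rules can be sketched in base R as follows, assuming a numeric matrix with feature IDs as row names and samples as columns. This only illustrates the rules; it is not DiscoRhythm's implementation (discoCheckInput, section 5.1.3, performs these checks in the package), and clean_rows is a hypothetical helper.

```r
# Sketch of the row-wise cleaning rules (not DiscoRhythm's implementation).
clean_rows <- function(mat) {
  has_missing <- rowSums(is.na(mat)) > 0
  # A row is constant when max == min over its non-missing values
  is_constant <- apply(mat, 1, function(x) {
    x <- x[!is.na(x)]
    length(x) > 0 && max(x) == min(x)
  })
  mat[!has_missing & !is_constant, , drop = FALSE]
}

m <- rbind(g1 = c(1, 2, 3),    # kept
           g2 = c(5, 5, 5),    # removed: constant
           g3 = c(1, NA, 2))   # removed: missing value
colnames(m) <- c("CT0_1_A", "CT4_2_B", "CT8_3_C")
cleaned <- clean_rows(m)
```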

Specify the other analysis options for the dataset:

Time Type - See 3.4.

Period of Interest - The main hypothesized period should be specified in order to set appropriate defaults throughout the application. If unknown, set to the range of the sample collection times.

Time Unit/Observation Unit - Units to display in the axis labels throughout the application.

Figure 2: Screenshot of the ‘Select Data’ section of the DiscoRhythm interface

If the “Sampling Summary” table does not seem to accurately reflect the data, please refer back to section 3.3. It is also a good idea to expand the “Inspect Input Data Matrix” and “Inspect Parsed Metadata” boxes to ensure the data has been read correctly.

4.3 Outlier Removal

Experimental artifacts or errors commonly cause data from collected samples to inaccurately reflect the true biological phenomenon. This can often be observed as systematic signals from a single sample that lack biological plausibility. DiscoRhythm attempts to detect such systematic outliers by two methods:

  • Intersample-correlations
  • Principal component analysis

Each method is applied independently to the dataset to detect outliers and then the filtering summary section is used to decide which detected outliers to remove. A reasonable standard deviation threshold for both methods would be around 2 to 3.

By default, the DiscoRhythm web application sets the threshold such that no outliers are flagged for removal.

4.3.1 Inter-sample Correlation

Purpose: Samples are pairwise correlated using either the Pearson or Spearman method of correlation to detect outliers.

Heatmap: The values of these pairwise correlations can be visualized in this tab, where samples with similar correlation values are grouped together using clustering.

Outlier Detection: The average correlation value for each sample is used as a metric of its overall similarity to all other samples and is summarized in this figure.

Samples whose average correlation falls far below the mean are flagged as outliers; the user specifies the threshold as a number of standard deviations below the mean.
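A minimal base R sketch of this idea, assuming features in rows and samples in columns. cor_outliers is a hypothetical helper, not discoInterCorOutliers (section 5.2.1), and excluding each sample's self-correlation from its average is an assumption. The example uses a threshold of 2 because, with only 10 samples, no single sample can lie more than (n - 1)/sqrt(n) ≈ 2.85 standard deviations from the mean.

```r
cor_outliers <- function(mat, threshold = 3, method = "pearson") {
  cc <- cor(mat, method = method)   # sample-by-sample correlation matrix
  diag(cc) <- NA                    # drop each sample's self-correlation
  avg_cor <- colMeans(cc, na.rm = TRUE)
  # Flag samples more than `threshold` SDs below the mean average correlation
  cutoff <- mean(avg_cor) - threshold * sd(avg_cor)
  list(avg_cor = avg_cor, cutoff = cutoff, outliers = avg_cor < cutoff)
}

set.seed(42)
base <- rnorm(200)                                          # shared signal
m <- sapply(1:9, function(i) base + rnorm(200, sd = 0.3))   # 9 similar samples
m <- cbind(m, rnorm(200))                                   # sample 10: unrelated
colnames(m) <- paste0("S", 1:10)
res <- cor_outliers(m, threshold = 2)
which(res$outliers)
```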

Figure 3: Screenshot of the ‘Inter-sample Correlation’ section of the DiscoRhythm interface

4.3.2 Principal Component Analysis

Purpose: Utilize principal component analysis (PCA) to detect outliers.

PCA is used to extract the strongest recurring patterns in the dataset. Samples are flagged as outliers in these patterns (PC scores) by their deviation from the mean; again, the user specifies the threshold in units of standard deviations.

Scale Before PCA: Whether to scale rows to a standard deviation of 1 prior to PCA such that all rows are on an equal scale. Scaling is usually advisable.

PCs to Use For Outlier Detection: Click to change the list of PCs used for outlier removal, in case a PC is deemed inappropriate for outlier detection. Unwanted PCs can be removed by pressing “delete”, and extra ones added by typing their number.

Before CSV/After CSV: Downloadable summaries of the PCA before and after the detected outliers are removed.
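The PCA-based flagging and the values behind the scree plot can be sketched with stats::prcomp (which discoPCA wraps, per section 5.2.3), assuming features in rows and samples in columns. pca_outliers is a hypothetical helper, and flagging a sample when it exceeds the threshold on any selected PC is an assumption about how per-PC flags are combined.

```r
pca_outliers <- function(mat, threshold = 3, scale_rows = TRUE, pcs = 1:4) {
  # Samples are the observations, so transpose before prcomp;
  # scale. = TRUE rescales every feature (row of mat) to SD 1
  pca <- prcomp(t(mat), center = TRUE, scale. = scale_rows)
  scores <- pca$x[, pcs, drop = FALSE]
  # Distance of each sample's score from the PC mean, in SD units
  z <- abs(scale(scores))
  list(
    var_explained = pca$sdev^2 / sum(pca$sdev^2),  # values behind a scree plot
    z = z,
    outliers = apply(z > threshold, 1, any)         # flagged on any chosen PC
  )
}

set.seed(1)
m2 <- matrix(rnorm(200 * 10), nrow = 200,
             dimnames = list(NULL, paste0("S", 1:10)))
m2[, 10] <- m2[, 10] + 10   # sample 10 is shifted in every feature
res <- pca_outliers(m2, threshold = 2, pcs = 1)
```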

Figures:

Distributions: The distributions of PC scores used to detect outliers. Only the PCs colored darkly are used for the final outlier flagging. Outliers will be shown with an ‘x’.

Scree: PCs are numbered in order of decreasing variance explained (and therefore decreasing ‘importance’), as seen in the “Scree” figure. Users should choose an appropriate number of PCs for outlier detection based on the shape of this scree plot.

One Pair and All Pairs: Plotting the PC scores of the components versus one another may reveal grouping that cannot be determined from simple analysis of individual PCs.

Figure 4: Screenshot of the ‘PCA’ section of the DiscoRhythm interface

4.3.3 Filtering Summary

Purpose: Determine how to proceed with outlier removal.

The user may at this point choose to remove the flagged outliers or may disregard these flags if it is suspected the dissimilarity of these samples may be biologically relevant. The user may also remove samples which they deem to be unreliable for further analysis by other metrics (e.g. experimental quality metrics).

Raw Distributions: A boxplot for each individual sample which can be used to further evaluate sample selection.

Input and Final: Shows summary tables for data before and after outlier removal.

Figure 5: Screenshot of the ‘Filtering Summary’ section of the DiscoRhythm interface

4.4 Row Selection

Purpose: Utilize any technical replicates present in the dataset to quantify the signal-to-noise for each row. Combine technical replicates for downstream analysis.

Technical replicates are not useful for the statistical tests used by DiscoRhythm for oscillation detection, as they are not representative of the population variance of the data (i.e. they do not satisfy the independence assumptions). They are instead used to identify rows of the dataset where the biological variation is greater than the technical variation (i.e. high signal-to-noise). A hypothesis test is performed using ANOVA procedures to determine whether there may be a real biological signal in the row.

ANOVA Method: 3 options are available for ANOVA:

  1. Equal Variance - all sets of technical replicates are assumed to have the same variance. Recommended in most cases.
  2. Welch - sets of technical replicates may have different variance.
  3. None - do not test rows using ANOVA.

F-statistic Cutoff: The user may choose to filter rows by the magnitude of the signal-to-noise rather than by statistical significance.
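Per row, the test amounts to a one-way ANOVA in which each group holds the technical replicates of one biological sample. The sketch below uses base R's oneway.test, where var.equal = TRUE corresponds to the Equal Variance option and var.equal = FALSE to Welch. row_anova is a hypothetical helper, and DiscoRhythm's own implementation (discoRepAnalysis, section 5.3.1) may differ in detail.

```r
row_anova <- function(x, bio_sample, equal_var = TRUE) {
  # var.equal = TRUE: classic one-way ANOVA; FALSE: Welch's ANOVA
  fit <- oneway.test(x ~ bio_sample, var.equal = equal_var)
  c(F = unname(fit$statistic), p = unname(fit$p.value))
}

# One row: 2 biological samples x 3 technical replicates each
x <- c(1.0, 1.1, 0.9, 5.0, 5.1, 4.9)
g <- rep(c("A", "B"), each = 3)
res <- row_anova(x, g)
```

Here the between-sample mean square (24) is large relative to the within-sample mean square (0.01), giving F = 2400 and a high signal-to-noise row.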

Replicates should be combined for downstream rhythmicity analysis. DiscoRhythm provides three methods for combining technical replicates:

  • Mean - Take the mean of each set of technical replicates
  • Median - Take the median of each set of technical replicates
  • Random Selection - Take one of the technical replicates for each sample at random
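The three combining methods can be sketched for a single row as follows. combine_replicates is a hypothetical helper, and bio_sample is assumed to label the biological sample (Time + Replicate Id) of each column.

```r
combine_replicates <- function(x, bio_sample,
                               method = c("mean", "median", "random")) {
  method <- match.arg(method)
  groups <- split(x, bio_sample)   # technical replicates per biological sample
  f <- switch(method,
              mean   = mean,
              median = stats::median,
              random = function(v) v[sample.int(length(v), 1)])
  vapply(groups, f, numeric(1))    # one value per biological sample
}

x <- c(1, 2, 9, 4, 5, 6)
g <- rep(c("A", "B"), each = 3)
combine_replicates(x, g, "median")
```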

Note: Users may also choose to not combine technical replicates. This is only advisable if the technical replicates do in fact represent independent samples of the population/dataset (i.e. if they were erroneously labelled in section 3).

Figure 6: Screenshot of the ‘Row Selection’ section of the DiscoRhythm interface

4.5 Period Detection

Purpose: Summarize the strength of multiple periodicities across the entire dataset.

4.5.1 Period Detection

Purpose: Detecting the strength of a range of periods of oscillation across the entire dataset using a Cosinor based approach.

Spectral analysis will be limited to periods ranging from 3 times the sampling interval up to the sampling duration, and 12 periods evenly spaced across this range will be tested. For circular time, only harmonics of the base-cycle will be available for testing.
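The candidate grid of periods can be sketched as below. candidate_periods is hypothetical, and two details are assumptions: the sampling interval is taken as the smallest gap between distinct collection times, and the sampling duration as max(time) - min(time). Only the linear-time case is shown.

```r
candidate_periods <- function(time, n = 12) {
  interval <- min(diff(sort(unique(time))))   # assumed sampling interval
  duration <- max(time) - min(time)           # assumed sampling duration
  seq(3 * interval, duration, length.out = n)
}

# Collections every 4 h from 0 to 44 h
periods <- candidate_periods(seq(0, 44, by = 4))
```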

Note: Experiments typically have a hypothesized period length (such as the organism's period of activity), and that period should be used for analysis in section 4.6 when possible. When this tool is used to determine which periodicity to test, follow-up experiments may be appropriate for validation.

Figure 7: Screenshot of the ‘Period Detection’ section of the DiscoRhythm interface

4.5.2 PC fits

To determine the presence of a single rhythmic pattern in the data, PCA is performed to visualize and test the summarized temporal signal for rhythmicity (using the Cosinor method).

Figure 8: Screenshot of the ‘PC Cosinor Fits’ section of the DiscoRhythm interface

4.6 Oscillation Detection

Purpose: Individually quantify rhythmicity of remaining rows of the dataset where each row will be tested for rhythmicity using methods suitable for the sample collections present.

4.6.1 Rhythmicity Calculation Configuration

The user must choose a single period of oscillation to test across all rows of the dataset. If it is unknown which periodicity to test, start with the dominant period seen in section 4.5. The application may show warnings/messages regarding the choice of period. By default, the period entered in the ‘Select Data’ section will be chosen.

JTK Cycle, Lomb-Scargle, and ARSER results are all obtained through the MetaCycle R package (meta2d function using minper=maxper). Cosinor is a built-in function of DiscoRhythm. A brief summary of each method:

Cosinor - a.k.a. “Harmonic Regression”; fits a sinusoid with a free phase parameter.
JTK Cycle - non-parametric test of rhythmicity robust to outliers.
Lomb-Scargle - an approach using spectral power density.
ARSER - removes linear trends and performs the Cosinor test.
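Because Cosinor is described above as harmonic regression, a single-row fit can be sketched with lm: regressing on sin(2πt/P) and cos(2πt/P) is equivalent to fitting M + A cos(2π(t - φ)/P), from which the amplitude A and acrophase φ follow. This is an illustrative sketch, not DiscoRhythm's built-in Cosinor, and cosinor_fit is hypothetical. It returns the one-sided amplitude A; in the table in section 5.4, the amplitude column appears to equal 2 * sqrt(sincoef^2 + coscoef^2), i.e. a peak-to-trough value.

```r
cosinor_fit <- function(y, time, period) {
  w <- 2 * pi / period
  s <- sin(w * time)
  cs <- cos(w * time)
  fit <- lm(y ~ s + cs)
  b <- coef(fit)
  b_s <- unname(b["s"])
  b_c <- unname(b["cs"])
  # y = M + A*cos(w*(t - phi)) = M + A*sin(w*phi)*sin(w*t) + A*cos(w*phi)*cos(w*t)
  amplitude <- sqrt(b_s^2 + b_c^2)
  acrophase <- (atan2(b_s, b_c) / w) %% period
  f <- summary(fit)$fstatistic   # overall F-test of both sinusoidal terms
  pvalue <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
  c(mesor = unname(b[1]), amplitude = amplitude, acrophase = acrophase,
    Rsq = summary(fit)$r.squared, pvalue = pvalue)
}

set.seed(7)
t_obs <- rep(seq(0, 44, by = 4), each = 2)   # 2 samples every 4 h over 48 h
y <- 5 + 3 * cos(2 * pi * (t_obs - 6) / 24) + rnorm(length(t_obs), sd = 0.1)
cs_res <- cosinor_fit(y, t_obs, period = 24)
```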

Exclusion Criteria Matrix: A table is presented describing the criteria that exclude a method from use; only criteria that are true for the loaded dataset are shown (if no criteria apply, the table is absent). Exclusions may be due to either computational restrictions (the method causes errors under the given conditions) or statistical restrictions (requirements of study design).

Criteria Description
missing_value Rows contain missing values.
with_bio_replicate Biological replicates are present.
non_integer_interval The spacing between samples is not an integer value.
uneven_interval Time between collections is not uniform.
circular_t Time is circular (see 3.4).
invalidPeriod Chosen period to test is not valid.
invalidJTKperiod Chosen period to test is not valid for JTK Cycle.

Figure 9: Screenshot of the ‘Oscillation Detection (Preview)’ section of the DiscoRhythm interface

4.6.2 Visualizing Results

Once rhythmicity computation is completed, 3 sections become available for viewing the results:

Individual Models: Allows inspection of the raw data for individual rows of the dataset. The user may click a row of the table to display the raw data and a fitted curve for that row. If the Cosinor method is being viewed, the line is the Cosinor fit; all other methods use a loess fit. If the error bar option is selected, a 95% confidence interval on the mean is displayed for each timepoint.

Summary: Summarizes calculated rhythm parameters across all tested rows by all executed methods.

Method Comparison: Offers pairwise comparison of rhythmic parameters calculated by each method to determine the degree of agreement between methods.
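The error bars in the Individual Models view can be approximated by a t-based 95% confidence interval on the mean at each timepoint. Using the t distribution is an assumption about how the application computes the interval, and mean_ci is a hypothetical helper.

```r
mean_ci <- function(y, time, level = 0.95) {
  per_time <- lapply(split(y, time), function(v) {
    n <- length(v)
    m <- mean(v)
    half <- qt(1 - (1 - level) / 2, df = n - 1) * sd(v) / sqrt(n)
    c(mean = m, lower = m - half, upper = m + half)
  })
  do.call(rbind, per_time)   # one row per timepoint
}

ci <- mean_ci(y = c(1, 2, 3, 10, 10, 10), time = c(0, 0, 0, 4, 4, 4))
```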

Figure 10: Screenshot of the ‘Oscillation Detection’ section of the DiscoRhythm interface

4.7 Session Archiving

Purpose: Archive the results of the DiscoRhythm session in a reproducible report and download R data associated with the session.

HTML Report - R code is provided in this report to reproduce all results. The data can be reprocessed in the future using this code.

Session Results - Alternatively, data may be downloaded directly through the app and accessed from the disco_ro object in R.

Session Inputs - Users may also simply download an R data file containing their input dataset and all parameters, such that results can be reprocessed using the DiscoRhythm R package.

Figure 11: Screenshot of the ‘Session Archiving’ section of the DiscoRhythm interface

5 DiscoRhythm R Usage

This section will detail usage of the R functions used to perform the analysis in section 4. For each of the sections below, refer to the DiscoRhythm R package manual for specific technical details on usage, arguments and methods or use ? to access individual manual pages. For instance, get more help for the function discoBatch() with command ?discoBatch.

5.1 Data Import

SummarizedExperiment objects such as those generated by other Bioconductor packages should be suitable inputs to DiscoRhythm functions once modified to contain the following required data:

  1. rownames(se) - The feature IDs.
  2. assay(se) - A matrix containing experimental data.
  3. colData(se) - Stores sample metadata (See 3.3.1). 3 columns are required:

    • ID
    • Time
    • ReplicateID

Objects with this structure will be used throughout usage of the package.

Note that at present DiscoRhythm will use the first assay (i.e. assays(se)[[1]]) of the SummarizedExperiment and all others will be ignored.
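A minimal sketch of building an object with this structure by hand using the SummarizedExperiment package, assuming only the requirements listed above. The toy matrix and the names mat and meta are hypothetical; discoDFtoSE (5.1.1) is the package's own route from a data.frame.

```r
library(SummarizedExperiment)

# Hypothetical toy data: 3 features (rows) x 4 samples (columns)
mat <- matrix(rnorm(12), nrow = 3,
              dimnames = list(paste0("gene", 1:3),
                              c("CT0_1_A", "CT0_2_B", "CT4_3_A", "CT4_4_B")))

# The three required sample-metadata columns
meta <- S4Vectors::DataFrame(ID = colnames(mat),
                             Time = c(0, 0, 4, 4),
                             ReplicateID = c("A", "B", "A", "B"),
                             row.names = colnames(mat))

# Only the first assay would be used by DiscoRhythm
se <- SummarizedExperiment(assays = list(data = mat), colData = meta)
```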

5.1.1 discoDFtoSE

The CSV inputs to the DiscoRhythm web interface described in section 3 can be read into R as a data.frame. So that users of the web application can use the same input for analysis in R, the function discoDFtoSE is available to convert the tabular input into the format described in 5.1 above.

The sample metadata will be extracted from the column names by matching the format described in section 3.3 (see ?discoParseMeta for the regular expression specifications). The validity and uniqueness checks mentioned in section 4.2 will also be performed. Alternatively, this metadata may be input directly to discoDFtoSE as a data.frame.

Loading in the same example dataset as section 3 using discoGetSimu() to read in the CSV system file as a data.frame:

library(DiscoRhythm)
indata <- discoGetSimu()
knitr::kable(head(indata[,1:6]), format = "markdown") # Inspect the data
IDs CT0_1_A CT0_2_B CT0_3_C CT4_4_D CT4_5_E
nonRhythmGene_Id1 6.3717658 4.0909466 8.3010577 7.0240708 -4.6510406
nonRhythmGene_Id2 -2.4340798 -5.6695362 -3.0800470 -0.1800393 3.1282445
nonRhythmGene_Id3 -9.8373929 -5.8738313 -0.7793628 -2.7236067 2.6574806
nonRhythmGene_Id4 4.5951368 0.2952068 3.5660223 -1.2458999 6.1385621
nonRhythmGene_Id5 0.3980836 3.9928686 1.5626314 0.0613089 0.7013376
nonRhythmGene_Id6 -0.2139567 -2.4819291 -6.5602675 -1.9901769 -7.5526251

Then import it as a SummarizedExperiment:

se <- discoDFtoSE(indata)

5.1.2 discoSEtoDF

The reverse operation, discoSEtoDF, is also available and is mainly intended for the purpose of exporting data as CSV for input to the web application.

write.csv(discoSEtoDF(se),file = "DiscoRhythmInputFile.csv",row.names = FALSE)

This exports the data for use in the web application.

5.1.3 discoCheckInput

This function performs the row-wise checks for missing values and constant values as mentioned in section 4.2.

selectDataSE <- discoCheckInput(se)

5.1.4 discoDesignSummary

The sample collection information present in colData(selectDataSE) can be summarized by the discoDesignSummary function to detail the number of biological and technical replicates available at each collection time. Number of technical replicates is shown in brackets.

library(SummarizedExperiment)
Metadata <- colData(selectDataSE)
knitr::kable(discoDesignSummary(Metadata),format = "markdown")
0 4 8 12 16 20
Total 9 9 9 9 9 9
Biological Sample A (3) Biological Sample D (3) Biological Sample G (3) Biological Sample J (3) Biological Sample M (3) Biological Sample P (3)
Biological Sample B (3) Biological Sample E (3) Biological Sample H (3) Biological Sample K (3) Biological Sample N (3) Biological Sample Q (3)
Biological Sample C (3) Biological Sample F (3) Biological Sample I (3) Biological Sample L (3) Biological Sample O (3) Biological Sample R (3)

5.2 Outlier Detection

5.2.1 discoInterCorOutliers

Performs the analysis described in 4.3.1 to return some intermediate results and a vector indicating which samples were determined to be outliers.

CorRes <- discoInterCorOutliers(selectDataSE,
                                cor_method="pearson",
                                threshold=3,
                                thresh_type="sd")

5.2.2 discoPCAoutliers

Performs the analysis described in 4.3.2 to return some intermediate results and a vector indicating which samples were determined to be outliers.

PCAres <- discoPCAoutliers(selectDataSE,
                           threshold=3,
                           scale=TRUE,
                           pcToCut = c("PC1","PC2","PC3","PC4"))

5.2.3 discoPCA

discoPCA is a light wrapper around the stats::prcomp function, written for better integration with the web application. It can be used as:

discoPCAres <- discoPCA(selectDataSE)

This returns the same output as prcomp with the addition of a reformatted summary table (available as discoPCAres$table).

5.2.4 Filtering Summary

Below, the results of the outlier detection analysis (CorRes and PCAres) are used to subset the data and remove outliers:

FilteredSE <- selectDataSE[,!PCAres$outliers & !CorRes$outliers]

DT::datatable(as.data.frame(
  colData(selectDataSE)[PCAres$outliers | CorRes$outliers,]
))
knitr::kable(discoDesignSummary(colData(FilteredSE)),format = "markdown")
0 4 8 12 16 20
Total 9 8 9 8 9 9
Biological Sample A (3) Biological Sample D (3) Biological Sample G (3) Biological Sample J (3) Biological Sample M (3) Biological Sample P (3)
Biological Sample B (3) Biological Sample E (3) Biological Sample H (3) Biological Sample K (3) Biological Sample N (3) Biological Sample Q (3)
Biological Sample C (3) Biological Sample F (2) Biological Sample I (3) Biological Sample L (2) Biological Sample O (3) Biological Sample R (3)

5.3 Row Selection

5.3.1 discoRepAnalysis

Performs the analysis described in 4.4 returning the results of the ANOVA test and the se data object where technical replicates are combined.

ANOVAres <- discoRepAnalysis(FilteredSE,
                             aov_method="Equal Variance",
                             aov_pcut=0.05,
                             aov_Fcut=1,
                             avg_method="Median")

FinalSE <- ANOVAres$se

5.4 Dominant Rhythmicities

5.4.1 discoPeriodDetection

Performs the analysis described in 4.5 to return a data.frame of Cosinor fits across a range of periods.

PeriodRes <- discoPeriodDetection(FinalSE,
                                  timeType="linear",
                                  main_per=24)

The main period of interest is fit using a Cosinor model to principal component scores as described in 4.5.

OVpca <- discoPCA(FinalSE)
OVpcaSE <- discoDFtoSE(data.frame("PC"=1:ncol(OVpca$x),t(OVpca$x)),
                                  colData(FinalSE))
knitr::kable(discoODAs(OVpcaSE,period = 24,method = "CS")$CS,
             format = "markdown")
acrophase amplitude Rsq pvalue intercept sincoef coscoef qvalue
17.71281 17.6386786 0.9855849 0.0000000 0 -8.7944228 -0.6624756 0.0000000
23.68809 10.9255889 0.9182224 0.0000000 0 -0.4455825 5.4445918 0.0000000
14.95058 0.4453926 0.0024716 0.9816116 0 -0.1554194 -0.1594943 0.9973953
10.36796 0.2569763 0.0008546 0.9936085 0 0.0532436 -0.1169372 0.9973953
12.87527 0.7831344 0.0087132 0.9364723 0 -0.0889421 -0.3813321 0.9973953
15.09117 0.5088081 0.0041057 0.9696152 0 -0.1841327 -0.1755465 0.9973953
23.56921 0.9746275 0.0160074 0.8860104 0 -0.0548425 0.4842179 0.9973953
23.14554 0.4729064 0.0040864 0.9697559 0 -0.0524537 0.2305618 0.9973953
14.21972 0.6315381 0.0078395 0.9426803 0 -0.1733449 -0.2639350 0.9973953
20.71661 0.1314298 0.0003477 0.9973953 0 -0.0497840 0.0428952 0.9973953

5.5 Oscillation Detection

5.5.1 discoODAs

Performs the analysis described in 4.6.1 using only the Cosinor method. discoODAs will automatically run all appropriate methods if none are specified.

discoODAres <- discoODAs(FinalSE,
                         period=24,
                         method="CS",
                         ncores=1,
                         circular_t=FALSE)

5.6 Batch Execution

5.6.1 discoBatch

The entire analysis performed in section 5 may be run through a single call to discoBatch() to obtain the final discoODAres results.

discoBatch(indata=indata,
  report="discoBatch_example.html",
  ncores=1,
  main_per=24,
  timeType="linear",
  cor_threshold=3,
  cor_method="pearson",
  cor_threshType="sd",
  pca_threshold=3,
  pca_scale=TRUE,
  pca_pcToCut=paste0("PC",seq_len(4)),
  aov_method="None",
  aov_pcut=0.05,
  aov_Fcut=0,
  avg_method="Median",
  osc_method="CS",
  osc_period=24)

This command will generate an HTML report called “discoBatch_example.html” which includes the visualizations seen in the DiscoRhythm application. indata may be in either of the two input formats described in 5.1 (data.frame or SummarizedExperiment).

6 Session Info

sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.9-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.9-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] SummarizedExperiment_1.14.0 DelayedArray_0.10.0        
##  [3] BiocParallel_1.18.0         matrixStats_0.54.0         
##  [5] Biobase_2.44.0              GenomicRanges_1.36.0       
##  [7] GenomeInfoDb_1.20.0         IRanges_2.18.0             
##  [9] S4Vectors_0.22.0            BiocGenerics_0.30.0        
## [11] DiscoRhythm_1.0.0           BiocStyle_2.12.0           
## 
## loaded via a namespace (and not attached):
##   [1] colorspace_1.4-1       class_7.3-15           modeltools_0.2-22     
##   [4] mclust_5.4.3           futile.logger_1.4.3    XVector_0.24.0        
##   [7] rstudioapi_0.10        flexmix_2.3-15         DT_0.5                
##  [10] mvtnorm_1.0-10         xml2_1.2.0             codetools_0.2-16      
##  [13] robustbase_0.93-4      knitr_1.22             jsonlite_1.6          
##  [16] broom_0.5.2            cluster_2.0.9          kernlab_0.9-27        
##  [19] shinydashboard_0.7.1   shiny_1.3.2            readr_1.3.1           
##  [22] BiocManager_1.30.4     compiler_3.6.0         httr_1.4.0            
##  [25] backports_1.1.4        assertthat_0.2.1       Matrix_1.2-17         
##  [28] lazyeval_0.2.2         later_0.8.0            formatR_1.6           
##  [31] htmltools_0.3.6        tools_3.6.0            gtable_0.3.0          
##  [34] glue_1.3.1             GenomeInfoDbData_1.2.1 reshape2_1.4.3        
##  [37] dplyr_0.8.0.1          Rcpp_1.0.1             trimcluster_0.1-2.1   
##  [40] gdata_2.18.0           nlme_3.1-139           crosstalk_1.0.0       
##  [43] iterators_1.0.10       fpc_2.1-11.2           xfun_0.6              
##  [46] stringr_1.4.0          rvest_0.3.3            miniUI_0.1.1.1        
##  [49] mime_0.6               shinycssloaders_0.2.0  gtools_3.8.1          
##  [52] dendextend_1.10.0      DEoptimR_1.0-8         MASS_7.3-51.4         
##  [55] zlibbioc_1.30.0        scales_1.0.0           TSP_1.1-6             
##  [58] shinyBS_0.61           hms_0.4.2              promises_1.0.1        
##  [61] lambda.r_1.2.3         RColorBrewer_1.1-2     yaml_2.2.0            
##  [64] heatmaply_0.15.2       gridExtra_2.3          ggplot2_3.1.1         
##  [67] UpSetR_1.3.3           ggExtra_0.8            stringi_1.4.3         
##  [70] highr_0.8              gclus_1.3.2            foreach_1.4.4         
##  [73] seriation_1.2-3        caTools_1.17.1.2       rlang_0.3.4           
##  [76] pkgconfig_2.0.2        prabclus_2.2-7         bitops_1.0-6          
##  [79] evaluate_0.13          lattice_0.20-38        purrr_0.3.2           
##  [82] htmlwidgets_1.3        tidyselect_0.2.5       plyr_1.8.4            
##  [85] magrittr_1.5           bookdown_0.9           R6_2.4.0              
##  [88] magick_2.0             gplots_3.0.1.1         generics_0.0.2        
##  [91] pillar_1.3.1           whisker_0.3-2          RCurl_1.95-4.12       
##  [94] nnet_7.3-12            tibble_2.1.1           crayon_1.3.4          
##  [97] futile.options_1.0.1   KernSmooth_2.23-15     plotly_4.9.0          
## [100] rmarkdown_1.12         viridis_0.5.1          grid_3.6.0            
## [103] data.table_1.12.2      matrixTests_0.1.2      digest_0.6.18         
## [106] diptest_0.75-7         webshot_0.5.1          xtable_1.8-4          
## [109] VennDiagram_1.6.20     tidyr_0.8.3            httpuv_1.5.1          
## [112] munsell_0.5.0          kableExtra_1.1.0       registry_0.5-1        
## [115] viridisLite_0.3.0      shinyjs_1.0