1 Introduction

Batch effects can have a major impact on the results of omics studies (Leek et al. 2010). Randomization is the first, and arguably most influential, step in handling them. However, its implementation suffers from a few key issues:

  • A single random draw can inadvertently result in high correlation between technical covariates and biological factors. Minimizing these correlations is particularly crucial in studies with many batches and outcomes of interest.
  • Long, randomized sample lists are unintuitive and translate poorly to wet labs that are not fully automated. This can lead to errors and sample mix-ups.
  • In many publications the randomization process is unclear: it is rarely described, even though the efficacy of different methods varies.
  • Randomized layouts are not always reproducible, which leads to inconsistent results across runs.

To address these problems, we developed Omixer, an R package for multivariate randomization and reproducible generation of intuitive sample layouts.

1.1 Dependencies

This document has the following dependencies.

library(Omixer)
library(tibble)
library(forcats)
library(stringr)
library(dplyr)
library(ggplot2)
library(magick)

1.2 Workflow

Omixer randomizes input sample lists multiple times (default: 1,000) and then combines these randomized lists with plate layouts, which can be selected from commonly used setups or custom-made. It can handle paired samples, keeping these adjacent but shuffling their order, and allows explicit masking of specific wells if, for example, plates are shared between different studies.
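
To make the paired-sample behaviour concrete, here is a minimal base R sketch, assuming each pair (or larger group) is identified by a shared block ID column. The helper shuffle_blocks and the column names sampleId and pairId are hypothetical illustrations, not part of the Omixer API. The sketch shuffles the order of the blocks and the order of samples within each block, so members of a block always stay adjacent.

```r
# Hypothetical helper (not an Omixer function): shuffle a sample list
# while keeping all samples that share a block ID in adjacent rows.
shuffle_blocks <- function(df, block_col) {
  blocks <- unique(df[[block_col]])
  # Randomize the order in which blocks appear
  block_order <- blocks[sample.int(length(blocks))]
  rows <- unlist(lapply(block_order, function(b) {
    idx <- which(df[[block_col]] == b)
    # Also shuffle the order of samples inside each block
    idx[sample.int(length(idx))]
  }))
  out <- df[rows, , drop = FALSE]
  rownames(out) <- NULL
  out
}

set.seed(1)
# Hypothetical study: 4 individuals, each sampled twice
samples <- data.frame(
  sampleId = paste0("S", 1:8),
  pairId = rep(1:4, each = 2)
)
shuffled <- shuffle_blocks(samples, "pairId")
print(shuffled)
```

Repeating such a draw many times, rather than once, is what gives the selection step below a pool of candidate layouts to choose from.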

After performing robust tests of correlation between technical covariates and selected randomization factors, a layout is chosen using these criteria:

  • No test provides sufficient evidence of correlation between the variables (all p-values above 0.05).
  • From the remaining layouts, return one where the absolute sum of correlations is minimized.
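
The two selection criteria can be sketched in base R as follows. This is an illustration under simplifying assumptions, not Omixer's implementation: all variables are treated as categorical, association is tested with a chi-square test, and Cramér's V stands in for the correlation estimate. The score sums absolute values of these estimates, which is one reading of "absolute sum of correlations". The helpers assoc_test and choose_layout, and the example variables, are hypothetical; Omixer's own choice of tests and its behaviour when no layout passes may differ.

```r
# Hypothetical helper: chi-square p-value plus Cramer's V (0 = no association)
assoc_test <- function(x, y) {
  tab <- table(x, y)
  ch <- suppressWarnings(chisq.test(tab, correct = FALSE))
  v <- sqrt(unname(ch$statistic) / (sum(tab) * (min(dim(tab)) - 1)))
  c(p = unname(ch$p.value), r = v)
}

# Hypothetical helper: randomize iterNum times and keep the layout that
# passes every test with the smallest summed |association|.
# Returns layout = NULL if no iteration passes criterion 1.
choose_layout <- function(samples, layout, randVars, techVars, iterNum = 1000) {
  pairs <- expand.grid(t = techVars, r = randVars, stringsAsFactors = FALSE)
  best <- NULL
  best_score <- Inf
  for (i in seq_len(iterNum)) {
    perm <- samples[sample.int(nrow(samples)), , drop = FALSE]
    rownames(perm) <- NULL
    combined <- cbind(layout[seq_len(nrow(perm)), , drop = FALSE], perm)
    stats <- mapply(function(t, r) assoc_test(combined[[t]], combined[[r]]),
                    pairs$t, pairs$r)
    # Criterion 1: no test suggests correlation (all p-values > 0.05)
    if (!isTRUE(all(stats["p", ] > 0.05))) next
    # Criterion 2: minimize the summed absolute association
    score <- sum(abs(stats["r", ]))
    if (score < best_score) {
      best <- combined
      best_score <- score
    }
  }
  list(layout = best, score = best_score)
}

set.seed(2024)
# Hypothetical layout: 2 plates of 24 wells (4 rows x 6 columns)
layout <- expand.grid(row = LETTERS[1:4], column = 1:6, plate = 1:2,
                      stringsAsFactors = FALSE)
# Hypothetical study: 48 samples with two categorical factors
samples <- data.frame(
  sampleId = sprintf("S%02d", 1:48),
  sex = rep(c("F", "M"), times = 24),
  status = rep(c("case", "control"), each = 24)
)
res <- choose_layout(samples, layout, randVars = c("sex", "status"),
                     techVars = c("plate", "row"), iterNum = 200)
res$score
```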

The optimal randomized list can then be processed by omixerSheet, returning intuitive sample layouts for the wet lab.

2 Creating Layouts

To test for correlations between technical covariates and biological factors, Omixer needs to know the plate layout that your samples will be randomized to. There are several options for automatically creating common layouts. Alternatively, a custom data frame can be passed to the layout option, together with techVars naming the technical covariates it contains. These possibilities are discussed in more detail below.

2.1 Automated Layouts

Several options can be used to automatically generate common layouts:

  • wells specifies the number of wells on a plate, which can be 96, 48, or 24.
  • plateNum determines how many copies of the plate your samples will need.
  • div is optional, and subdivides the plate into batches. This can be used to specify chips within a plate, for example.
  • positional allows positions within div to also be treated as batches. This is useful for Illumina 450K methylation array experiments, where positional batch effects have been identified (Jiao et al. 2018).
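
As an illustration of what an automatically generated layout contains, the following base R sketch builds a layout data frame from wells and plateNum, assuming the standard plate geometries (96 = 8 rows x 12 columns, 48 = 6 x 8, 24 = 4 x 6). The helper make_layout is hypothetical, not Omixer's generator, and the column-wise well order (A1, B1, ..., then A2) is an assumption; Omixer's own ordering may differ.

```r
# Hypothetical helper (not part of Omixer): build a multi-plate layout
make_layout <- function(wells = 96, plateNum = 1) {
  # Rows x columns for the supported plate sizes
  dims <- list("96" = c(8, 12), "48" = c(6, 8), "24" = c(4, 6))
  d <- dims[[as.character(wells)]]
  if (is.null(d)) stop("wells must be 96, 48, or 24")
  # expand.grid varies row fastest, giving column-wise order A1, B1, ...
  one <- expand.grid(row = seq_len(d[1]), column = seq_len(d[2]))
  one$well <- paste0(LETTERS[one$row], one$column)
  # Repeat the plate plateNum times, tagging each copy with its plate number
  plates <- lapply(seq_len(plateNum), function(p) cbind(plate = p, one))
  do.call(rbind, plates)
}

layout48 <- make_layout(wells = 48, plateNum = 2)
head(layout48, 3)
```

Columns such as plate can then act as technical covariates when testing for correlation with randomization factors.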

2.2 Subdivisions

By default, div is set to “none”, but it can be set to “col”, “row”, “col-block”, or “row-block”.

  • col treats each column of the plate as a batch
  • row treats each row of the plate as a batch
  • col-block treats each block of 2 adjacent columns as a batch
  • row-block treats each block of 2 adjacent rows as a batch
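
The four subdivision rules map directly onto simple arithmetic on a well's row and column indices. The base R sketch below illustrates that mapping; assign_batch is a hypothetical helper written for this explanation, not an Omixer function, and it assumes rows and columns are numbered from 1.

```r
# Hypothetical helper: batch ID for each well under a given div setting
assign_batch <- function(row, column,
                         div = c("none", "col", "row", "col-block", "row-block")) {
  div <- match.arg(div)
  switch(div,
    "none"      = rep(1L, length(row)),              # whole plate is one batch
    "col"       = as.integer(column),                # one batch per column
    "row"       = as.integer(row),                   # one batch per row
    "col-block" = as.integer(ceiling(column / 2)),   # 2-column-wide batches
    "row-block" = as.integer(ceiling(row / 2))       # 2-row-wide batches
  )
}

# A 48-well plate: 6 rows x 8 columns
plate48 <- expand.grid(row = 1:6, column = 1:8)
plate48$colBatch <- assign_batch(plate48$row, plate48$column, "col")
plate48$colBlock <- assign_batch(plate48$row, plate48$column, "col-block")
table(plate48$colBlock)
```

On a 48-well plate, "col" yields 8 batches of 6 wells, "col-block" 4 batches of 12, and "row-block" 3 batches of 16.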

For example, with wells=48 and div="col", each column of a 48-well plate is treated as a batch (shown in different colours in the image below).