The current implementation for the
@treatmentResponse slot in a
PharmacoSet has some limitations.
Firstly, it does not natively support dose-response experiments with
multiple drugs and/or cancer cell lines. As a result, we have not been
able to include this data in a
PharmacoSet thus far.
Secondly, drug combination data has the potential to scale to high dimensionality. As a result we need an object that is highly performant to ensure computations on such data can be completed in a timely manner.
To resolve these issues, we designed and implemented the
TreatmentResponseExperiment (or TRE for short)!
The current use case is supporting drug combination experiments in
PharmacoGx, but we wanted to create something flexible
enough to fit other use cases. As such, we have used the generic term
‘treatment’ to refer to any experimental intervention one can conduct on a
set of samples. In the context of PharmacoGx, a
treatment represents application of one or more anti-cancer compounds to a
cancer cell-line. The resulting viability for this cell-line
after treatment is the response metric. We hope that the implementation of
our class is general enough to support other use cases. For example, the
TreatmentResponseExperiment class is also being adopted for radiation
dose-response experiments in cancer cell-lines in
RadioGx as well as for investigating compound
toxicity in healthy human and rat cell-lines in
ToxicoGx.
Our design takes aspects of the SummarizedExperiment and
MultiAssayExperiment classes and implements them using the data.table
package, which provides an R API to a rich set of tools for scalable,
high performance data processing implemented in C.
We have borrowed directly from the
SummarizedExperiment class for the rowData, colData, metadata and
assays slot names.
We also implemented the
SummarizedExperiment accessor methods for the
TreatmentResponseExperiment. Therefore the interface should be familiar to
users of common Bioconductor packages.
There are, however, some important differences which make this object more flexible when dealing with high dimensional data.
Unlike a SummarizedExperiment, there are three distinct
subgroups of columns in rowData and colData.
The first are the rowKey and
colKey, which are implemented internally to
map between each assay observation and its associated treatments or samples
(rows or columns); these will not be returned by the accessors by default.
The second are the rowIDs and
colIDs; these hold all of the information
necessary to uniquely identify a row or column and are used to generate the
rowKey and colKey. Finally, there are the rowMeta and colMeta,
which store any additional data about treatments or samples not required to
uniquely identify a row in either table.
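The split between key, identifier and metadata columns can be illustrated with a plain data.frame. This is only a sketch in base R under stated assumptions: the column names (treatment1id, treatment1dose, treatment1target) and the way the key is generated are hypothetical, and it does not reproduce the package's internal data.table implementation.

```r
# Hypothetical treatment table laid out like the row annotations described above.
rowData <- data.frame(
    treatment1id = c("drugA", "drugA", "drugB"),     # identifier column
    treatment1dose = c(0.1, 1.0, 0.1),               # identifier column
    treatment1target = c("EGFR", "EGFR", "MEK"),     # metadata column
    stringsAsFactors = FALSE
)

# Assumed identifier columns: together they must uniquely identify a row.
rowIDs <- c("treatment1id", "treatment1dose")
stopifnot(!anyDuplicated(rowData[, rowIDs]))

# An integer key generated from the unique identifier combinations
# (here simply the row position, since every row is already unique).
rowData$rowKey <- seq_len(nrow(rowData))

# Everything that is neither a key nor an identifier is metadata.
rowMeta <- setdiff(colnames(rowData), c(rowIDs, "rowKey"))
print(rowMeta)
```

The same three-way split would apply to the sample annotations, with a column key in place of the row key.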
An assayIndex is stored in the
@.intern slot which maps between unique combinations of rowKey and colKey
and the experimental observations in each assay. This relationship is maintained
using a separate primary key for each assay, which can map to one or more
rowKey and colKey combinations. For assays containing raw experimental observations,
generally each assay row will map to one and only one combination of
rowKey and colKey. However, for metrics computed over experimental observations, it
may be desirable to summarize over some of the rowData and/or colData columns.
In this case, the relationship between the summarized rows and the metadata
stored in the
rowData and colData slots is retained in the assayIndex.
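To make the index idea concrete, the sketch below builds a small index table by hand. The layout is an assumption made for illustration (the column names raw and profiles are hypothetical assay keys, and the real index is held internally by the object), but it shows how a raw assay can map one-to-one onto key combinations while a summarized assay maps one assay row onto several.

```r
# Hypothetical index: one row per (rowKey, colKey) combination, plus one
# primary-key column per assay.
assayIndex <- data.frame(
    rowKey = c(1L, 2L, 3L, 4L),
    colKey = c(1L, 1L, 1L, 1L),
    raw = 1:4,                      # raw assay: one key combination per assay row
    profiles = c(1L, 1L, 2L, 2L)    # summarized assay: two combinations per assay row
)

# Count how many key combinations each assay row covers.
combosPerRaw <- unname(lengths(split(assayIndex$rowKey, assayIndex$raw)))
combosPerProfile <- unname(lengths(split(assayIndex$rowKey, assayIndex$profiles)))

print(combosPerRaw)
print(combosPerProfile)
```

Because the index keeps every key combination, the treatment and sample annotations behind a summarized row can still be looked up.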
Also worth noting is the cardinality between rowData and
colData for a given
assay within the assays list. As indicated by the lower connection between these
tables and an assay, for each row or column key there may be zero or more rows in
the assay table. Conversely, for each row in the assay there may be zero or one key
in colData or rowData. When combined, the rowKey and
colKey for a given
row in an assay become a composite key which uniquely identifies an observation.
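The composite key can be sketched with three small tables in base R. The table contents and the treatmentid, sampleid and viability column names are made up for illustration; the package itself stores these tables as data.tables rather than data.frames.

```r
# Hypothetical annotation tables, each with its own integer key.
rowData <- data.frame(rowKey = 1:2, treatmentid = c("drugA", "drugB"))
colData <- data.frame(colKey = 1:2, sampleid = c("cellX", "cellY"))

# Each assay row carries one row key and one column key. The pair
# (rowKey = 2, colKey = 2) was never measured, so it has zero assay rows.
assay <- data.frame(
    rowKey = c(1L, 1L, 2L),
    colKey = c(1L, 2L, 1L),
    viability = c(0.91, 0.45, 0.78)
)

# Together the two keys form a composite key that is unique per observation.
stopifnot(!anyDuplicated(assay[, c("rowKey", "colKey")]))

# Joining on the keys attaches the treatment and sample identifiers.
annotated <- merge(merge(assay, rowData, by = "rowKey"), colData, by = "colKey")
print(annotated)
```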
To deal with the complex kinds of experimental designs which can be stored
in a LongTable, we have engineered a new object to help document and validate
the way data is mapped from raw data files, as a single large
data.table, to the various slots of a LongTable object.
The DataMapper is an abstract class, which means it cannot be instantiated.
Its purpose is to provide a description of the concept of a DataMapper and
define a basic interface for any classes inheriting from it. A DataMapper is
simply a way to map columns from some raw data file to the slots of an S4 class.
It is similar to a schema in SQL in that it defines the valid parts of an
object (analogously a SQL table), but differs in that no types are specified or
enforced at this time.
This object is not important for general users, but may be useful for other
developers who want to map from some raw data to some
S4 class. In this case,
any derived data mapper should inherit from the
DataMapper abstract class.
Only one slot is defined by default, a
List in the @rawdata slot.
An accessor method,
rawdata(DataMapper), is defined to assign and retrieve
the raw data from your mapper object.
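For developers unfamiliar with abstract classes in S4, the pattern can be sketched as follows. The class names SketchMapper and SketchTableMapper are hypothetical stand-ins and are not the package's classes; the generic here is defined locally for the sketch, so it should not be run in a session where the package's own rawdata generic is attached.

```r
library(methods)

# Hypothetical virtual (abstract) class with a single slot holding raw data.
setClass("SketchMapper", representation("VIRTUAL", rawdata = "list"))

# Getter and setter generics defined locally for this sketch.
setGeneric("rawdata", function(object, ...) standardGeneric("rawdata"))
setGeneric("rawdata<-", function(object, ..., value) standardGeneric("rawdata<-"))

setMethod("rawdata", "SketchMapper", function(object, ...) object@rawdata)
setReplaceMethod("rawdata", "SketchMapper", function(object, ..., value) {
    object@rawdata <- value
    validObject(object)
    object
})

# A concrete subclass inherits the slot and both accessor methods.
setClass("SketchTableMapper", contains = "SketchMapper")

# Instantiating the virtual class directly fails; capture the error message.
virtualError <- tryCatch(new("SketchMapper"), error = function(e) conditionMessage(e))

mapper <- new("SketchTableMapper", rawdata = list(viability = c(0.9, 0.5)))
rawdata(mapper) <- list(viability = c(0.8, 0.4))
print(rawdata(mapper))
```

Defining the accessors on the virtual parent means every derived mapper shares the same interface without reimplementing it.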
The TREDataMapper class is the first concrete sub-class of a
DataMapper. It is the object which defines how to go from a single
data.table of raw experimental data to a properly formatted
TreatmentResponseExperiment object. This is accomplished by defining
various mappings, which let the user decide which columns from rawdata
should go into which slots of the object. Each slot mapping is implemented as a
list of character vectors specifying the column names from
rawdata to assign
to each slot.
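A slot mapping of this kind can be pictured as a named list of column names applied to the raw table. The sketch below uses base R only; the example data, the list element names id_columns and mapped_columns, and the variable colDataMap are assumptions chosen for illustration rather than a statement of the package's interface.

```r
# Hypothetical raw data: one row per measured observation.
rawdata <- data.frame(
    drug1id = c("drugA", "drugA", "drugB"),
    drug1dose = c(0.1, 1, 0.1),
    cellid = c("cellX", "cellY", "cellX"),
    tissueid = c("lung", "breast", "lung"),
    viability = c(0.91, 0.45, 0.78),
    stringsAsFactors = FALSE
)

# Assumed mapping for the sample annotations: a list of character vectors,
# the first naming identifier columns, the second naming metadata columns.
colDataMap <- list(id_columns = "cellid", mapped_columns = "tissueid")

# Applying the mapping: select the mapped columns and drop repeated rows,
# leaving one row per sample.
colData <- unique(rawdata[, unlist(colDataMap, use.names = FALSE), drop = FALSE])
print(colData)
```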
Additionally, a helper method has been included,
guessMapping, that will
try to determine which columns of a TREDataMapper's rawdata
should be assigned to which slots, and therefore which maps.
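The text does not describe how the guess is made, so the following is only one plausible heuristic, written in base R: a column is assigned to a group of identifier columns if its value never varies within any unique combination of those identifiers. The function guessGroupColumns and the example data are hypothetical and are not part of the package.

```r
# Hypothetical helper: return the columns whose values are constant within
# every unique combination of the identifier columns in idCols.
guessGroupColumns <- function(rawdata, idCols) {
    candidates <- setdiff(colnames(rawdata), idCols)
    # One factor level per observed combination of identifier values.
    groupId <- interaction(rawdata[idCols], drop = TRUE)
    isConstant <- vapply(candidates, function(col) {
        all(tapply(rawdata[[col]], groupId, function(x) length(unique(x))) == 1L)
    }, logical(1))
    candidates[isConstant]
}

rawdata <- data.frame(
    cellid = c("cellX", "cellX", "cellY", "cellY"),
    tissueid = c("lung", "lung", "breast", "breast"),
    drug1id = c("drugA", "drugB", "drugA", "drugB"),
    viability = c(0.91, 0.78, 0.45, 0.60),
    stringsAsFactors = FALSE
)

# Columns that depend only on the sample identifier.
sampleCols <- guessGroupColumns(rawdata, "cellid")
# Columns determined by the full treatment-by-sample combination.
assayCols <- guessGroupColumns(rawdata, c("cellid", "drug1id"))
print(sampleCols)
print(assayCols)
```

Under this heuristic, tissueid maps to the sample annotations, while viability is only determined once both the sample and the treatment are known.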
To get started making a
TreatmentResponseExperiment, let's have a look at some
rawdata which is a subset of the data from O'Neil et al., 2016. The full set
of rawdata is available for exploration and download from
SynergxDB.ca, a free and open source web-app and
database of publicly available drug combination sensitivity experiments which we
created and released (Seo et al., 2019).
The data was generated as part of the commercial activities of the pharmaceutical company Merck, and is thus named accordingly.