Data metrics object

0.1 About data metrics object

Researchers may wish to superimpose a subset of the full dataset onto the full dataset. If a researcher is using the package to visualize RNA-seq data, this subset is often the set of differentially expressed genes (DEGs) returned from a model. In this case, the user may wish to supply the dataMetrics input parameter, an object containing at least one quantitative variable returned from the model, such as the false discovery rate (FDR), p-value, or log fold change.

0.2 Example: two treatments

As shown in the Data object article, the data object soybean_ir_sub contains 5,604 genes and two treatment groups, N and P (Lauter and Graham 2016). We can examine the structure of its corresponding dataMetrics object, soybean_ir_sub_metrics, as follows:

library(bigPint)
data("soybean_ir_sub_metrics")
str(soybean_ir_sub_metrics, strict.width = "wrap")

## List of 1
## $ N_P:'data.frame': 5604 obs. of 6 variables:
## ..$ ID : chr [1:5604] "Glyma.19G168700.Wm82.a2.v1" "Glyma.13G293500.Wm82.a2.v1"
##    "Glyma.05G188700.Wm82.a2.v1" "Glyma.13G173100.Wm82.a2.v1" ...
## ..$ logFC : num [1:5604] -5.92 2.99 -3.51 -3.91 -3.51 ...
## ..$ logCPM: num [1:5604] 7.52 8.08 8.83 8.27 10.19 ...
## ..$ LR : num [1:5604] 266 171 167 157 154 ...
## ..$ PValue: num [1:5604] 9.18e-60 3.65e-39 2.73e-38 6.04e-36 2.58e-35 ...
## ..$ FDR : num [1:5604] 5.14e-56 1.02e-35 5.09e-35 8.46e-33 2.89e-32 ...
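The quantitative columns are what define the subset to superimpose. The code below is a minimal sketch that assumes bigPint is installed. It collects the IDs of genes in the N_P comparison whose FDR falls below a cutoff. The 0.05 cutoff and the names `np_metrics`, `fdr_cutoff` and `deg_ids` are illustrative choices made for this sketch.

```r
library(bigPint)
data("soybean_ir_sub_metrics")

# Extract the single pairwise comparison (N versus P) as a data frame
np_metrics <- soybean_ir_sub_metrics[["N_P"]]

# Hypothetical significance cutoff chosen for illustration
fdr_cutoff <- 0.05

# Keep the IDs of genes whose FDR is below the cutoff;
# which() drops any NA comparisons
deg_ids <- np_metrics$ID[which(np_metrics$FDR < fdr_cutoff)]

# Number of genes that would be superimposed at this cutoff
length(deg_ids)
```

Changing `fdr_cutoff`, or filtering on another column such as PValue, changes which genes form the subset.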

0.3 Example: three treatments

Similarly, as shown in the Data object article, the data object soybean_cn_sub contains 7,332 genes and three treatment groups, S1, S2, and S3 (Brown and Hudson 2015). We can examine the structure of its corresponding dataMetrics object, soybean_cn_sub_metrics, as follows:
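The call below mirrors the two-treatment example above and assumes bigPint is installed. Its printed structure is not reproduced here. Under the rules in the next section, the object should hold three data frames, one per pairwise comparison.

```r
library(bigPint)
data("soybean_cn_sub_metrics")

# Inspect the list of pairwise data frames (expected: S1_S2, S1_S3, S2_S3)
str(soybean_cn_sub_metrics, strict.width = "wrap")
```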

0.4 Data metrics object rules

As demonstrated in the two examples above, the dataMetrics object must meet the following conditions:

- Be of type list
- Contain a number of elements equal to the number of pairwise treatment combinations in the data object, that is, k(k - 1)/2 for k treatment groups. For example, the soybean_ir_sub_metrics object contains one list element (“N_P”) and the soybean_cn_sub_metrics object contains three list elements (“S1_S2”, “S1_S3”, “S2_S3”).
- Have each list element
  - Be of type data.frame
  - Be named in a three-part format (such as “N_P” or “S2_S3”) that matches the Perl regular expression ^[a-zA-Z0-9]+_[a-zA-Z0-9]+, where
    - The first part is the alphanumeric name of the first treatment group
    - The second part is an underscore “_” that serves as a delimiter
    - The third part is the alphanumeric name of the second treatment group
  - Contain a first column called “ID” of class character consisting of the unique names of the genes
  - Contain at least one column of class numeric or integer consisting of a quantitative variable. This column can have any name. In the examples above, there are five such columns, called “logFC”, “logCPM”, “LR”, “PValue”, and “FDR”.
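You can check these rules in code before calling a plotting function. The sketch below uses `check_data_metrics()`, a hypothetical helper that is not part of bigPint. It approximates the rules above and returns a character vector of problems, which is empty when the object passes. The `treatments` argument, a vector of treatment labels, is an assumption of this sketch. bigPint's own checks may differ in detail.

```r
# check_data_metrics() is a hypothetical helper written for illustration;
# it is not part of bigPint and only approximates the rules listed above.
check_data_metrics <- function(metrics, treatments) {
  problems <- character(0)

  # Rule: the object must be a list (a bare data.frame does not count)
  if (!is.list(metrics) || is.data.frame(metrics)) {
    return("dataMetrics must be a list of data frames")
  }

  # Rule: one element per pairwise treatment combination, k * (k - 1) / 2
  n_expected <- choose(length(treatments), 2)
  if (length(metrics) != n_expected) {
    problems <- c(problems, sprintf("expected %d elements, found %d",
                                    n_expected, length(metrics)))
  }

  # Rule: element names match the pattern stated above
  nm <- names(metrics)
  if (is.null(nm)) nm <- rep("", length(metrics))
  bad_names <- nm[!grepl("^[a-zA-Z0-9]+_[a-zA-Z0-9]+", nm, perl = TRUE)]
  if (length(bad_names) > 0) {
    problems <- c(problems, paste("badly formatted name(s):",
                                  paste(bad_names, collapse = ", ")))
  }

  for (i in seq_along(metrics)) {
    el <- metrics[[i]]
    label <- if (nzchar(nm[i])) nm[i] else paste0("element ", i)
    # Rule: each element is a data.frame
    if (!is.data.frame(el)) {
      problems <- c(problems, paste(label, "is not a data.frame"))
      next
    }
    # Rule: first column is called "ID" and is of class character
    if (ncol(el) == 0 || names(el)[1] != "ID" || !is.character(el[[1]])) {
      problems <- c(problems, paste(label, "needs a character first column named ID"))
    }
    # Rule: at least one numeric or integer column besides ID
    if (!any(vapply(el[-1], is.numeric, logical(1)))) {
      problems <- c(problems, paste(label, "has no numeric column"))
    }
  }
  problems
}

# Toy object for two hypothetical treatments "A" and "B"; values are made up
toy <- list(A_B = data.frame(ID = c("gene1", "gene2"),
                             FDR = c(0.01, 0.5),
                             stringsAsFactors = FALSE))
check_data_metrics(toy, treatments = c("A", "B"))
```

The helper collects messages instead of stopping at the first failure, so it reports every problem in one pass. To turn it into a hard check, wrap the call in `stopifnot(length(...) == 0)`.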

You can quickly double-check the names of the list elements in your dataMetrics object as follows:

names(soybean_ir_sub_metrics)

## [1] "N_P"

names(soybean_cn_sub_metrics)

## [1] "S1_S2" "S1_S3" "S2_S3"
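You can also compare the names against the ones implied by the treatment labels. The sketch below uses `expected_metric_names()`, a hypothetical helper that is not a bigPint function. It assumes the treatment labels are given in the same order used in the element names.

```r
# Hypothetical helper: build "first_second" names for every pair of treatments
expected_metric_names <- function(treatments) {
  # combn() walks the pairs in order: (1,2), (1,3), (2,3), ...
  combn(treatments, 2, FUN = paste, collapse = "_")
}

expected_metric_names(c("N", "P"))
# [1] "N_P"
expected_metric_names(c("S1", "S2", "S3"))
# [1] "S1_S2" "S1_S3" "S2_S3"

# With the bigPint objects loaded, a mismatch would show up as FALSE, e.g.
# setequal(names(soybean_cn_sub_metrics), expected_metric_names(c("S1", "S2", "S3")))
```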

If your dataMetrics object does not fit this format, bigPint will likely throw an informative error about why your format was not recognized.

References

Brown, Anne V., and Karen A. Hudson. 2015. “Developmental Profiling of Gene Expression in Soybean Trifoliate Leaves and Cotyledons.” BMC Plant Biology 15 (1). BioMed Central: 169.

Lauter, AN Moran, and MA Graham. 2016. “NCBI SRA BioProject Accession: PRJNA318409.”