`limpca`

on the UCH metabolomics dataset.limpca 1.0.0

Package loading

```
library(pander)
library(gridExtra)
library(ggplot2)
library(SummarizedExperiment)
```

The model used in this example is a three-way ANOVA with fixed effects. This document presents all the usual steps of the analysis, from importing the data to visualising the results. Details on the methods used and the package implementation can be found in the articles of Thiel, Féraud, and Govaerts (2017), Guisset, Martin, and Govaerts (2019) and Thiel et al. (2023).

`limpca`

package`limpca`

can be installed from Bioconductor:

```
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("limpca")
```

And then loaded into your R session:

`library("limpca")`

In order to use the limpca core functions, the data need to be formatted as a list (informally called an lmpDataList) with the following elements: `outcomes`

(multivariate matrix), `design`

(data.frame) and `formula`

(character string).
The `UCH`

data set is already formatted appropriately and can be loaded from `limpca`

with the `data`

function.

```
data("UCH")
str(UCH)
```

```
List of 3
$ design :'data.frame': 34 obs. of 5 variables:
..$ Hippurate: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 2 2 2 2 ...
..$ Citrate : Factor w/ 3 levels "0","2","4": 1 1 2 2 3 3 1 1 2 2 ...
..$ Dilution : Factor w/ 1 level "diluted": 1 1 1 1 1 1 1 1 1 1 ...
..$ Day : Factor w/ 2 levels "2","3": 1 1 1 1 1 1 1 1 1 1 ...
..$ Time : Factor w/ 2 levels "1","2": 1 2 1 2 1 2 1 2 1 2 ...
$ outcomes: num [1:34, 1:600] 0.0312 0.0581 0.027 0.0341 0.0406 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:34] "M2C00D2R1" "M2C00D2R2" "M2C02D2R1" "M2C02D2R2" ...
.. ..$ X1: chr [1:600] "9.9917004" "9.9753204" "9.9590624" "9.9427436" ...
$ formula : chr "outcomes ~ Hippurate + Citrate + Time + Hippurate:Citrate + Time:Hippurate + Time:Citrate + Hippurate:Citrate:Time"
```

Alternatively, the lmpDataList can be created with the function `data2LmpDataList`

:

- from scratch:

```
UCH2 <- data2LmpDataList(outcomes = UCH$outcomes, design = UCH$design,
formula = UCH$formula)
```

`| dim outcomes: 34x600`

`| formula: ~ Hippurate + Citrate + Time + Hippurate:Citrate + Time:Hippurate + Time:Citrate + Hippurate:Citrate:Time`

```
| design variables (5):
* Hippurate (factor)
* Citrate (factor)
* Dilution (factor)
* Day (factor)
* Time (factor)
```

- or from a
`SummarizedExperiment`

:

```
se <- SummarizedExperiment(assays = list(counts = t(UCH$outcomes)),
colData = UCH$design, metadata = list(formula = UCH$formula))
UCH3 <- data2LmpDataList(se, assay_name = "counts")
```

`| dim outcomes: 34x600`

`| formula: ~ Hippurate + Citrate + Time + Hippurate:Citrate + Time:Hippurate + Time:Citrate + Hippurate:Citrate:Time`

```
| design variables (5):
* Hippurate (factor)
* Citrate (factor)
* Dilution (factor)
* Day (factor)
* Time (factor)
```

`SummarizedExperiment`

is a generic data container that stores rectangular matrices of experimental results. See Morgan et al. (2023) for more information.

The UCH (Urine-Citrate-Hippurate) data set is described in Thiel, Féraud, and Govaerts (2017) and Guisset, Martin, and Govaerts (2019) and is issued form a metabolomics experiment. In this experiment, 36 samples of a pool of rat urine samples were spiked with two molecules Citrate and Hippurate according to a \(3^2\) full factorial design in the quantities of these two molecules. The spiked samples were analyzed by ^{1}H NMR at two different time after defrozing and over two different days. Two of the spectra where finally missing at the end of the experiment.

The UCH data set is a list containing 2 elements:

- an
`outcomes`

matrix with 34 observations of 600 response variables representing the spectra from the^{1}H NMR spectroscopy, - a
`design`

matrix with 34 observations and 4 explanatory variables

A `formula`

with the general linear model to be estimated will be added to this list below.

For the purpose of this example, only 3 factors of interest will be studied : concentrations of Hippurate and Citrate and Time after defrozing.

Here are the `limpca`

functions that are available for data exploration.

The design matrix contains the information about each observation for the four variables: Hippurate, Citrate, Day and Time. Only 3 of these variables are used in the model. The function `plotDesign`

is useful to visualise the design.

`pander(head(UCH$design))`

Hippurate | Citrate | Dilution | Day | Time | |
---|---|---|---|---|---|

M2C00D2R1 |
0 | 0 | diluted | 2 | 1 |

M2C00D2R2 |
0 | 0 | diluted | 2 | 2 |

M2C02D2R1 |
0 | 2 | diluted | 2 | 1 |

M2C02D2R2 |
0 | 2 | diluted | 2 | 2 |

M2C04D2R1 |
0 | 4 | diluted | 2 | 1 |

M2C04D2R2 |
0 | 4 | diluted | 2 | 2 |

```
plotDesign(design = UCH$design, x = "Hippurate", y = "Citrate",
rows = "Time", title = "Design of the UCH dataset")
```

This plot confirms that the design is a full 3x3x2 factorial design replicated twice with 2 missing values. Hence, the design is not balanced.

The 600 response (`outcomes`

) variables represent, for each observation, the intensities of the ^{1}H NMR spectra. These spectra can be visualized by the `plotLine`

function.

`plotLine`

functionHere, annotations (`cit_peaks`

and `hip_peaks`

) are added to the `ggplot`

objects in order to highlight the Hippurate (green) and Citrate (red) peaks.

```
p1 <- plotLine(Y = UCH$outcomes, title = "H-NMR spectrum", rows = c(3),
xlab = "ppm", ylab = "Intensity")
cit_peaks <- annotate("rect", xmin = c(2.509), xmax = c(2.709),
ymin = -Inf, ymax = Inf, alpha = 0.2, fill = c("tomato"))
hip_peaks <- annotate("rect", xmin = c(7.458, 3.881), xmax = c(7.935,
4.041), ymin = -Inf, ymax = Inf, alpha = 0.2, fill = c("yellowgreen"))
p1 <- plotLine(Y = UCH$outcomes, title = "H-NMR spectrum", rows = c(3),
xlab = "ppm", ylab = "Intensity")
p1 + cit_peaks + hip_peaks
```

`plotScatter`

functionThe `plotScatter`

function enables to visualize the values of two outcomes variables with different colors or markers for the values of the design factors. Here, it is used to show that the \(3^2\) factorial design can be recovered from the intensities of the Hippurate and Citrate peaks in the spectra.

```
# xy corresponds to citrate (453) and hippurate peaks (369)
plotScatter(Y = UCH$outcomes, xy = c(453, 369), design = UCH$design,
color = "Hippurate", shape = "Citrate")
```

```
# Or refering to the variables names (ppm values here)
plotScatter(Y = UCH$outcomes, xy = c("2.6092056", "3.9811536"),
design = UCH$design, color = "Hippurate", shape = "Citrate")
```