Package loading

library(car)
library(pander)
library(gridExtra)
library(ggplot2)

1 Installation and loading of the limpca package

limpca can be installed from Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("limpca")

And then loaded into your R session:

library("limpca")

2 Data and model presentation

This data set comes from the study of the modulation of immunity in rainbow trout (Oncorhynchus mykiss) by exposure to cadmium (Cd) combined with polyunsaturated fatty acids (PUFAs) enriched diets [Cornet et al., 2018].

The responses were quantified by measuring the modification of the expression of 15 immune-related genes (m = 15) by RT-qPCR (reverse transcription quantitative polymerase chain reaction). The experiment was carried out on 72 trouts and 3 factors were considered in the experimental design:

  • Day: Measurements on trouts were collected on days 28, 70 and 72
  • Treatment : Four polyunsaturated fatty acid diets: alpha-linolenic acid (ALA), linoleic acid (LA), eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA)
  • Exposure: Trouts were exposed (level = 2) or not (level = 0) to high cadmium concentrations.

This gives a 3 × 4 × 2 factorial design. Each of the 24 trials corresponds to a different aquarium. Three fishes were analysed (3 replicates) for each condition, giving a total of 72 observations.

In this limpca vignette, the data are first explored in order to prepare an appropriate data set for ASCA/APCA analysis. Data are first represented by PCA and outliers removed. The remaining observations are then log transformed. Next, the data of each aquarium are mean aggregated in order to avoid the inclusion of an aquarium random factor in the statistical model because limpca is not yet able to handle mixed linear models. The data are finally centered and scaled by column.

The estimated model in then a (general) linear model for fixed factors including main effects and all two way interactions. The three way interaction is not included because the aggregated design has no replicate.

A detailed presentation and analysis of this dataset is also available in [Benaiche, 2022].

3 Data import and exploration

3.1 Data import and design visualization

The trout data set is list of three objects : the model outcomes, the design and the model formula.

data("trout")
# print number of and response names
cat("\n Nb of Responses :  ", ncol(trout$outcomes), "\n ")

 Nb of Responses :   15 
 
cat("\nResponses :\n", colnames(trout$outcomes), "\n ")

Responses :
 IL.1b IL6 IL8 Lysozyme IgM MCSFR.a MPO TGF.b TLR3 TLR9 MyD88 SOD Elov5 C3 Cox2 
 
# Order responses by alphabetic order
trout$outcomes <- trout$outcomes[, order(dimnames(trout$outcomes)[[2]])]
cat("\n Ordered responses :\n  ", colnames(trout$outcomes), "\n ")

 Ordered responses :
   C3 Cox2 Elov5 IgM IL.1b IL6 IL8 Lysozyme MCSFR.a MPO MyD88 SOD TGF.b TLR3 TLR9 
 
# print factor names
cat("\nDesign factors :  ", colnames(trout$design), "\n ")

Design factors :   Day Treatment Exposure Aquarium 
 
# plot the design with plotDesign function
limpca::plotDesign(
    design = trout$design, x = "Treatment",
    y = "Day", cols = "Exposure",
    title = "Initial design of the trout dataset"
)

This graph confirms that the design is balanced.

3.2 Principal Component Analysis of row data

Functions pcaBySvd, pcaScreePlot and pcaScorePlot are used to take a first look at the data by PCA.

This PCA shows that the data should be log transformed.

resPCA <- limpca::pcaBySvd(trout$outcomes)
limpca::pcaScreePlot(resPCA, nPC = 8)

limpca::pcaScorePlot(
    resPcaBySvd = resPCA, axes = c(1, 2),
    title = "Score plot of original data ",
    design = trout$design, color = "Aquarium",
    points_labs_rn = TRUE
)
Warning: ggrepel: 64 unlabeled data points (too many overlaps). Consider
increasing max.overlaps

3.3 Log10 transformation of the data and new PCA

Data are log transfomed and a new PCA is applied.

The score plot show clearly two outliers (i.e. fishes D72EPA2.1 and D28EPA2.2). They will removed of the analysis for the next steps.

# Log Transformation
trout_log <- trout
trout_log$outcomes <- as.matrix(log10(trout$outcomes))

# new PCA
resPCA1 <- limpca::pcaBySvd(trout_log$outcomes)
limpca::pcaScreePlot(resPCA1, nPC = 8)

limpca::pcaScorePlot(
    resPcaBySvd = resPCA1, axes = c(1, 2),
    title = "Score plot of Log10 data ",
    design = trout_log$design, color = "Aquarium",
    drawShapes = "polygon", points_labs_rn = TRUE
)
Warning: ggrepel: 71 unlabeled data points (too many overlaps). Consider
increasing max.overlaps

3.4 New PCA without the outliers

New data are created without the 2 outliers and a new PCA performed. The option polygon of pcaScorePlot function allows to group data by aquarium. In some aquariums, the 3 fishes shows very similar results and in other not.

The data are now clean for further analysis

# Remove outliers and create new dataset
trout_clean <- trout_log
outliers <- match(
    c("D72EPA2.1", "D28EPA2.2"),
    rownames(trout_log$outcomes)
)
trout_clean$outcomes <- trout_log$outcomes[-outliers, ]
trout_clean$design <- trout_log$design[-outliers, ]
# PCA
resPCA2 <- limpca::pcaBySvd(trout_clean$outcomes)
limpca::pcaScreePlot(resPCA2, nPC = 8)