This vignette demonstrates how to use the pls function
to explore associations between two blocks of variables using Partial
Least Squares analysis.
pls_result = pls(X, Y, perm=999, global_RV_test=TRUE)
# Perform PLS analysis with permutation test
# pls_result contains scores, singular axes, and significance tests
pls_result$global_significance_RV
## observed_RV RV_p_value
## 0.3517908 0.5630000
## correlation_PLS_scores pvalue_correlation_PLS_scores singular_values
## 1 0.9224254 0.024 1.6697807
## 2 0.4973531 0.975 0.8772501
## 3 0.8229346 0.095 0.6960334
## 4 0.5245563 0.808 0.3283668
## 5 0.4226646 0.651 0.2153536
## 6 0.3028506 0.198 0.1307253
## pvalue_singular_values percentage_squared_covariance
## 1 0.466 66.1724231
## 2 0.737 18.2643823
## 3 0.274 11.4978934
## 4 0.793 2.5590368
## 5 0.583 1.1006833
## 6 0.156 0.4055811
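The output includes scores for each block, singular axes, and permutation-based significance tests for the association between blocks. A significant result indicates that the two sets of variables covary more strongly than expected by chance. Here the data were simulated independently, so no significant association is expected, and the global RV test agrees (RV = 0.352, p = 0.563). An individual axis can still show a low p-value by chance when several tests are run, as the first pair of PLS scores does here (p = 0.024).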
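The permutation logic behind a global RV test can be sketched in base R. This is a minimal illustration and not the package's implementation: the helper names `rv_coef` and `rv_perm_test` are hypothetical, the demo data are simulated here, and the p-value convention (counting the observed value among the permutations) is an assumption.

```r
# Escoufier's RV coefficient between two blocks (rows = observations).
# Hypothetical helper, written for illustration only.
rv_coef <- function(X, Y) {
  Xc <- scale(X, center = TRUE, scale = FALSE)
  Yc <- scale(Y, center = TRUE, scale = FALSE)
  Sxy <- crossprod(Xc, Yc)  # cross-products between blocks
  Sxx <- crossprod(Xc)      # cross-products within block 1
  Syy <- crossprod(Yc)      # cross-products within block 2
  # trace(Sxy Syx) / sqrt(trace(Sxx^2) * trace(Syy^2))
  sum(Sxy^2) / sqrt(sum(Sxx^2) * sum(Syy^2))
}

# Permutation test: shuffling the rows of Y breaks any association with X.
rv_perm_test <- function(X, Y, perm = 999) {
  observed <- rv_coef(X, Y)
  permuted <- replicate(perm, rv_coef(X, Y[sample(nrow(Y)), , drop = FALSE]))
  # Assumed convention: the observed value counts as one of the permutations
  p_value <- (sum(permuted >= observed) + 1) / (perm + 1)
  c(observed_RV = observed, RV_p_value = p_value)
}

set.seed(42)
X_demo <- matrix(rnorm(50 * 6), ncol = 6)  # independent blocks
Y_demo <- matrix(rnorm(50 * 6), ncol = 6)
res_demo <- rv_perm_test(X_demo, Y_demo, perm = 199)
print(res_demo)
```

With independent blocks, the observed RV should usually fall well inside the permutation distribution, which gives a large p-value.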
In this section, we simulate data with a known association between two blocks. Each block is built by adding a shared latent variable (with a different weight in each block) to independent noise, which induces non-zero covariances between the blocks.
set.seed(1234)
n = 50 # number of observations
p = 6 # variables per block
# Simulate a latent variable that drives covariation
latent = rnorm(n)
# latent variable
latent_mat = matrix(latent, nrow=n, ncol=p)
# latent variable replicated across columns for element-wise addition
X_assoc = matrix(rnorm(n*p), ncol=p) + 0.5 * latent_mat
# Block 1: shape variables influenced by latent + noise
Y_assoc = matrix(rnorm(n*p), ncol=p) + 0.4 * latent_mat
# Block 2: ecological variables influenced by latent + noise
# Now X_assoc and Y_assoc have shared structure and will show significant covariation
pls_assoc_result = pls(X_assoc, Y_assoc, perm=999, global_RV_test=TRUE)
# Perform PLS analysis with permutation test
pls_assoc_result$global_significance_RV # Show global RV test result
## observed_RV RV_p_value
## 0.2116501 0.0020000
## correlation_PLS_scores pvalue_correlation_PLS_scores singular_values
## 1 0.60055147 0.029 1.28679007
## 2 0.49643684 0.067 0.64988017
## 3 0.32948130 0.383 0.41189941
## 4 0.28365981 0.174 0.27593509
## 5 0.18058368 0.182 0.17274065
## 6 0.04361811 0.427 0.02841955
## pvalue_singular_values percentage_squared_covariance
## 1 0.001 70.32250776
## 2 0.081 17.93682258
## 3 0.216 7.20545296
## 4 0.244 3.23364840
## 5 0.185 1.26726680
## 6 0.582 0.03430151
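In this case, the simulated data have a known association between blocks, and the PLS analysis detects it. The global RV test is significant (RV = 0.212, p = 0.002), as is the first singular value (p = 0.001), which accounts for about 70% of the squared covariance. None of the remaining axes is significant at the 0.05 level.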
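The quantities in this output can be related to a singular value decomposition of the between-block covariance matrix. The sketch below uses base R only, repeats the simulation design from this section, and is an illustration under assumptions: the package may centre or scale the data differently, so the numbers are not expected to match the output above. The last lines check, using the singular values printed above, that the percentage of squared covariance is consistent with each squared singular value divided by their sum.

```r
set.seed(1234)
n <- 50
p <- 6
latent <- rnorm(n)
latent_mat <- matrix(latent, nrow = n, ncol = p)
X_demo <- matrix(rnorm(n * p), ncol = p) + 0.5 * latent_mat
Y_demo <- matrix(rnorm(n * p), ncol = p) + 0.4 * latent_mat

# Centre each block, then decompose the cross-covariance matrix
Xc <- scale(X_demo, center = TRUE, scale = FALSE)
Yc <- scale(Y_demo, center = TRUE, scale = FALSE)
sv <- svd(cov(Xc, Yc))

singular_values <- sv$d
pct_sq_cov <- 100 * sv$d^2 / sum(sv$d^2)  # share of squared covariance per axis
scores_x <- Xc %*% sv$u                   # block 1 scores on each singular axis
scores_y <- Yc %*% sv$v                   # block 2 scores on each singular axis
score_cor <- diag(cor(scores_x, scores_y))

# Each singular value equals the covariance between the paired scores
cov_first_pair <- cov(scores_x[, 1], scores_y[, 1])

# Check against the singular values printed in the vignette output above
printed_sv <- c(1.28679007, 0.64988017, 0.41189941,
                0.27593509, 0.17274065, 0.02841955)
pct_printed <- 100 * printed_sv^2 / sum(printed_sv^2)
round(pct_printed, 2)
```

The first value of `pct_printed` agrees with the 70.32% reported in the `percentage_squared_covariance` column.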
pls_major_axis for Major Axis Projection and Predictions
Sometimes, after fitting a PLS model, it is useful to project the
scores onto the major axis (the first principal component of the paired PLS
scores) to obtain simplified representations and predictions in the
original variable space. In a sense, these projections are the real
“predictions” of a PLS model (if all variation is perfectly captured by
the model). For more details, please see Fig. 2 in Fruciano et al. 2020
(Current Zoology) and the associated text. The function
pls_major_axis performs these operations.
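The idea of a major axis can be sketched with `prcomp` on a single pair of score vectors. This is a minimal illustration, not the internals of pls_major_axis: the scores below are simulated and hypothetical, and all object names are chosen for this example.

```r
set.seed(1)
# Hypothetical paired PLS scores for one pair of axes
score_x <- rnorm(50)
score_y <- 0.6 * score_x + rnorm(50, sd = 0.5)
paired <- cbind(block1 = score_x, block2 = score_y)

# The major axis is the first principal component of the paired scores
pca <- prcomp(paired, center = TRUE, scale. = FALSE)
major_axis <- pca$rotation[, 1]  # unit-length direction in score space
proj <- pca$x[, 1]               # position of each observation along the axis

# Predicted score pairs: the points on the major axis closest to the data
pred_scores <- sweep(outer(proj, major_axis), 2, pca$center, "+")

# Residuals are perpendicular to the major axis
resid_along_axis <- (paired - pred_scores) %*% major_axis
```

Mapping the predicted scores back to the original variables would presumably involve each block's singular vector and the block means. That step is left out here because it depends on how the package stores its axes.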
Pred_major_axis = pls_major_axis(pls_assoc_result, axes_to_use=1)
# Compute projections and predictions based on the first pair of PLS axes
names(Pred_major_axis)
## [1] "original_major_axis_projection"
## [2] "original_major_axis_predictions_reversed"
# Main elements returned (lists for projection, predictions, and optionally new data)
Proj_scores = Pred_major_axis$original_major_axis_projection[[1]]$original_data_PLS_projection
# Scores of original data projected on the major axis
hist(Proj_scores,
     main="Original data - projections on the major axis",
     xlab="Major axis score")
# Distribution of major axis scores (first axis pair)
Pred_block1 = Pred_major_axis$original_major_axis_predictions_reversed$Block1
# Predictions for block 1 back-transformed to original space
Pred_block2 = Pred_major_axis$original_major_axis_predictions_reversed$Block2
# Predictions for block 2 back-transformed to original space
We now create new data (here, a perturbed version of the original associated data) and obtain projections and predictions for it using the existing PLS model.
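Projecting new observations onto an existing major axis means reusing the centring and the axis direction estimated from the original data, rather than refitting them. A short sketch of that idea with `prcomp` and `predict` follows. The scores are simulated and hypothetical, and this is not necessarily how pls_major_axis handles new data.

```r
set.seed(3)
# Hypothetical paired PLS scores for the original data (one pair of axes)
orig_scores <- cbind(block1 = rnorm(50), block2 = rnorm(50))
orig_scores[, 2] <- 0.7 * orig_scores[, 1] + 0.4 * orig_scores[, 2]
pca_orig <- prcomp(orig_scores)

# Hypothetical scores for new observations (perturbed copies of the originals)
new_scores <- orig_scores + matrix(rnorm(100, sd = 0.2), ncol = 2)

# predict() subtracts the ORIGINAL means and applies the ORIGINAL rotation
new_proj <- predict(pca_orig, newdata = new_scores)[, 1]

# The same computation written out by hand
manual <- as.vector(sweep(new_scores, 2, pca_orig$center) %*% pca_orig$rotation[, 1])
```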
set.seed(999)
X_new = X_assoc + matrix(rnorm(n * ncol(X_assoc), sd=0.2), ncol=ncol(X_assoc))
# New data for block 1 (perturbed version of original data)
Y_new = Y_assoc + matrix(rnorm(n * ncol(Y_assoc), sd=0.2), ncol=ncol(Y_assoc))
# New data for block 2 (perturbed version of original data)
Pred_major_axis_new = pls_major_axis(pls_assoc_result,
                                     new_data_x = X_new,
                                     new_data_y = Y_new,
                                     axes_to_use = 1)
# Obtain major axis projections and predictions for new data
colnames(Pred_major_axis_new$new_data_results$new_data_major_axis_proj)
## NULL
# Names (axes) for the major axis projections of new data
head(Pred_major_axis_new$new_data_results$new_data_major_axis_proj)
## [,1]
## [1,] -1.9975730
## [2,] 0.9804839
## [3,] 1.5819376
## [4,] -2.4474063
## [5,] 0.9048941
## [6,] 1.2771539
# Scores of new data on the major axis
head(Pred_major_axis_new$new_data_results$new_data_Block1_proj_prediction_revert)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -1.1629221 -0.9112237 -1.0816728 -1.0403921 -1.1113456 -0.64515250
## [2,] 0.4411075 0.1393327 0.2864011 0.4869834 0.3547199 -0.01070961
## [3,] 0.7650602 0.3515049 0.5626998 0.7954548 0.6508090 0.11742360
## [4,] -1.4052096 -1.0699095 -1.2883194 -1.2711010 -1.3327937 -0.74098463
## [5,] 0.4003936 0.1126672 0.2516763 0.4482151 0.3175078 -0.02681320
## [6,] 0.6008988 0.2439877 0.4226868 0.6391384 0.5007673 0.05249271
# Predictions for block 1 new data (back-transformed)
head(Pred_major_axis_new$new_data_results$new_data_Block2_proj_prediction_revert)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -1.4256076 -1.09357864 -0.8691363 -0.9563543 -1.194361073 -0.38736268
## [2,] 0.3950640 0.09327271 0.2648004 0.2120555 -0.104869113 -0.08435270
## [3,] 0.7627701 0.33297131 0.4938123 0.4480297 0.115166611 -0.02315627
## [4,] -1.7006187 -1.27285167 -1.0404167 -1.1328418 -1.358928037 -0.43313211
## [5,] 0.3488513 0.06314774 0.2360186 0.1823987 -0.132522882 -0.09204378
## [6,] 0.5764368 0.21150520 0.3777617 0.3284509 0.003664568 -0.05416727
When axes_to_use > 1, the function computes a major axis for each pair of PLS axes and combines the predictions.
Setting scale_PLS = FALSE disables scaling of block scores prior to major axis computation (the default is TRUE).
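Why scaling the scores matters can be seen in a small base R sketch. It assumes that "scaling" means standardising each block's scores to unit variance, which is an assumption about scale_PLS rather than something stated above. The scores and the helper `major_axis_direction` are hypothetical.

```r
set.seed(2)
# Hypothetical paired scores with very different spreads in the two blocks
score_x <- rnorm(100, sd = 3)
score_y <- 0.2 * score_x + rnorm(100, sd = 0.3)

# Direction of the major axis, with or without standardising the scores first
major_axis_direction <- function(s1, s2, scale_scores) {
  m <- cbind(s1, s2)
  if (scale_scores) m <- scale(m)  # centre and scale to unit variance
  v <- prcomp(m)$rotation[, 1]
  v * sign(v[1])                   # fix the arbitrary sign for comparison
}

unscaled <- major_axis_direction(score_x, score_y, scale_scores = FALSE)
scaled <- major_axis_direction(score_x, score_y, scale_scores = TRUE)
rbind(unscaled, scaled)
```

Without scaling, the axis is dominated by the block whose scores vary more. With standardised, positively correlated scores, both blocks contribute equally and the direction is (1, 1)/sqrt(2).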