randLassoStabSel {monaLisa}R Documentation

Randomized Lasso Stability Selection

Description

This function runs randomized lasso stability selection as presented by Meinshausen and Bühlmann (2010) and with the improved error bounds introduced by Shah and Samworth (2013). The function uses the stabsel function from the stabs package, but implements the randomized lasso version.

Usage

randLassoStabSel(
  x,
  y,
  weakness = 0.8,
  cutoff = 0.8,
  PFER = 2,
  mc.cores = 1L,
  ...
)

Arguments

x

the predictor matrix.

y

the response vector.

weakness

value between 0 and 1 (default = 0.8). It affects how strict the method will be in selecting predictors. The closer it is to 0, the more stringent the selection. A weakness value of 1 is identical to performing lasso stability selection (not the randomized version).

cutoff

value between 0 and 1 (default = 0.8) which is the cutoff for the selection probability. Any variable with a selection probability that is higher than the set cutoff will be selected.

PFER

integer (default = 2) representing the absolute number of false positives that we allow for in the final list of selected variables. For details see Meinshausen and Bühlmann (2010).

mc.cores

integer (default = 1) specifying the number of cores to use in mclapply, which is the default way stabsel does parallelization.

...

additional parameters that can be passed on to stabsel.

Details

Randomized lasso stability selection runs a randomized lasso regression several times on subsamples of the response variable and predictor matrix. N/2 elements from the response variable are randomly chosen in each regression, where N is the length of the vector. The corresponding section of the predictor matrix is also chosen, and the internal .glmnetRandomizedLasso function is applied. Stability selection results in selection probabilities for each predictor. The probability of a specific predictor is the number of times it was selected divided by the total number of subsamples that were done (total number of times the regression was performed).

We made use of the stabs package that implements lasso stability selection, and adapted it to run randomized lasso stability selection.

Value

A SummarizedExperiment object where the rows are the observations and the columns the predictors (same dimnames as the predictor matrix x). It contains:

References

N. Meinshausen and P. Bühlmann (2010), Stability Selection, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72, 417–73.
R.D. Shah and R.J. Samworth (2013), Variable Selection with Error Control: Another Look at Stability Selection, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75, 55–80.
B. Hofner, L. Boccuto, and M. Göker (2015), Controlling False Discoveries in High-Dimensional Situations: Boosting with Stability Selection, BMC Bioinformatics, 16 144.

See Also

stabsel

Examples

## create data set
Y <- rnorm(n = 500, mean = 2, sd = 1)
X <- matrix(data = NA, nrow = length(Y), ncol = 50)
for (i in seq_len(ncol(X))) {
  X[ ,i] <- runif(n = 500, min = 0, max = 3)
}
s_cols <- sample(x = seq_len(ncol(X)), size = 10, 
  replace = FALSE)
for (i in seq_along(s_cols)) {
  X[ ,s_cols[i]] <- X[ ,s_cols[i]] + Y
}
  
## reproducible randLassoStabSel() with 1 core
set.seed(123)
ss <- randLassoStabSel(x = X, y = Y)
  
## reproducible randLassoStabSel() in parallel mode 
## (only works on non-windows machines)
if (.Platform$OS.type == "unix") {
    RNGkind("L'Ecuyer-CMRG")
    set.seed(123)
    ss <- randLassoStabSel(x = X, y = Y, mc.preschedule = TRUE, 
                           mc.set.seed = TRUE, mc.cores = 2L)
}


[Package monaLisa version 0.99.5 Index]