Introduction

This vignette gives an introduction to network-regularized generalized linear models (GLMs). Network regularization uses graphs to model interactions between covariables and between responses, and penalizes the regression coefficients of a GLM using this information.

Edgenet

edgenet uses a \((p \times p)\)-dimensional affinity matrix \(\mathbf{G}_X \in \mathbb{R}_+^{p \times p}\) to model the interactions of the \(p\) covariables in \(\mathbf{X} \in \mathbb{R}^{n \times p}\), and analogously a matrix \(\mathbf{G}_Y \in \mathbb{R}_+^{q \times q}\) to model the interactions of the \(q\) response variables in \(\mathbf{Y} \in \mathbb{R}^{n \times q}\). The affinity matrices are used to regularize the regression coefficients:

\[\begin{equation} \small \begin{split} \hat{\mathbf{B}} = \arg \min_{\mathbf{B}} \, & -\ell(\mathbf{B}) + \lambda ||\mathbf{B} ||_1 \\ & + \frac{\psi_1}{2} \sum_{i=1}^p \sum_{j=1}^p || \boldsymbol \beta_{i,*} - \boldsymbol \beta_{j,*} ||_2^2 (\mathbf{G}_X)_{i,j} \\ & + \frac{\psi_2}{2} \sum_{i=1}^q \sum_{j=1}^q || \boldsymbol \beta_{*,i} - \boldsymbol \beta_{*,j} ||_2^2 (\mathbf{G}_Y)_{i,j} \\ = \arg \min_{\mathbf{B}} \, & -\ell(\mathbf{B}) + \lambda ||\mathbf{B} ||_1 \\ & + \psi_1 \text{tr}\left( \mathbf{B}^T \left( \mathbf{D}_{\mathbf{G}_X} - \mathbf{G}_X \right) \mathbf{B} \right) \\ & + \psi_2 \text{tr}\left( \mathbf{B} \left( \mathbf{D}_{\mathbf{G}_Y} - \mathbf{G}_Y \right) \mathbf{B}^T \right) \end{split} \label{equ:graphreg} \end{equation}\]

Matrix \(\mathbf{B} \in \mathbb{R}^{(p + 1) \times q}\) is the matrix of regression coefficients to be estimated (including an intercept). Vectors \(\boldsymbol \beta_{i, *}\) and \(\boldsymbol \beta_{*, i}\) are the \(i\)-th row and \(i\)-th column of \(\mathbf{B}\), respectively. The shrinkage parameters \(\lambda\), \(\psi_1\) and \(\psi_2\) are either fixed or need to be estimated, e.g., using cross-validation. The matrix \(\mathbf{D}_{\mathbf{G}} - \mathbf{G}\) is the combinatorial graph Laplacian of an affinity matrix \(\mathbf{G}\) (Chung 1997); a normalized Laplacian can be used as well.
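As a minimal sketch of this building block (the function below is ours for illustration and not part of netReg), the combinatorial Laplacian can be computed directly from the degrees of the affinity matrix:

graph.laplacian <- function(G) {
  # combinatorial graph Laplacian L = D_G - G of a non-negative affinity matrix
  D <- diag(rowSums(G))  # degree matrix D_G
  D - G
}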

Group Lasso

The group lasso (Yuan and Lin 2006) penalizes predefined groups of coefficients jointly rather than individual coefficients, so that all coefficients of a group are either shrunk to zero together or kept in the model; Meier, Van De Geer, and Bühlmann (2008) extend the group lasso to logistic regression.
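For reference, the standard group lasso objective for a single response variable reads:

\[\begin{equation} \small \hat{\boldsymbol \beta} = \arg \min_{\boldsymbol \beta} \, -\ell(\boldsymbol \beta) + \lambda \sum_{g=1}^{G} \sqrt{p_g} \, || \boldsymbol \beta_{g} ||_2 \end{equation}\]

where \(\boldsymbol \beta_g\) denotes the coefficients of group \(g\) and \(p_g\) the size of that group. Since the \(\ell_2\)-norm is not squared, entire groups of coefficients are set to zero at once.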

Families

So far netReg supports the following exponential family random variables:

  • Gaussian,
  • Binomial,
  • Poisson.

The log-likelihood function \(\ell(\mathbf{B})\) over \(q\) response variables is defined as:

\[\begin{equation} \small \ell(\mathbf{B}) = \sum_{j=1}^q \sum_{i=1}^n \log \ \mathbb{P}\left({y}_{i, j} \mid h\left(\mathbf{x}_{i,*} \cdot \boldsymbol \beta_{*,j}\right), \phi \right) \end{equation}\]

where \(h\) is the inverse of a link function, such as the exponential function for the log link, and \(\phi\) is a dispersion parameter.
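For the three supported families, \(h\) is the identity, the logistic function and the exponential function, respectively. As a plain R illustration (independent of netReg's internals):

eta <- c(-1, 0, 1)       # some values of the linear predictor

identity(eta)            # gaussian: inverse of the identity link
1 / (1 + exp(-eta))      # binomial: inverse of the logit link (logistic function)
exp(eta)                 # poisson: inverse of the log link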

Fitting a model to data

This section shows how to fit network-regularized models to data. We use edgenet here, but the calls for the other methods are analogous; examples can be found in the method documentation. First, we load the required libraries:

library(tensorflow)
library(tfprobability)
library(netReg)
## Warning: Package 'netReg' is deprecated and will be removed from Bioconductor
##   version 3.13
## 
## Attaching package: 'netReg'
## The following objects are masked from 'package:stats':
## 
##     binomial, family, gaussian, inverse.gaussian, poisson
## The following objects are masked from 'package:base':
## 
##     beta, gamma
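Note that netReg masks the family generators from the stats package, so the gaussian, binomial and poisson objects used in the calls below refer to netReg's own family implementations.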

Set some parameters and create affinity matrices:

# parameters
n <- 100
p <- 10
q <- 10

# affinity matrices
G.X <- abs(rWishart(1, 10, diag(p))[,,1])
G.Y <- abs(rWishart(1, 10, diag(q))[,,1])

We created the affinity matrices completely at random; in practice, a real (e.g., biologically) observed affinity matrix should be used, since the affinity matrix ultimately determines how the coefficients are shrunk.
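If a known interaction network is available, for instance as an edge list, an affinity matrix could be assembled as follows (a hypothetical sketch; the edges are made up):

# hypothetical edge list of known covariable interactions
edges <- cbind(c(1, 2, 5), c(2, 3, 6))

G.known <- matrix(0, p, p)
G.known[edges] <- 1             # set edge weights
G.known[edges[, c(2, 1)]] <- 1  # symmetrize, since affinities are undirected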

The actual fit is straightforward. We create Gaussian data first and then fit the model:

# data
X <- matrix(rnorm(n * p), n)
B <- matrix(rnorm(p * q), p)
Y <- X %*% B + matrix(rnorm(n * q, 0, 0.1), n)

fit <- edgenet(X=X, Y=Y, G.X=G.X, G.Y=G.Y, family=gaussian, maxit=10)
summary(fit)
## 'gaussian.edgenet' object
## 
## call:
## edgenet(X = X, Y = Y, G.X = G.X, G.Y = G.Y, maxit = 10, family = gaussian)
## 
## parameters:
## lambda  psigx  psigy 
##      1      1      1 
## 
## family:  gaussian
## link:  identity 
## 
## -> call coef(fit) for coefficients

The fit object contains, among other things, the estimated coefficients and intercepts.

coef(fit)[,1:5]
##                  y[1]      y[2]      y[3]      y[4]      y[5]
## (Intercept) 0.9002230 0.9005039 0.9003168 0.9004461 0.9002941
## x[1]        0.9001085 0.9001319 0.8999682 0.9000759 0.9001001
## x[2]        0.8999267 0.9000134 0.8997293 0.8999051 0.8999385
## x[3]        0.9000749 0.9001032 0.8999503 0.9000534 0.9000960
## x[4]        0.9001007 0.9000886 0.8996047 0.9000757 0.9000938
## x[5]        0.9000545 0.9001142 0.8999050 0.9000709 0.9000744
## x[6]        0.9001465 0.9001626 0.9000795 0.9001029 0.9002031
## x[7]        0.9001109 0.9001762 0.9000571 0.9001200 0.9001300
## x[8]        0.9000897 0.9001146 0.9000235 0.9001106 0.9001170
## x[9]        0.9000339 0.9000850 0.8998514 0.9000128 0.9000368
## x[10]       0.9000951 0.9001490 0.9000207 0.9001081 0.9001137

With the coefficients estimated, we can predict new data sets:

pred  <- predict(fit, X)
pred[1:5, 1:5]
##            [,1]       [,2]       [,3]       [,4]       [,5]
## [1,]  1.0754258  1.0758017  1.0755811  1.0756590  1.0754980
## [2,]  1.3414276  1.3416002  1.3409907  1.3415776  1.3414993
## [3,]  6.0946679  6.0948892  6.0928457  6.0946259  6.0947814
## [4,]  1.0827088  1.0831625  1.0826976  1.0829180  1.0827223
## [5,] -0.4983141 -0.4978667 -0.4969878 -0.4980346 -0.4982316
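Since we predicted on the training data here, a quick sanity check is to compare the predictions with the observed responses, for instance via the mean squared error:

# in-sample mean squared error
mean((pred - Y)^2)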

The binomial case works in the same way, only with a different family argument:

# data: we reuse X and B from above
prob <- 1 / (1 + exp(-X %*% B))  # inverse logit: success probabilities
Y.binom <- do.call("cbind", lapply(seq(q), function(i) rbinom(n, 1, prob[, i])))

fit <- edgenet(X=X, Y=Y.binom, G.X=G.X, G.Y=G.Y, family=binomial, maxit=10)
summary(fit)
## 'binomial.edgenet' object
## 
## call:
## edgenet(X = X, Y = Y.binom, G.X = G.X, G.Y = G.Y, maxit = 10, family = binomial)
## 
## parameters:
## lambda  psigx  psigy 
##      1      1      1 
## 
## family:  binomial
## link:  logit 
## 
## -> call coef(fit) for coefficients

The Poisson case of course works analogously:

# data: again reusing X and B
rate <- exp(X %*% B)  # inverse log link: Poisson rates
Y.pois <- do.call("cbind", lapply(seq(q), function(i) rpois(n, rate[, i])))

fit <- edgenet(X=X, Y=Y.pois, G.X=G.X, G.Y=G.Y, family=poisson, maxit=10)
summary(fit)
## 'poisson.edgenet' object
## 
## call:
## edgenet(X = X, Y = Y.pois, G.X = G.X, G.Y = G.Y, maxit = 10, 
##     family = poisson)
## 
## parameters:
## lambda  psigx  psigy 
##      1      1      1 
## 
## family:  poisson
## link:  log 
## 
## -> call coef(fit) for coefficients

Model selection

In most cases the optimal shrinkage parameters \(\lambda\), \(\psi_{1}\) and \(\psi_{2}\) are not known. For these settings you can use netReg's built-in model selection, which uses Powell's BOBYQA algorithm (Powell 2009) for gradient-free optimization in a cross-validation framework. The model selection only requires calling cv.edgenet:

cv <- cv.edgenet(X=X, Y=Y, G.X=G.X, G.Y=G.Y, family=gaussian, optim.maxit=10, maxit=10)
summary(cv)
## 'gaussian.cv.edgenet' object
## 
## call:
## cv.edgenet(X = X, Y = Y, G.X = G.X, G.Y = G.Y, maxit = 10, family = gaussian, 
##     optim.maxit = 10)
## 
## optimal parameters:
## lambda (estimated)  psigx (estimated)  psigy (estimated) 
##          0.1474264          0.1697474          0.1000000 
## 
## family:  gaussian
## link:  identity 
## 
## -> call coef(cv) for coefficients

The cross-validation with BOBYQA is quite computationally intensive, so we set optim.maxit to \(10\). We recommend using TensorFlow on a GPU, since edgenet needs to compute many matrix multiplications.

The cv.edgenet object also contains a fit with the optimal parameters.

summary(cv$fit)
## 'gaussian.edgenet' object
## 
## call:
## edgenet(X = X, Y = Y, G.X = G.X, G.Y = G.Y, lambda = ret$lambda, 
##     psigx = ret$psigx, psigy = ret$psigy, thresh = thresh, maxit = maxit, 
##     learning.rate = learning.rate, family = family)
## 
## parameters:
##    lambda     psigx     psigy 
## 0.1474264 0.1697474 0.1000000 
## 
## family:  gaussian
## link:  identity 
## 
## -> call coef(cv$fit) for coefficients

You can obtain the coefficients like this:

coef(cv)[,1:5]
##                  y[1]      y[2]      y[3]      y[4]      y[5]
## (Intercept) 0.9001945 0.9006118 0.9003052 0.9005013 0.9002763
## x[1]        0.9001585 0.9001997 0.9001080 0.9001906 0.9001827
## x[2]        0.9002181 0.9002358 0.9001611 0.9002684 0.9001958
## x[3]        0.9001448 0.9000999 0.9003048 0.9001292 0.9001681
## x[4]        0.9002277 0.9001873 0.9002112 0.9002714 0.9001873
## x[5]        0.9001690 0.9001964 0.9002044 0.9001927 0.9002346
## x[6]        0.9001558 0.9001593 0.9001416 0.9001563 0.9001610
## x[7]        0.9002012 0.9001892 0.9001433 0.9001558 0.9001992
## x[8]        0.9001633 0.9002054 0.9001842 0.9002528 0.9001706
## x[9]        0.9002589 0.9002165 0.9001809 0.9001721 0.9002059
## x[10]       0.9001919 0.9002664 0.9002001 0.9002475 0.9001894

You can also predict directly from a cv.edgenet object:

pred  <- predict(cv, X)
pred[1:5, 1:5]
##            [,1]      [,2]       [,3]       [,4]       [,5]
## [1,]  5.9169331  5.917364  5.9172899  5.9173329  5.9171036
## [2,]  4.8573529  4.857692  4.8576471  4.8578522  4.8574471
## [3,] -3.6402032 -3.639846 -3.6401756 -3.6401373 -3.6401391
## [4,] -0.3251633 -0.324793 -0.3248117 -0.3249424 -0.3249141
## [5,] -1.8490140 -1.848673 -1.8488010 -1.8487738 -1.8488228

A biological example

This section shows how to fit a linear model to real biological data and estimate its parameters.

First, we load the library and the data:

library(netReg)
data("yeast")

ls(yeast)

X <- yeast$X
Y <- yeast$Y
G.Y <- yeast$GY

The yeast data set (\(\mathbf{X}\) and \(\mathbf{Y}\)) is taken and adapted from Brem et al. (2005), Storey, Akey, and Kruglyak (2005), and Cheng et al. (2014). \(\mathbf{GY}\) is taken from BioGRID.

\(\mathbf{X}\) is a \((n \times p)\) matrix of genetic markers, where \(n\) is the number of samples (112) and \(p\) is the number of markers. \(\mathbf{Y}\) is a \((n \times q)\) matrix of expression values for \(q\) yeast genes, measured on the same \(n = 112\) samples. \(\mathbf{GY}\) is a \((q \times q)\) adjacency matrix representing protein-protein interactions between the \(q\) response variables.
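A quick look at the dimensions confirms this:

dim(X)    # n x p matrix of genetic markers
dim(Y)    # n x q matrix of expression values
dim(G.Y)  # q x q adjacency matrix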

We use only a small network here so that the results can be printed.

fit <- edgenet(X=X, Y=Y, G.Y=G.Y, lambda=5, family=gaussian, maxit=10, thresh=1e-3)
summary(fit)
## 'gaussian.edgenet' object
## 
## call:
## edgenet(X = X, Y = Y, G.Y = G.Y, lambda = 5, thresh = 0.001, 
##     maxit = 10, family = gaussian)
## 
## parameters:
## lambda  psigx  psigy 
##      5      0      1 
## 
## family:  gaussian
## link:  identity 
## 
## -> call coef(fit) for coefficients

For the response variables we use the affinity matrix \(\mathbf{GY}\), which represents biological relationships. We promote sparsity by setting \(\lambda = 5\) and put a weak (default) penalty on similar coefficients by setting \(\psi_{gy} = 1\). Otherwise we use the default parameters.

The fit object contains information about the coefficients and intercepts. With the coefficients estimated, we can predict new data sets:

X.new <- matrix(rnorm(10 * ncol(X)), 10)
pred  <- predict(fit, X.new)
pred[1:10, 1:5]
##             [,1]       [,2]       [,3]       [,4]       [,5]
##  [1,]  4.2008971  4.2013672  4.2009115  4.2011951  4.2009904
##  [2,] -0.2441918 -0.2437913 -0.2440459 -0.2438810 -0.2441129
##  [3,] -0.9478844 -0.9474961 -0.9477214 -0.9475729 -0.9478075
##  [4,]  0.6630633  0.6634801  0.6631809  0.6633710  0.6631440
##  [5,]  2.7012859  2.7017320  2.7013431  2.7015877  2.7013746
##  [6,] -3.1042542 -3.1038992 -3.1040256 -3.1039374 -3.1041863
##  [7,] -0.7581015 -0.7577092 -0.7579414 -0.7577913 -0.7580262
##  [8,]  8.5471919  8.5477266  8.5470778  8.5474799  8.5473005
##  [9,] -5.8865323 -5.8862183 -5.8862246 -5.8862092 -5.8864744
## [10,]  1.3180825  1.3185045  1.3181819  1.3183872  1.3181655

References

Brem, Rachel B, John D Storey, Jacqueline Whittle, and Leonid Kruglyak. 2005. “Genetic Interactions Between Polymorphisms That Affect Gene Expression in Yeast.” Nature 436 (7051): 701.

Cheng, Wei, Xiang Zhang, Zhishan Guo, Yu Shi, and Wei Wang. 2014. “Graph-Regularized Dual Lasso for Robust eQTL Mapping.” Bioinformatics 30 (12): i139–i148.

Chung, Fan R. K. 1997. Spectral Graph Theory. Vol. 92. CBMS Regional Conference Series in Mathematics. American Mathematical Society.

Meier, Lukas, Sara Van De Geer, and Peter Bühlmann. 2008. “The Group Lasso for Logistic Regression.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70 (1): 53–71.

Powell, Michael J. D. 2009. “The BOBYQA Algorithm for Bound Constrained Optimization Without Derivatives.” Cambridge NA Report NA2009/06, University of Cambridge, Cambridge.

Storey, John D, Joshua M Akey, and Leonid Kruglyak. 2005. “Multiple Locus Linkage Analysis of Genomewide Expression in Yeast.” PLoS Biology 3 (8): e267.

Yuan, Ming, and Yi Lin. 2006. “Model Selection and Estimation in Regression with Grouped Variables.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68 (1): 49–67.