run_mdt {decoupleR}    R Documentation

Multivariate Decision Trees (MDT)

Description

Calculates regulatory activities by fitting multivariate decision trees (MDT) using ranger::ranger().

Usage

run_mdt(
  mat,
  network,
  .source = .data$source,
  .target = .data$target,
  .mor = .data$mor,
  .likelihood = .data$likelihood,
  sparse = FALSE,
  center = FALSE,
  na.rm = FALSE,
  trees = 10,
  min_n = 20,
  nproc = 4,
  seed = 42
)

Arguments

mat

Matrix to evaluate (e.g. expression matrix), with target nodes in rows and conditions in columns. rownames(mat) must share at least one element with the .target column of network.

network

Tibble or data frame with edges and their associated metadata.

.source

Column with source nodes.

.target

Column with target nodes.

.mor

Column with edge mode of regulation (i.e. mor).

.likelihood

Column with edge likelihood.

sparse

Logical value indicating if the generated profile matrix should be sparse.

center

Logical value indicating if mat must be centered by base::rowMeans().

na.rm

Should missing values (including NaN) be omitted from the calculations of base::rowMeans()?

trees

An integer for the number of trees contained in the ensemble.

min_n

An integer for the minimum number of data points in a node that are required for the node to be split further.

nproc

Number of cores to use for computation.

seed

Seed for random number generation: a single value, interpreted as an integer, or NULL.

Details

MDT fits a multivariate ensemble of decision trees (random forest) to estimate regulatory activities. MDT transforms a given network into an adjacency matrix, placing sources as columns and targets as rows. The matrix is filled with the associated weights for each interaction. For each sample, a random forest model is fit on this matrix to predict the observed molecular readouts. The feature importances obtained from the fitted model are taken as the activities of the regulators.
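
The procedure described above can be sketched in a few lines of R. This is an illustrative approximation, not the package's internal code. The toy network and readouts are invented for the example, the edge weight is assumed to be mor * likelihood, and importance = "impurity" is an assumed choice of importance measure. The forest is only fit if ranger is installed.

```r
set.seed(1)
targets <- paste0("G", 1:60)
sources <- c("TF1", "TF2", "TF3")

# Hypothetical toy network: each source regulates 20 of the 60 targets.
network <- data.frame(
  source = rep(sources, each = 20),
  target = targets,
  mor = sample(c(-1, 1), 60, replace = TRUE),
  likelihood = 1
)

# Adjacency matrix with targets as rows and sources as columns.
# Assumption: the weight of an edge is mor * likelihood; absent edges are 0.
adj <- matrix(0, nrow = length(targets), ncol = length(sources),
              dimnames = list(targets, sources))
idx <- cbind(as.character(network$target), as.character(network$source))
adj[idx] <- network$mor * network$likelihood

# Toy readouts: condition S1 is driven by TF1, condition S2 is pure noise.
mat <- cbind(
  S1 = 2 * adj[, "TF1"] + rnorm(60, sd = 0.1),
  S2 = rnorm(60)
)

# One forest per condition: the source columns are the predictors, the
# condition's readouts are the response, and the variable importances
# play the role of the activity scores.
fit_one <- function(y) {
  ranger::ranger(x = as.data.frame(adj), y = y, num.trees = 10,
                 min.node.size = 20, importance = "impurity",
                 num.threads = 1, seed = 42)$variable.importance
}

if (requireNamespace("ranger", quietly = TRUE)) {
  scores <- sapply(colnames(mat), function(cond) fit_one(mat[, cond]))
  print(scores)  # sources in rows, conditions in columns
}
```

Because every source enters the same model as a predictor, the importances reflect each regulator's contribution in the presence of the others, which is what distinguishes the multivariate approach from fitting one source at a time.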

Value

A long-format tibble of the enrichment scores for each source across the samples. The resulting tibble contains the following columns:

  1. statistic: Indicates which method is associated with which score.

  2. source: Source nodes of network.

  3. condition: Condition representing each column of mat.

  4. score: Regulatory activity (enrichment score).
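
A long-format result with the columns listed above can be reshaped into a source-by-condition matrix when that layout is more convenient, for instance for heatmaps. The sketch below uses base R only. The data frame res is a hand-made stand-in for the output of run_mdt(), and its values, including the "mdt" label in the statistic column, are invented for illustration.

```r
# Hypothetical stand-in for a run_mdt() result (values are made up).
res <- data.frame(
  statistic = "mdt",
  source = rep(c("TF1", "TF2"), times = 2),
  condition = rep(c("S1", "S2"), each = 2),
  score = c(0.8, 0.1, 0.3, 0.6)
)

# One cell per (source, condition) pair; mean() simply returns the single
# score found in each cell.
wide <- tapply(res$score, list(res$source, res$condition), FUN = mean)
print(wide)
```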

See Also

Other decoupleR statistics: decouple(), run_aucell(), run_fgsea(), run_gsva(), run_mlm(), run_ora(), run_udt(), run_ulm(), run_viper(), run_wmean(), run_wsum()

Examples

inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "input-expr_matrix.rds"))
network <- readRDS(file.path(inputs_dir, "input-dorothea_genesets.rds"))

run_mdt(mat, network, .source = "tf")

[Package decoupleR version 1.99.5 Index]