fit_model {ttgsea}		R Documentation
Description:

From the result of GSEA, enrichment scores for unique tokens (words) in the names of gene sets can be predicted by deep learning. The function "text_token" tokenizes the text and the function "token_vector" encodes the tokenized text as integer sequences. The encoded sequences are then fed to the embedding layer of the model.
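As a rough illustration of this tokenization/encoding step, a minimal sketch using the two helpers named above; the exact argument names ("ngram_min", "ngram_max", "num_tokens", "length_seq") are taken from this page, but the call pattern should be checked against the helpers' own help pages:

library(ttgsea)
txt <- c("cell cycle", "dna repair")
## tokenize into unigrams and bigrams, keeping at most 100 tokens
tok <- text_token(txt, ngram_min = 1, ngram_max = 2, num_tokens = 100)
## encode each text as an integer sequence of fixed length 10,
## ready for the embedding layer of the model
seqs <- token_vector(txt, tok, length_seq = 10)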
Usage:

fit_model(
  gseaRes,
  text,
  score,
  model,
  ngram_min = 1,
  ngram_max = 2,
  num_tokens,
  length_seq,
  epochs,
  batch_size,
  use_generator = TRUE,
  ...
)
Arguments:

gseaRes
    a table of GSEA results, with rows for gene sets and columns for text and scores

text
    column name for the text data

score
    column name for the enrichment score

model
    deep learning model; the input dimension and input length of its embedding layer must equal "num_tokens" and "length_seq", respectively (see the sketch after this list)

ngram_min
    minimum size of an n-gram (default: 1)

ngram_max
    maximum size of an n-gram (default: 2)

num_tokens
    maximum number of tokens; must equal the input dimension of "layer_embedding" in the "model"

length_seq
    length of the input sequences; must equal the input length of "layer_embedding" in the "model"

epochs
    number of epochs

batch_size
    batch size

use_generator
    if TRUE, the function "sampling_generator" is used with "fit_generator"; otherwise, "fit" is used without a generator

...
    additional parameters passed to "fit" or "fit_generator"
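To make the "model" constraint concrete, here is a minimal sketch of a compatible keras model. Only the embedding layer's "input_dim" and "input_length" are required to match "num_tokens" and "length_seq"; the pooling layer, dense head, and "embedding_dims" below are arbitrary illustrative choices, not requirements of fit_model:

library(keras)
num_tokens <- 1000
length_seq <- 30
embedding_dims <- 50

model <- keras_model_sequential() %>%
  ## input_dim must equal num_tokens; input_length must equal length_seq
  layer_embedding(input_dim = num_tokens,
                  output_dim = embedding_dims,
                  input_length = length_seq) %>%
  layer_global_average_pooling_1d() %>%
  ## single output unit: the predicted enrichment score
  layer_dense(units = 1)

model %>% compile(loss = "mean_squared_error", optimizer = "adam")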
Value:

model
    trained model

tokens
    information on the tokens

token_pred
    predictions for every token; each row has a token and its predicted score

token_gsea
    list of the GSEA results restricted to the corresponding token

num_tokens
    maximum number of tokens

length_seq
    length of the input sequences
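A short usage note, assuming "ttgseaRes" holds the fitted result from the Examples section below; the components listed above can be inspected directly:

summary(ttgseaRes$model)    # the trained keras model
head(ttgseaRes$token_pred)  # per-token predicted scores
ttgseaRes$num_tokens        # 1000 in the example below
ttgseaRes$length_seq        # 30 in the example below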
Author(s):

Dongmin Jung
See Also:

keras::fit_generator, keras::layer_embedding, keras::pad_sequences,
textstem::lemmatize_strings, text2vec::create_vocabulary,
text2vec::prune_vocabulary
Examples:

library(reticulate)
if (keras::is_keras_available() & reticulate::py_available()) {
  library(fgsea)
  data(examplePathways)
  data(exampleRanks)
  ## clean pathway names: drop the species prefix and replace
  ## underscores with spaces so they can be tokenized as text
  names(examplePathways) <- gsub("_", " ",
                                 substr(names(examplePathways), 9, 1000))
  set.seed(1)
  fgseaRes <- fgsea(examplePathways, exampleRanks)

  ## hyperparameters for the token model
  num_tokens <- 1000
  length_seq <- 30
  batch_size <- 32
  embedding_dims <- 50
  num_units <- 32
  epochs <- 1

  ## fit a bidirectional GRU on pathway names against NES
  ttgseaRes <- fit_model(fgseaRes, "pathway", "NES",
                         model = bi_gru(num_tokens, embedding_dims,
                                        length_seq, num_units),
                         num_tokens = num_tokens,
                         length_seq = length_seq,
                         epochs = epochs,
                         batch_size = batch_size,
                         use_generator = FALSE)
}