get_seq_encode_pad {DeepPINCS}    R Documentation

Vectorization of characters of strings

Description

Encodes each character (or character n-gram) of the input strings as an integer, since strings must be vectorized before modeling. The encoded sequences are then padded or truncated to a fixed length.
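
To make the idea concrete, here is a minimal base-R sketch of character-level encoding followed by padding or truncation. It is an illustration only, not the package's implementation: the helper name `encode_pad_sketch`, the first-appearance label order, the use of 0 as the padding value, and padding/truncating at the end of the sequence are all assumptions made for this example.

```r
# Hypothetical helper, not part of DeepPINCS: base-R illustration only.
encode_pad_sketch <- function(sequences, length_seq, vocab = NULL) {
    # Split each string into single characters.
    chars <- strsplit(as.character(sequences), "")
    # Build the vocabulary in order of first appearance unless one is supplied
    # (assumption: a real label encoder may assign labels differently).
    if (is.null(vocab)) vocab <- unique(unlist(chars))
    rows <- lapply(chars, function(x) {
        ids <- match(x, vocab)                              # 1-based integer label
        ids <- ids[seq_len(min(length(ids), length_seq))]   # truncate at the end
        c(ids, rep(0L, length_seq - length(ids)))           # pad with zeros at the end
    })
    list(sequences_encode_pad = do.call(rbind, rows),
         vocab = vocab,
         num_token = length(vocab))
}

res <- encode_pad_sketch(c("CCO", "C1=CC=CC=C1"), length_seq = 5)
res$sequences_encode_pad
```

The first string is shorter than `length_seq` and is padded, while the second is longer and is truncated, so both rows of the result have five columns.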

Usage

get_seq_encode_pad(sequences, length_seq, ngram_max = 1, ngram_min = 1,
    lenc = NULL)

Arguments

sequences

SMILES strings or amino acid sequences

length_seq

length to which each encoded sequence is padded or truncated

ngram_max

maximum size of an n-gram (default: 1)

ngram_min

minimum size of an n-gram (default: 1)

lenc

encoded labels for characters, LabelEncoder object fitted by "CatEncoders::LabelEncoder.fit" (default: NULL)
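
The effect of ngram_min and ngram_max can be pictured with a small base-R sketch that lists the character n-grams of a string for every size in the given range. The helper `char_ngrams` is hypothetical and written for this example; it is not the tokenizer the package uses, and the order in which a real tokenizer returns n-grams may differ.

```r
# Hypothetical helper, not part of DeepPINCS or tokenizers.
char_ngrams <- function(s, ngram_min = 1, ngram_max = 1) {
    stopifnot(ngram_min >= 1, ngram_min <= ngram_max)
    chars <- strsplit(s, "")[[1]]
    out <- character(0)
    for (n in ngram_min:ngram_max) {
        # Skip sizes longer than the string itself.
        if (n > length(chars)) next
        starts <- seq_len(length(chars) - n + 1)
        # Paste n consecutive characters starting at each position.
        grams_n <- vapply(starts, function(i) {
            paste(chars[i:(i + n - 1)], collapse = "")
        }, character(1))
        out <- c(out, grams_n)
    }
    out
}

grams <- char_ngrams("CCO", ngram_min = 1, ngram_max = 2)
grams
```

With the defaults (both sizes equal to 1), each token is a single character; widening the range adds longer overlapping tokens to the vocabulary.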

Value

sequences_encode_pad

for each input string (SMILES string or amino acid sequence), an encoded sequence that is padded or truncated

lenc

encoded labels for characters

num_token

total number of characters

Author(s)

Dongmin Jung

See Also

CatEncoders::LabelEncoder.fit, CatEncoders::transform, keras::pad_sequences, stringdist::qgrams, tokenizers::tokenize_ngrams

Examples

if (keras::is_keras_available() && reticulate::py_available()) {
    get_seq_encode_pad(example_cpi[1, 2], 10)
}
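
Because the function both returns a fitted encoder (lenc) and accepts one as an argument, one plausible workflow is to fit the encoder on one set of strings and reuse it on another so that both share the same integer labels. The sketch below assumes that example_cpi has at least two rows and that the function accepts a vector of strings; the variable names `fitted` and `reused` are chosen for this example.

```r
# Sketch only: runs the package code when DeepPINCS, keras and Python are available.
reused <- NULL
if (requireNamespace("DeepPINCS", quietly = TRUE) &&
    requireNamespace("keras", quietly = TRUE) &&
    keras::is_keras_available() && reticulate::py_available()) {
    library(DeepPINCS)
    # Fit the label encoder on the first two strings.
    fitted <- get_seq_encode_pad(example_cpi[1:2, 2], 10)
    # Reuse the fitted encoder so a later call shares the same labels.
    reused <- get_seq_encode_pad(example_cpi[1, 2], 10, lenc = fitted$lenc)
    print(reused$sequences_encode_pad)
    print(reused$num_token)
}
```

Reusing the encoder this way only makes sense when every character of the new strings was seen during fitting; how unseen characters are handled is not covered here.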

[Package DeepPINCS version 1.2.1 Index]