filterProfileData {PhyloProfile}    R Documentation
Create the filtered data needed for plotting or clustering phylogenetic profiles. NOTE: this function requires some intermediate steps that use the results of other functions. To get fully processed data directly from the raw input, use the function fromInputToProfile() instead!
filterProfileData(DF, taxaCount, refTaxon = NULL, percentCO = c(0, 1), coorthoCOMax = 9999, var1CO = c(0, 1), var2CO = c(0, 1), var1Rel = "protein", var2Rel = "protein", groupByCat = FALSE, catDt = NULL, var1AggregateBy = "max", var2AggregateBy = "max")
DF | a reduced data frame containing information for all phylogenetic profiles at the selected taxonomy rank.
taxaCount | data frame with the number of present taxa in each supertaxon.
refTaxon | selected reference taxon. NOTE: this taxon is not affected by the filtering. To filter all taxa, set refTaxon = NULL. Default = NULL.
percentCO | min and max cutoffs for the percentage of species present in a supertaxon. Default = c(0, 1).
coorthoCOMax | maximum number of co-orthologs allowed. Default = 9999.
var1CO | min and max cutoffs for var1. Default = c(0, 1).
var2CO | min and max cutoffs for var2. Default = c(0, 1).
var1Rel | relation of var1 ("protein" for protein-protein or "species" for protein-species). Default = "protein".
var2Rel | relation of var2 ("protein" for protein-protein or "species" for protein-species). Default = "protein".
groupByCat | group genes by their categories (TRUE or FALSE). Default = FALSE.
catDt | data frame containing gene categories (optional; NULL if groupByCat = FALSE or no category information is provided). Default = NULL.
var1AggregateBy | aggregation method for var1 ("max", "min", "mean" or "median"), used to calculate the var1 value of each supertaxon. Default = "max".
var2AggregateBy | aggregation method for var2 ("max", "min", "mean" or "median"), used to calculate the var2 value of each supertaxon. Default = "max".
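To make the interaction of percentCO, var1CO, var1AggregateBy and refTaxon more concrete, here is a minimal base R sketch under stated assumptions. It is NOT PhyloProfile's implementation: the helper aggregateAndFilter() and the input columns geneID, supertaxon, ncbiID and var1 are hypothetical; the presence percentage is assumed to be the number of distinct taxa with an ortholog divided by the freq column of taxaCount; and rows failing a cutoff are simply dropped, which says nothing about how filterProfileData itself marks filtered entries. Only var1 is handled; var2 would follow the same pattern.

```r
# Hypothetical sketch, NOT the PhyloProfile implementation.
# Assumed columns in df: geneID, supertaxon, ncbiID, var1.
# Assumed columns in taxaCount: supertaxon, freq (as returned by plyr::count).
aggregateAndFilter <- function(df, taxaCount, refTaxon = NULL,
                               percentCO = c(0, 1), var1CO = c(0, 1),
                               var1AggregateBy = "max") {
    # pick the aggregation function by name ("max", "min", "mean" or "median")
    aggFun <- match.fun(
        match.arg(var1AggregateBy, c("max", "min", "mean", "median"))
    )
    # one var1 value per gene and supertaxon
    agg <- aggregate(var1 ~ geneID + supertaxon, data = df, FUN = aggFun)
    # number of distinct taxa with an ortholog per gene and supertaxon
    present <- aggregate(ncbiID ~ geneID + supertaxon, data = df,
                         FUN = function(x) length(unique(x)))
    names(present)[3] <- "presentTaxa"
    out <- merge(agg, present, by = c("geneID", "supertaxon"))
    out <- merge(out, taxaCount, by = "supertaxon")
    # assumed definition of the presence percentage
    out$presSpec <- out$presentTaxa / out$freq
    keep <- out$presSpec >= percentCO[1] & out$presSpec <= percentCO[2] &
        out$var1 >= var1CO[1] & out$var1 <= var1CO[2]
    # the reference taxon is never filtered out
    if (!is.null(refTaxon)) keep <- keep | out$supertaxon == refTaxon
    out[keep, ]
}

# toy usage data (hypothetical)
toy <- data.frame(
    geneID = c("g1", "g1", "g1", "g2"),
    supertaxon = c("A", "A", "B", "A"),
    ncbiID = c("t1", "t2", "t3", "t1"),
    var1 = c(0.9, 0.4, 0.2, 0.6),
    stringsAsFactors = FALSE
)
counts <- data.frame(supertaxon = c("A", "B"), freq = c(2, 1),
                     stringsAsFactors = FALSE)
res <- aggregateAndFilter(toy, counts, var1CO = c(0.5, 1))
resRef <- aggregateAndFilter(toy, counts, refTaxon = "B", var1CO = c(0.5, 1))
resMean <- aggregateAndFilter(toy, counts, var1CO = c(0.5, 1),
                              var1AggregateBy = "mean")
resPct <- aggregateAndFilter(toy, counts, percentCO = c(0.75, 1),
                             var1CO = c(0.5, 1))
```

In the toy data, g1 in supertaxon B fails the var1 cutoff (0.2) and is only kept when B is the reference taxon, while g2 in A is removed once the percentage cutoff rises above 0.5 (one of two taxa present).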
A filtered data frame for generating the profile plot, containing the seed gene IDs (or orthologous group IDs), their ortholog IDs, the corresponding (super)taxa and (super)taxon IDs, the number of co-orthologs in each (super)taxon, the values of the two additional variables var1 and var2, the supertaxon, and the categories of the seed genes (or ortholog groups).
Vinh Tran tran@bio.uni-frankfurt.de
parseInfoProfile and reduceProfile for generating the input data frame; fullProcessedProfile for a demo of a fully processed profile data frame; fromInputToProfile for generating fully processed data from raw input.
# NOTE: this function requires some intermediate steps that use the results
# of other functions. To get fully processed data from the raw input, use
# the function fromInputToProfile() instead!
library(PhyloProfile)
data("fullProcessedProfile", package = "PhyloProfile")
rankName <- "class"
refTaxon <- "Mammalia"
percentCutoff <- c(0.0, 1.0)
coorthologCutoffMax <- 10
var1Cutoff <- c(0.75, 1.0)
var2Cutoff <- c(0.5, 1.0)
var1Relation <- "protein"
var2Relation <- "species"
groupByCat <- FALSE
catDt <- NULL
var1AggregateBy <- "max"
var2AggregateBy <- "max"
# sort the taxa of the profile at the selected rank, relative to refTaxon
taxonIDs <- levels(as.factor(fullProcessedProfile$ncbiID))
sortedInputTaxa <- sortInputTaxa(
    taxonIDs, rankName, refTaxon, NULL
)
# count the input taxa belonging to each supertaxon
taxaCount <- plyr::count(sortedInputTaxa, "supertaxon")
filterProfileData(
    DF = fullProcessedProfile,
    taxaCount = taxaCount,
    refTaxon = refTaxon,
    percentCO = percentCutoff,
    coorthoCOMax = coorthologCutoffMax,
    var1CO = var1Cutoff,
    var2CO = var2Cutoff,
    var1Rel = var1Relation,
    var2Rel = var2Relation,
    groupByCat = groupByCat,
    catDt = catDt,
    var1AggregateBy = var1AggregateBy,
    var2AggregateBy = var2AggregateBy
)
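The example builds taxaCount with plyr::count(), which returns one row per supertaxon with a freq column. If plyr is not available, a base R sketch like the one below should produce a table of the same shape; the small sortedTaxa data frame is hypothetical and only stands in for the sortedInputTaxa object, and the row order may differ from plyr's.

```r
# Hypothetical stand-in for sortedInputTaxa: one row per input taxon
sortedTaxa <- data.frame(
    ncbiID = c("ncbi1", "ncbi2", "ncbi3"),
    supertaxon = c("Mammalia", "Mammalia", "Aves"),
    stringsAsFactors = FALSE
)
# count taxa per supertaxon; naming the table argument sets the column name
taxaCountBase <- as.data.frame(
    table(supertaxon = sortedTaxa$supertaxon),
    responseName = "freq", stringsAsFactors = FALSE
)
# keep only supertaxa that actually occur (relevant if supertaxon is a factor)
taxaCountBase <- taxaCountBase[taxaCountBase$freq > 0, ]
```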