Contents

1 Introduction

CancerInSilico provides a streamlined interface for simulating cellular models and gene expression data. The main functions in this package are inSilicoCellModel and inSilicoGeneExpression.

2 Running a Cell Simulation

2.1 Run Simple Simulation

Every call to inSilicoCellModel must specify the initial number of cells, the run time of the simulation in hours, and the initial density of the cell population. We also set the output increment here to minimize verbosity and the seed to allow for reproducibility.

simple_mod <- suppressMessages(inSilicoCellModel(initialNum=30, runTime=72,
    density=0.1, outputIncrement=24, randSeed=123))
## 
## time = 0.00
## size = 30
## time = 24.00
## size = 58
## time = 48.00
## size = 107
## time = 72.00
## size = 171

2.2 Plot CellModel Object

This creates a model object that can be used to generate gene expression data. It is also possible to view the model results directly throught the plotCells function. The plots are colored according to cell phase - cells in interphase are colored gray and cells in mitosis are colored black.

plotCells(simple_mod, time=0)

plotCells(simple_mod, time=36)

plotCells(simple_mod, time=72)

2.3 Query Cell Information

inSilicoCellModel outputs a CellModel that comes with getter functions to query information about the model. Here we use getNumberOfCells and getDensity to plot the size and density of the population over time.

# hours in simulation
times <- 0:simple_mod@runTime

# plot number of cells over time
nCells <- sapply(times, getNumberOfCells, model=simple_mod)
plot(times, nCells, type="l", xlab="hour", ylab="number of cells")

# plot population density over time
den <- sapply(times, getDensity, model=simple_mod)
plot(times, den, type="l", xlab="hour", ylab="population density")

3 Drugs

inSilicoCellModel supports drugs that function by supressing proliferation. A list of Drug objects can be passed to this function. These objects define a function to calculate the effect of the drug and the time at which the drug is added. The cycleLengthEffect field of the Drug object is a function that takes two parameters, cell type and cell cycle length. It returns the new cell cycle length.

Here we create a drug that cuts proliferation rates in half by doubling the cell cycle length. It is added at 24 hours into the simulation.

drug <- new("Drug", name="Drug_A", timeAdded=24,
    cycleLengthEffect=function(type, length) length * 2)
drug_mod <- suppressMessages(inSilicoCellModel(initialNum=30, runTime=72,
    density=0.1, drugs=c(drug), outputIncrement=24, randSeed=123))
## 
## time = 0.00
## size = 30
## time = 24.00
## size = 58
## time = 48.00
## size = 79
## time = 72.00
## size = 110
# hours in simulation
times <- 0:simple_mod@runTime

# plot number of cells over time
nCells <- sapply(times, getNumberOfCells, model=simple_mod)
nCells_drug <- sapply(times, getNumberOfCells, model=drug_mod)
plot(times, nCells, type="l", xlab="hour", ylab="number of cells")
lines(times, nCells_drug, type="l", xlab="hour", ylab="number of cells",
    col="red")

4 Cell Types

4.1 Adding a Single Cell Type

The mean cycle length of the cells in inSilicoCellModel is set to 24 hours by default. To change this we need to pass in a CellType object to the function. The CellType class allows for more fine grained control of the cellular properties in the simulation. The cycleLength field of the class is a function which takes no arguments and returns the target cycle length of a cell of this type. Note that the model also requires a minimum possible cycle length in the field minCycle.

type_A <- new("CellType", name="A", minCycle=16, cycleLength=function() 16)
fast_cells_mod <- suppressMessages(inSilicoCellModel(initialNum=30, runTime=72,
    density=0.1, cellTypes=c(type_A), outputIncrement=24, randSeed=123))
## 
## time = 0.00
## size = 30
## time = 24.00
## size = 73
## time = 48.00
## size = 163
## time = 72.00
## size = 209
# hours in simulation
times <- 0:fast_cells_mod@runTime

# plot number of cells over time
nCells <- sapply(times, getNumberOfCells, model=simple_mod)
nCells_fast <- sapply(times, getNumberOfCells, model=fast_cells_mod)
plot(times, nCells, type="l", xlab="hour", ylab="number of cells")
lines(times, nCells_fast, type="l", xlab="hour", ylab="number of cells",
    col="red")

4.2 Adding Multiple Cell Types

inSilicoCellModel also allows the user to pass a list of cell types. In this case we must also provide the cellTypeInitFreq argument which specifies the initial proportions of each cell type when the population is seeded. Note that we use a random cycleLength function to provide more variance within the cell types.

type_B <- new("CellType", name="B", size=1, minCycle=16,
    cycleLength=function() 16 + rexp(1,1/4))
type_C <- new("CellType", name="C", size=1, minCycle=32,
    cycleLength=function() 32 + rexp(1,1/4))
two_types_mod <- suppressMessages(inSilicoCellModel(initialNum=30, runTime=72,
    density=0.1, cellTypes=c(type_B, type_C), cellTypeInitFreq=c(0.4,0.6),
    outputIncrement=24, randSeed=123))
## 
## time = 0.00
## size = 30
## time = 24.00
## size = 52
## time = 48.00
## size = 94
## time = 72.00
## size = 158

4.3 Getting Cell Type

When inSilicoCellModel is run with one or more cell types, we can check which type each cell is with the function getCellType. Notice that the initial proportion of type B matches the parameter cellTypeInitFreq and from there grows larger since it is the faster growing cell type.

getTypeBProportion <- function(time)
{
    N <- getNumberOfCells(two_types_mod, time)
    sum(sapply(1:N, function(i) getCellType(two_types_mod, time, i) == 1)) / N
}
times <- 0:two_types_mod@runTime
Bprop <- sapply(times, getTypeBProportion)
plot(times, Bprop, type="l", xlab="hour", ylab="type B proportion")

5 Pathways

Gene pathways provide the link between the cell model and the gene expression simulation. A CancerInSilico pathway must define a function detailing how the state of the cell model effects the activity of that pathway. For example, a pathway related to cellular division would be more active in cells under going mitosis and less active in cells in interphase. To capture this we define a function mitosisExpression which takes the CellModel object, the cell ID, and the current time (all pathway expression functions take these arguments), and returns 1 if the cell is currently in mitosis and 0 otherwise.

mitosisGeneNames <- paste("m_", letters[1:20], sep="")
mitosisExpression <- function(model, cell, time)
{
    ifelse(getCellPhase(model, time, cell) == "M", 1, 0)
}

pwyMitosis <- new("Pathway", genes=mitosisGeneNames,
    expressionScale=mitosisExpression)

We define a second pathway for contact inhibition and define it’s activity in terms of the local density around an individual cell.

contactInhibitionGeneNames <- paste("ci_", letters[1:15], sep="")
contactInhibitionExpression <- function(model, cell, time)
{
    getLocalDensity(model, time, cell, 3.3)
}
pwyContactInhibition <- new("Pathway", genes=contactInhibitionGeneNames,
    expressionScale=contactInhibitionExpression)

5.1 Calibrate Gene Expression Range

Pathways define how active genes are, but we need a reference point to generate actual expression values. The calibratePathway function takes a pathway and a reference data set as it’s arguments and returns a calibrated pathway. Calling inSilicoGeneExpression on a non-calibrated pathway will throw an error. The reference data set contains genes along the rows and samples along the columns. All the genes in the pathway must be found in the rownames of the data set. Here we use a simulated data set.

# create simulated data set
allGenes <- c(mitosisGeneNames, contactInhibitionGeneNames)
geneMeans <- 2 + rexp(length(allGenes), 1/20)
data <- t(pmax(sapply(geneMeans, rnorm, n=25, sd=2), 0))
rownames(data) <- allGenes

# calibrate pathways
pwyMitosis <- calibratePathway(pwyMitosis, data)
pwyContactInhibition <- calibratePathway(pwyContactInhibition, data)

5.2 Generate Pathway Activity

inSilicoGeneExpression returns both the simulated gene expression data and the raw pathway activity that the gene expression is based on. Right now we are only interested in the pathway activity. We specify that 30 cells should be sampled every 6 hours to determine the pathway activity.

params <- new("GeneExpressionParams")
params@randSeed <- 123 # control this for reporducibility
params@nCells <- 30 # sample 30 cells at each time point to measure activity
params@sampleFreq <- 6 # measure activity every 6 hours

pwys <- c(pwyMitosis, pwyContactInhibition)
pwyActivity <- inSilicoGeneExpression(simple_mod, pwys, params)$pathways

5.3 Visualize Pathway Activity

We can plot the activity of each pathway to see exactly how our expressionScale functions are working with the model. Note how mitosis activity increases near the end of the model - this is exactly the opposite effect we would expect as contact inhibition gets stronger. The next section explores this issue.

# mitosis
plot(seq(0,72,6), pwyActivity[[1]], type="l", col="orange", ylim=c(0,1))
# contact inhibition
lines(seq(0,72,6), pwyActivity[[2]], col="blue")

5.4 Accounting for Model Effects

In the previous section we saw the mitosis genes grow more active when they should be repressed. This is due to the way we defined the expressionScale function for pwyMitosis. In the model, cells get stuck in mitosis trying to divide but unable to due to the density of the population. This results in a build up of cells in the mitosis phase that are unable to progress. Therefore, a more accurate measure of mitosis would be when the cell sucessfully completes a division. We acheive this by checking if the axis of the cell shrinks, a property that is specific to off-lattice cell models. Now we can see a gradual decline in mitosis activity over time.

pwyMitosis@expressionScale = function(model, cell, time)
{
    window <- c(max(time - 2, 0), min(time + 2, model@runTime))
    a1 <- getAxisLength(model, window[1], cell)
    a2 <- getAxisLength(model, window[2], cell)
    if (is.na(a1)) a1 <- 0 # in case cell was just born
    return(ifelse(a2 < a1, 1, 0))
}
pwys <- c(pwyMitosis, pwyContactInhibition)
pwyActivity <- inSilicoGeneExpression(simple_mod, pwys, params)$pathways
# mitosis
plot(seq(0,72,6), pwyActivity[[1]], type="l", col="orange", ylim=c(0,1))
# contact inhibition
lines(seq(0,72,6), pwyActivity[[2]], col="blue")

5.5 Normalize Pathway Activity

Notice that the maximum activity for the mitosis pathway is around 0.2 instead of 1. Again, this is due to the way we defined mitosis activity. There will never be close to 100% of the cells dividing so the activity is capped at a lower value. This isn’t neccesarily an error, but if the user wishes to normalize the pathway activity to [0,1], they can use the logistic transformation parameters provided in the Pathway class. If the slope and midpoint are set, then the function f(x) = 1 / (1 + exp(-slope(x - midpoint))) is applied to the raw pathway activity - allowing for a smooth normalization.

pwyMitosis@transformMidpoint = 0.1  
pwyMitosis@transformSlope = 5 / 0.1
pwys <- c(pwyMitosis, pwyContactInhibition)
pwyActivity <- inSilicoGeneExpression(simple_mod, pwys, params)$pathways
# mitosis
plot(seq(0,72,6), pwyActivity[[1]], type="l", col="orange", ylim=c(0,1))
# contact inhibition
lines(seq(0,72,6), pwyActivity[[2]], col="blue")

6 Simulating Bulk Gene Expression Data

6.1 Simulating Microarray Data

Now that we have a CellModel and a few Pathways, we can simulate gene expression data. We need to specify additional parameters to tell inSilicoGeneExpression what kind of data to generate. In this case we are generating bulk microarray data.

params@RNAseq <- FALSE # generate microarray data
params@singleCell <- FALSE # generate bulk data
params@perError <- 0.1 # parameter for simulated noise

pwys <- c(pwyMitosis, pwyContactInhibition)
ge <- inSilicoGeneExpression(simple_mod, pwys, params)$expression

6.2 Visualize Bulk Gene Expression Data

Using the heatmap.2 function we can visualize our simulated gene expression data. Here we color the rows with mitosis genes as orange and the contact inhibition genes as blue. Notice how mitosis genes are active in a cyclical pattern, and contact inhibition genes grow more active over time, as the population gets more dense.

ndx <- apply(ge, 1, var) == 0 # remove zero variance rows
heatmap.2(ge[!ndx,], 
    col=greenred, scale="row",
    trace="none", hclust=function(x) hclust(x,method="complete"),
    distfun=function(x) as.dist((1-cor(t(x)))/2), 
    Colv=FALSE, dendrogram="row",
    RowSideColors = ifelse(rownames(ge[!ndx,]) %in%
        mitosisGeneNames, "orange", "blue"),
    labRow = FALSE, labCol = seq(0,72,6),
    main="Bulk Gene Expression from Simple Cell Simulation")

7 Simulating Single Cell Gene Expression Data

7.1 Cell Type Pathways

In order to simulate gene expression data related to the cell types, we need to define pathways with activity based on the type of the cell.

# gene names
B_genes <- paste("b.", letters[1:20], sep="")
C_genes <- paste("c.", letters[1:20], sep="")

# pathway behavior
pwy_B <- new("Pathway", genes=B_genes, expressionScale=
    function(model, cell, time) ifelse(getCellType(model, time, cell)==1, 1, 0))
pwy_C <- new("Pathway", genes=C_genes, expressionScale=
    function(model, cell, time) ifelse(getCellType(model, time, cell)==2, 1, 0))

# calibrate pathways
geneMeans <- 2 + rexp(length(c(B_genes, C_genes)), 1/20)
data <- t(pmax(sapply(geneMeans, rnorm, n=25, sd=2), 0))
rownames(data) <- c(B_genes, C_genes)
pwy_B <- calibratePathway(pwy_B, data)
pwy_C <- calibratePathway(pwy_C, data)

7.2 Simulating Single Cell RNA-seq

Now that we have our pathways, we call inSilicoGeneExpression in the same way as before, the only difference is the GeneExpressionParams object. In this case we set RNAseq and singleCell to be true, and we also must pass parameters specific to single cell data.

params@RNAseq <- TRUE
params@singleCell <- TRUE
params@dropoutPresent <- TRUE
ge <- inSilicoGeneExpression(two_types_mod, c(pwy_B, pwy_C), params)$expression

7.3 Visualize Single Cell Data

Here we run PCA on the gene expression data and color each point by cell type.

cells <- unname(sapply(colnames(ge), function(x) strsplit(x,"_")[[1]][1]))
cells <- as.numeric(gsub("c", "", cells))
type <- sapply(cells, getCellType, model=two_types_mod,
    time=two_types_mod@runTime)
type[type==1] <- "red"
type[type==2] <- "blue"

pca <- prcomp(ge, center=FALSE, scale.=FALSE)
plot(pca$rotation[,c(1,2)], col=type)

8 References