Contents

1 Introduction

miaSim implements tools for microbiome data simulation based on the SummarizedExperiment [@SE], microsim .

Microbiome time series simulation can be obtained by generalized Lotka-Volterra model,simulateGLV, and Self-Organized Instability (SOI), simulateSOI. Hubbell’s Neutral model, simulateHubbell is used to determine the species abundance matrix. The resulting abundance matrix from these three simulation models is applied to SummarizedExperiment object or TreeSummarizedExperiment object.

powerlawA and randomA give interaction matrix of species generated by normal distribution and uniform distribution, respectively. These matrices can be used in the simulation model examples.

tDyn generates lists of time series that can be specified as simulation time and time points to keep in simulated time.

2 Installation

2.0.1 Bioc-release

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

3 Models for simulating microbiome data sets

simulateGLV is the generalized Lotka-Volterra simulation model fitted to time-series estimates microbial population dynamics and relative rates of interaction. The model relies on interaction matrix that represents interaction heterogeneity between species. This interaction matrix can be generated with powerlawA or randomA functions depending on the distribution method.

powerlawA uses normal distribution to create interaction matrix.

library(miaSim)
A_normal <- powerlawA(n.species = 4, alpha = 3)

randomA uses uniform distribution to create interaction matrix.

A_uniform <- randomA(n.species = 10, d = -0.4, min.strength = -0.8,
                        max.strength = 0.8, connectance = 0.5)

The number of species specified in the interaction matrix must be the same amount as the species used in the simulateGLV and simulateSOI models.

SEobject <- simulateGLV(n.species = 4, A_normal, t.end = 1000)

Time series is added to simulateGLV with tDyn function where the time points can be kept and extracted from simulation time as a separate list.

Time <- tDyn(t.start = 0, t.end = 100, t.step = 0.5, t.store = 100)
Time$t.index 
##   [1]   1   3   5   7   9  11  13  15  17  19  21  23  25  27  29  31  33  35
##  [19]  37  39  41  43  45  47  49  51  53  55  57  59  61  63  65  67  69  71
##  [37]  73  75  77  79  81  83  85  87  89  91  93  95  97  99 101 103 105 107
##  [55] 109 111 113 115 117 119 121 123 125 127 129 131 133 135 137 139 141 143
##  [73] 145 147 149 151 153 155 157 159 161 163 165 167 169 171 173 175 177 179
##  [91] 181 183 185 187 189 191 193 195 197 199 201

simulateHubbell includes the Hubbell Neutral simulation model which explains the diversity and relative abundance of species in ecological communities. This model is based on the community dynamics; migration, births and deaths.

A_uniform <- randomA(n.species = 10, d = -0.4, min.strength = -0.8,
                        max.strength = 0.8, connectance = 0.5)

The number of species specified in the interaction matrix must be the same amount as the species used in the simulateGLV and simulateSOI models.

SEobject <- simulateGLV(n.species = 4, A_normal, t.start = 0, t.store = 1000)

Time series is added to simulateGLV with tDyn function where the time points can be kept and extracted from simulation time as a separate list.

Time <- tDyn(t.start = 0, t.end = 100, t.step = 0.5, t.store = 100)
Time$t.index 
##   [1]   1   3   5   7   9  11  13  15  17  19  21  23  25  27  29  31  33  35
##  [19]  37  39  41  43  45  47  49  51  53  55  57  59  61  63  65  67  69  71
##  [37]  73  75  77  79  81  83  85  87  89  91  93  95  97  99 101 103 105 107
##  [55] 109 111 113 115 117 119 121 123 125 127 129 131 133 135 137 139 141 143
##  [73] 145 147 149 151 153 155 157 159 161 163 165 167 169 171 173 175 177 179
##  [91] 181 183 185 187 189 191 193 195 197 199 201

simulateHubbell includes the Hubbell Neutral simulation model which explains the diversity and relative abundance of species in ecological communities. This model is based on the community dynamics; migration, births and deaths.

ExampleHubbell <- simulateHubbell(n.species = 8, M = 10, I = 1000, d = 50,
                                    m = 0.02, tend = 100)

The Self-Organised Instability (SOI) model can be found in simulateSOI and it generates time series for communities and accelerates stochastic simulation.

ExampleSOI <- simulateSOI(n.species = 4, I = 1000, A_normal, k=5, com = NULL,
                                            tend = 150, norm = TRUE)

The simulations result in the SummarizedExperiment [@SE] class object containing the abundance matrix. Other fields, such as rowData containing information about the samples, and colData, consisting of sample metadata describing the samples, can be added to the SummarizedExperiment class object.

4 Session info

sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] miaSim_1.0.0                SummarizedExperiment_1.24.0
##  [3] Biobase_2.54.0              GenomicRanges_1.46.0       
##  [5] GenomeInfoDb_1.30.0         IRanges_2.28.0             
##  [7] S4Vectors_0.32.0            BiocGenerics_0.40.0        
##  [9] MatrixGenerics_1.6.0        matrixStats_0.61.0         
## [11] BiocStyle_2.22.0           
## 
## loaded via a namespace (and not attached):
##  [1] pracma_2.3.3           bslib_0.3.1            compiler_4.1.1        
##  [4] BiocManager_1.30.16    jquerylib_0.1.4        XVector_0.34.0        
##  [7] bitops_1.0-7           tools_4.1.1            zlibbioc_1.40.0       
## [10] digest_0.6.28          jsonlite_1.7.2         evaluate_0.14         
## [13] lattice_0.20-45        rlang_0.4.12           Matrix_1.3-4          
## [16] DelayedArray_0.20.0    parallel_4.1.1         yaml_2.2.1            
## [19] xfun_0.27              fastmap_1.1.0          GenomeInfoDbData_1.2.7
## [22] stringr_1.4.0          knitr_1.36             sass_0.4.0            
## [25] grid_4.1.1             deSolve_1.30           R6_2.5.1              
## [28] poweRlaw_0.70.6        rmarkdown_2.11         bookdown_0.24         
## [31] magrittr_2.0.1         htmltools_0.5.2        stringi_1.7.5         
## [34] RCurl_1.98-1.5