Contents

1 Vignette Version

This vignette was built using CoGAPS version:

packageVersion("CoGAPS")
## [1] '3.12.0'

2 Introduction

Coordinated Gene Association in Pattern Sets (CoGAPS) is a technique for latent space learning in gene expression data. CoGAPS is a member of the Nonnegative Matrix Factorization (NMF) class of algorithms. NMFs factorize a data matrix into two related matrices containing gene weights, the Amplitude (A) matrix, and sample weights, the Pattern (P) Matrix. Each column of A or row of P defines a feature and together this set of features defines the latent space among genes and samples, respectively. In NMF, the values of the elements in the A and P matrices are constrained to be greater than or equal to zero. This constraint simultaneously reflects the non-negative nature of gene expression data and enforces the additive nature of the resulting feature dimensions, generating solutions that are biologically intuitive to interpret (Seung and Lee (1999)).

CoGAPS has two extensions that allow it to scale up to large data sets, Genome-Wide CoGAPS (GWCoGAPS) and Single-Cell CoGAPS (scCOGAPS). This package presents a unified R interface for all three methods, with a parallel, efficient underlying implementation in C++.

3 Installing CoGAPS

CoGAPS is a bioconductor package and so the release version can be installed as follows:

source("https://bioconductor.org/biocLite.R")
biocLite("CoGAPS")

The most up-to-date version of CoGAPS can be installed directly from the FertigLab Github Repository:

## Method 1 using biocLite
biocLite("FertigLab/CoGAPS", dependencies = TRUE, build_vignettes = TRUE)

## Method 2 using devtools package
devtools::install_github("FertigLab/CoGAPS")

There is also an option to install the development version of CoGAPS, while this version has the latest experimental features, it is not guaranteed to be stable.

## Method 1 using biocLite
biocLite("FertigLab/CoGAPS", ref="develop", dependencies = TRUE, build_vignettes = TRUE)

## Method 2 using devtools package
devtools::install_github("FertigLab/CoGAPS", ref="develop")

4 Package Overview

We first give a walkthrough of the package features using a simple, simulated data set. In later sections we provide two example workflows on real data sets.

4.1 Running CoGAPS with Default Parameters

The only required argument to CoGAPS is the data set. This can be a matrix, data.frame, SummarizedExperiment, SingleCellExperiment or the path of a file (tsv, csv, mtx, gct) containing the data.

# load data
data(GIST)

# run CoGAPS (low number of iterations since this is just an example)
CoGAPS(GIST.matrix, nIterations=1000)
## 
## This is CoGAPS version 3.12.0 
## Running Standard CoGAPS on GIST.matrix (1363 genes and 9 samples) with parameters:
## 
## -- Standard Parameters --
## nPatterns            7 
## nIterations          1000 
## seed                 783 
## sparseOptimization   FALSE 
## 
## -- Sparsity Parameters --
## alpha          0.01 
## maxGibbsMass   100
## [1] "CogapsResult object with 1363 features and 9 samples"
## [1] "7 patterns were learned"

While CoGAPS is running it periodically prints status messages. For example, 20000 of 25000, Atoms: 2932(80), ChiSq: 9728, time: 00:00:29 / 00:01:19. This message tells us that CoGAPS is at iteration 20000 out of 25000 for this phase, and that 29 seconds out of an estimated 1 minute 19 seconds have passed. It also tells us the size of the atomic domain which is a core component of the algorithm but can be ignored for now. Finally, the ChiSq value tells us how closely the A and P matrices reconstruct the original data. In general, we want this value to go down - but it is not a perfect measurment of how well CoGAPS is finding the biological processes contained in the data. CoGAPS also prints a message indicating which phase is currently happening. There are two phases to the algorithm - Equilibration and Sampling.

4.2 Setting Parameters

4.2.1 Model Parameters

Most of the time we’ll want to set some parameters before running CoGAPS. Parameters are managed with a CogapsParams object. This object will store all parameters needed to run CoGAPS and provides a simple interface for viewing and setting the parameter values.

# create new parameters object
params <- new("CogapsParams")

# view all parameters
params
## -- Standard Parameters --
## nPatterns            7 
## nIterations          50000 
## seed                 670 
## sparseOptimization   FALSE 
## 
## -- Sparsity Parameters --
## alpha          0.01 
## maxGibbsMass   100
# get the value for a specific parameter
getParam(params, "nPatterns")
## [1] 7
# set the value for a specific parameter
params <- setParam(params, "nPatterns", 3)
getParam(params, "nPatterns")
## [1] 3

Once we’ve created the parameters object we can pass it along with our data to CoGAPS.

# run CoGAPS with specified model parameters
CoGAPS(GIST.matrix, params, nIterations=1000)
## 
## This is CoGAPS version 3.12.0 
## Running Standard CoGAPS on GIST.matrix (1363 genes and 9 samples) with parameters:
## 
## -- Standard Parameters --
## nPatterns            3 
## nIterations          1000 
## seed                 670 
## sparseOptimization   FALSE 
## 
## -- Sparsity Parameters --
## alpha          0.01 
## maxGibbsMass   100
## [1] "CogapsResult object with 1363 features and 9 samples"
## [1] "3 patterns were learned"

4.2.2 Run Configuration Options

The CogapsParams class manages the model parameters - i.e. the parameters that affect the result. There are also a few parameters that are passed directly to CoGAPS that control things like displaying the status of the run.

# run CoGAPS with specified output frequency
CoGAPS(GIST.matrix, params, nIterations=1000, outputFrequency=250)
## 
## This is CoGAPS version 3.12.0 
## Running Standard CoGAPS on GIST.matrix (1363 genes and 9 samples) with parameters:
## 
## -- Standard Parameters --
## nPatterns            3 
## nIterations          1000 
## seed                 670 
## sparseOptimization   FALSE 
## 
## -- Sparsity Parameters --
## alpha          0.01 
## maxGibbsMass   100
## [1] "CogapsResult object with 1363 features and 9 samples"
## [1] "3 patterns were learned"

There are several other arguments that are passed directly to CoGAPS which are covered in later sections.

4.3 Breaking Down the Return Object from CoGAPS

CoGAPS returns a object of the class CogapsResult which inherits from LinearEmbeddingMatrix (defined in the SingleCellExperiment package). CoGAPS stores the lower dimensional representation of the samples (P matrix) in the sampleFactors slot and the weight of the features (A matrix) in the featureLoadings slot. CogapsResult also adds two of its own slots - factorStdDev and loadingStdDev which contain the standard deviation across sample points for each matrix.

There is also some information in the metadata slot such as the original parameters and value for the Chi-Sq statistic. In general, the metadata will vary depending on how CoGAPS was called in the first place. The package provides these functions for querying the metadata in a safe manner:

# run CoGAPS
result <- CoGAPS(GIST.matrix, params, messages=FALSE, nIterations=1000)
## 
## This is CoGAPS version 3.12.0 
## Running Standard CoGAPS on GIST.matrix (1363 genes and 9 samples)
# get the mean ChiSq statistic over all samples
getMeanChiSq(result)
## [1] 4521.769
# get the version number used to create this result
getVersion(result)
## [1] '3.12.0'
# get the original parameters used to create this result
getOriginalParameters(result)
## -- Standard Parameters --
## nPatterns            3 
## nIterations          1000 
## seed                 670 
## sparseOptimization   FALSE 
## 
## -- Sparsity Parameters --
## alpha          0.01 
## maxGibbsMass   100

To convert a CogapsResult object to a LinearEmbeddingMatrix use

as(result, "LinearEmbeddingMatrix")
## class: LinearEmbeddingMatrix 
## dim: 9 3 
## metadata(16): meanChiSq chisq ... params version
## rownames: NULL
## colnames: NULL
## factorData names(0):

4.4 Visualizing Output

The CogapsResult object can be passed on to the analysis and plotting functions provided in the package. By default, the plot function displays how the patterns vary across the samples. (Note that we pass the nIterations parameter here directly, this is allowed for any parameters in the CogapsParams class and will always take precedent over the values given in params).

# store result
result <- CoGAPS(GIST.matrix, params, nIterations=5000, messages=FALSE)
## 
## This is CoGAPS version 3.12.0 
## Running Standard CoGAPS on GIST.matrix (1363 genes and 9 samples)
# plot CogapsResult object returned from CoGAPS
plot(result)

In the example workflows we’ll explore some more analysis functions provided in the package.

4.5 Running CoGAPS in Parallel

Non-Negative Matrix Factorization algorithms typically require long computation times and CoGAPS is no exception. In order to scale CoGAPS up to the size of data sets seen in practice we need to take advantage of modern hardware and parallelize the algorithm.

4.5.1 Multi-Threaded Parallelization

The simplest way to run CoGAPS in parallel is to provide the nThreads argument to CoGAPS. This allows the underlying algorithm to run on multiple threads and has no effect on the mathematics of the algorithm i.e. this is still standard CoGAPS. The precise number of threads to use depends on many things like hardware and data size. The best approach is to play around with different values and see how it effects the estimated time.

CoGAPS(GIST.matrix, nIterations=10000, outputFrequency=5000, nThreads=1, seed=5)
## 
## This is CoGAPS version 3.12.0 
## Running Standard CoGAPS on GIST.matrix (1363 genes and 9 samples) with parameters:
## 
## -- Standard Parameters --
## nPatterns            7 
## nIterations          10000 
## seed                 5 
## sparseOptimization   FALSE 
## 
## -- Sparsity Parameters --
## alpha          0.01 
## maxGibbsMass   100
## [1] "CogapsResult object with 1363 features and 9 samples"
## [1] "7 patterns were learned"
CoGAPS(GIST.matrix, nIterations=10000, outputFrequency=5000, nThreads=4, seed=5)
## Warning in CoGAPS(GIST.matrix, nIterations = 10000, outputFrequency = 5000, :
## requesting multi-threaded version of CoGAPS but compiler did not support OpenMP
## 
## This is CoGAPS version 3.12.0 
## Running Standard CoGAPS on GIST.matrix (1363 genes and 9 samples) with parameters:
## 
## -- Standard Parameters --
## nPatterns            7 
## nIterations          10000 
## seed                 5 
## sparseOptimization   FALSE 
## 
## -- Sparsity Parameters --
## alpha          0.01 
## maxGibbsMass   100
## [1] "CogapsResult object with 1363 features and 9 samples"
## [1] "7 patterns were learned"

Note this method relies on CoGAPS being compiled with OpenMP support, use buildReport to check.

cat(CoGAPS::buildReport())
## Compiled with GCC v9.3
## SIMD: AVX instructions enabled
## Compiler did not support OpenMP

4.5.2 Distributed CoGAPS (GWCoGAPS/scCoGAPS)

For large datasets (greater than a few thousand genes or samples) the multi-threaded parallelization isn’t enough. It is more efficient to break up the data into subsets and perform CoGAPS on each subset in parallel, stitching the results back together at the end. The CoGAPS extensions, GWCOGAPS and scCoGAPS, each implement a version of this method (Stein-O’Brien et al. (2017)).

In order to use these extensions, some additional parameters are required. nSets specifies the number of subsets to break the data set into. cut, minNS, and maxNS control the process of matching patterns across subsets and in general should not be changed from defaults. More information about these parameters can be found in the original papers. These parameters need to be set with a different function than setParam since they depend on each other. Here we only set nSets (always required), but we have the option to pass the other parameters as well.

params <- setDistributedParams(params, nSets=3)
## setting distributed parameters - call this again if you change nPatterns

Setting nSets requires balancing available hardware and run time against the size of your data. In general, nSets should be less than or equal to the number of nodes/cores that are available. If that is true, then the more subsets you create, the faster CoGAPS will run - however, some robustness can be lost when the subsets get too small. The general rule of thumb is to set nSets so that each subset has between 1000 and 5000 genes or cells. We will see an example of this on real data in the next two sections.

Once the distributed parameters have been set we can call CoGAPS either by setting the distributed parameter or by using the provided wrapper functions. The following calls are equivalent:

# need to use a file with distributed cogaps
GISTCsvPath <- system.file("extdata/GIST.csv", package="CoGAPS")

# genome-wide CoGAPS
GWCoGAPS(GISTCsvPath, params, messages=FALSE, nIterations=1000)
## Warning in GWCoGAPS(GISTCsvPath, params, messages = FALSE, nIterations =
## 1000): GWCoGAPS is deprecated, use the main function CoGAPS with the argument:
## distributed="genome-wide"
## 
## This is CoGAPS version 3.12.0 
## Running genome-wide CoGAPS on /tmp/Rtmpg3vvGB/Rinst2e5403707bc119/CoGAPS/extdata/GIST.csv (1363 genes and 9 samples)
## [1] "CogapsResult object with 1363 features and 9 samples"
## [1] "3 patterns were learned"
# genome-wide CoGAPS
CoGAPS(GISTCsvPath, params, distributed="genome-wide", messages=FALSE, nIterations=1000)
## 
## This is CoGAPS version 3.12.0 
## Running genome-wide CoGAPS on /tmp/Rtmpg3vvGB/Rinst2e5403707bc119/CoGAPS/extdata/GIST.csv (1363 genes and 9 samples)
## [1] "CogapsResult object with 1363 features and 9 samples"
## [1] "3 patterns were learned"
# single-cell CoGAPS
scCoGAPS(GISTCsvPath, params, messages=FALSE, transposeData=TRUE, nIterations=1000)
## Warning in scCoGAPS(GISTCsvPath, params, messages = FALSE, transposeData =
## TRUE, : scCoGAPS is deprecated, use the main function CoGAPS with the argument:
## distributed="single-cell"
## 
## This is CoGAPS version 3.12.0 
## Running single-cell CoGAPS on /tmp/Rtmpg3vvGB/Rinst2e5403707bc119/CoGAPS/extdata/GIST.csv (9 genes and 1363 samples)
## [1] "CogapsResult object with 9 features and 1363 samples"
## [1] "3 patterns were learned"
# single-cell CoGAPS
CoGAPS(GISTCsvPath, params, distributed="single-cell", messages=FALSE, transposeData=TRUE, nIterations=1000)
## 
## This is CoGAPS version 3.12.0 
## Running single-cell CoGAPS on /tmp/Rtmpg3vvGB/Rinst2e5403707bc119/CoGAPS/extdata/GIST.csv (9 genes and 1363 samples)
## [1] "CogapsResult object with 9 features and 1363 samples"
## [1] "3 patterns were learned"

The parallel backend for this computation is managed by the package BiocParallel and there is an option for the user to specifiy which backend they want. See the Additional Features section for more information.

In general it is preferred to pass a file name to GWCoGAPS/scCoGAPS since otherwise the entire data set must be copied across multiple processes which will slow things down and potentially cause an out-of-memory error. We will see examples of this in the next two sections.

5 Additional Features of CoGAPS

5.1 Checkpoint System - Saving/Loading CoGAPS Runs

CoGAPS allows the user to save their progress throughout the run, and restart from the latest saved “checkpoint”. This is intended so that if the server crashes in the middle of a long run it doesn’t need to be restarted from the beginning. Set the checkpointInterval parameter to save checkpoints and pass a file name as checkpointInFile to load from a checkpoint.

if (CoGAPS::checkpointsEnabled())
{
    # our initial run
    res1 <- CoGAPS(GIST.matrix, params, checkpointInterval=100, checkpointOutFile="vignette_example.out", messages=FALSE)

    # assume the previous run crashes
    res2 <- CoGAPS(GIST.matrix, checkpointInFile="vignette_example.out", messages=FALSE)

    # check that they're equal
    all(res1@featureLoadings == res2@featureLoadings)
    all(res1@sampleFactors == res2@sampleFactors)
}

5.2 Transposing Data

If your data is stored as samples x genes, CoGAPS allows you to pass transposeData=TRUE and will automatically read the transpose of your data to get the required genes x samples configuration.

5.3 Passing Uncertainty Matrix

In addition to providing the data, the user can also specify an uncertainty measurement - the standard deviation of each entry in the data matrix. By default, CoGAPS assumes that the standard deviation matrix is 10% of the data matrix. This is a reasonable heuristic to use, but for specific types of data you may be able to provide better information.

# run CoGAPS with custom uncertainty
data(GIST)
result <- CoGAPS(GIST.matrix, params, uncertainty=GIST.uncertainty, messages=FALSE, nIterations=1000)
## 
## This is CoGAPS version 3.12.0 
## Running Standard CoGAPS on GIST.matrix (1363 genes and 9 samples)

5.4 GWCoGAPS/scCoGAPS

5.4.1 Setting Parallel Backend

The distributed computation for CoGAPS uses BiocParallel underneath the hood to manage the parallelization. The user has the option to specify what the backend should be. By default, it is MulticoreParam with the same number of workers as nSets. Use the BPPARAM parameter in CoGAPS to set the backend. See the vignette for BiocParallel for more information about the different choices for the backend.

# run CoGAPS with serial backend
scCoGAPS(GISTCsvPath, params, BPPARAM=BiocParallel::SerialParam(), messages=FALSE, transposeData=TRUE, nIterations=1000)
## Warning in scCoGAPS(GISTCsvPath, params, BPPARAM =
## BiocParallel::SerialParam(), : scCoGAPS is deprecated, use the main function
## CoGAPS with the argument: distributed="single-cell"
## 
## This is CoGAPS version 3.12.0 
## Running single-cell CoGAPS on /tmp/Rtmpg3vvGB/Rinst2e5403707bc119/CoGAPS/extdata/GIST.csv (9 genes and 1363 samples)
## [1] "CogapsResult object with 9 features and 1363 samples"
## [1] "3 patterns were learned"

5.4.2 Methods of Subsetting Data

The default method for subsetting the data is to uniformly break up the rows (cols) of the data. There is an alternative option where the user provides an annotation vector for the rownames (colnames) of the data and gives a weight to each category in the annotation vector. Equal sized subsets are then drawn by sampling all rows (cols) according to the weight of each category.

# sampling with weights
anno <- sample(letters[1:5], size=nrow(GIST.matrix), replace=TRUE)
w <- c(1,1,2,2,1)
names(w) <- letters[1:5]
params <- new("CogapsParams")
params <- setAnnotationWeights(params, annotation=anno, weights=w)
result <- GWCoGAPS(GISTCsvPath, params, messages=FALSE, nIterations=1000)
## Warning in GWCoGAPS(GISTCsvPath, params, messages = FALSE, nIterations =
## 1000): GWCoGAPS is deprecated, use the main function CoGAPS with the argument:
## distributed="genome-wide"
## 
## This is CoGAPS version 3.12.0 
## Running genome-wide CoGAPS on /tmp/Rtmpg3vvGB/Rinst2e5403707bc119/CoGAPS/extdata/GIST.csv (1363 genes and 9 samples)

Finally, the user can set explicitSets which is a list of character or numeric vectors indicating which names or indices of the data should be put into each set. Make sure to set nSets to the correct value before passing explicitSets.

# running cogaps with given subsets
sets <- list(1:225, 226:450, 451:675, 676:900)
params <- new("CogapsParams")
params <- setDistributedParams(params, nSets=length(sets))
## setting distributed parameters - call this again if you change nPatterns
result <- GWCoGAPS(GISTCsvPath, params, explicitSets=sets, messages=FALSE, nIterations=1000)
## Warning in GWCoGAPS(GISTCsvPath, params, explicitSets = sets, messages =
## FALSE, : GWCoGAPS is deprecated, use the main function CoGAPS with the argument:
## distributed="genome-wide"
## 
## This is CoGAPS version 3.12.0 
## Running genome-wide CoGAPS on /tmp/Rtmpg3vvGB/Rinst2e5403707bc119/CoGAPS/extdata/GIST.csv (1363 genes and 9 samples)

5.4.3 Additional Return Information

When running GWCoGAPS or scCoGAPS, some additional metadata is returned that relates to the pattern matching process. This process is how CoGAPS stitches the results from each subset back together.

# run GWCoGAPS (subset data so the displayed output is small)
params <- new("CogapsParams")
params <- setParam(params, "nPatterns", 3)
params <- setDistributedParams(params, nSets=2)
## setting distributed parameters - call this again if you change nPatterns
result <- GWCoGAPS(GISTCsvPath, params, messages=FALSE, nIterations=1000)
## Warning in GWCoGAPS(GISTCsvPath, params, messages = FALSE, nIterations =
## 1000): GWCoGAPS is deprecated, use the main function CoGAPS with the argument:
## distributed="genome-wide"
## 
## This is CoGAPS version 3.12.0 
## Running genome-wide CoGAPS on /tmp/Rtmpg3vvGB/Rinst2e5403707bc119/CoGAPS/extdata/GIST.csv (1363 genes and 9 samples)
# get the unmatched patterns from each subset
getUnmatchedPatterns(result)
## [[1]]
##      Pattern_1 Pattern_2 Pattern_3
## IM00 0.9993793 0.9097677 0.9736563
## IM02 0.9703290 0.9288179 0.9732262
## IM04 0.9272981 0.9277011 0.9553072
## IM06 0.9185898 0.9417509 0.9627476
## IM09 0.8506566 0.9545942 0.9597259
## IM12 0.8126645 0.9773247 0.9909174
## IM18 0.7725924 0.9859086 0.9954400
## IM24 0.7141643 0.9980845 0.9786490
## IM48 0.7108381 0.9928724 0.9823740
## 
## [[2]]
##      Pattern_1 Pattern_2  Pattern_3
## IM00 0.9984941 0.9888684 0.02495895
## IM02 0.9947621 0.9797648 0.08237066
## IM04 0.9688219 0.9278817 0.48727626
## IM06 0.9703985 0.9502550 0.55515689
## IM09 0.9592479 0.9384234 0.74564928
## IM12 0.9656433 0.9836166 0.68625373
## IM18 0.9562685 0.9831328 0.86525029
## IM24 0.9388248 0.9867272 0.99934393
## IM48 0.9389876 0.9899414 0.86262602
# get the clustered patterns from the set of all patterns
getClusteredPatterns(result)
## $`1`
##            1.1       2.2
## IM00 0.9993793 0.9984941
## IM02 0.9703290 0.9947621
## IM04 0.9272981 0.9688219
## IM06 0.9185898 0.9703985
## IM09 0.8506566 0.9592479
## IM12 0.8126645 0.9656433
## IM18 0.7725924 0.9562685
## IM24 0.7141643 0.9388248
## IM48 0.7108381 0.9389876
## 
## $`2`
##            2.1        2.3
## IM00 0.9097677 0.02495895
## IM02 0.9288179 0.08237066
## IM04 0.9277011 0.48727626
## IM06 0.9417509 0.55515689
## IM09 0.9545942 0.74564928
## IM12 0.9773247 0.68625373
## IM18 0.9859086 0.86525029
## IM24 0.9980845 0.99934393
## IM48 0.9928724 0.86262602
## 
## $`3`
##            1.2       1.3
## IM00 0.9736563 0.9888684
## IM02 0.9732262 0.9797648
## IM04 0.9553072 0.9278817
## IM06 0.9627476 0.9502550
## IM09 0.9597259 0.9384234
## IM12 0.9909174 0.9836166
## IM18 0.9954400 0.9831328
## IM24 0.9786490 0.9867272
## IM48 0.9823740 0.9899414
# get the correlation of each pattern to the cluster mean
getCorrelationToMeanPattern(result)
## $`1`
## [1] 0.999 0.961
## 
## $`2`
## [1] 0.925 0.999
## 
## $`3`
## [1] 0.929 0.978
# get the size of the subsets used
sapply(getSubsets(result), length)
## [1] 681 682

5.4.4 Manual Pipeline

CoGAPS allows for a custom process for matching the patterns together. If you have a result object from a previous run of GWCoGAPS/scCoGAPS, the unmatched patterns for each subset are found by calling getUnmatchedPatterns. Apply any method you like as long as the result is a matrix with the number of rows equal to the number of samples (genes) and the number of columns is equal to the number of patterns. Then pass the matrix to the fixedPatterns argument along with the original parameters for the GWCoGAPS/scCoGAPS run.

# initial run
result <- GWCoGAPS(GISTCsvPath, messages=FALSE, nIterations=1000)
## Warning in GWCoGAPS(GISTCsvPath, messages = FALSE, nIterations = 1000):
## GWCoGAPS is deprecated, use the main function CoGAPS with the argument:
## distributed="genome-wide"
## 
## This is CoGAPS version 3.12.0 
## Running genome-wide CoGAPS on /tmp/Rtmpg3vvGB/Rinst2e5403707bc119/CoGAPS/extdata/GIST.csv (1363 genes and 9 samples)
# custom matching process (just take matrix from first subset as a dummy)
consensusMatrix <- getUnmatchedPatterns(result)[[1]]

# run with our custom matched patterns matrix
params <- CogapsParams()
params <- setFixedPatterns(params, consensusMatrix, 'P')
GWCoGAPS(GISTCsvPath, params, explicitSets=getSubsets(result), nIterations=1000)
## Warning in GWCoGAPS(GISTCsvPath, params, explicitSets = getSubsets(result), :
## GWCoGAPS is deprecated, use the main function CoGAPS with the argument:
## distributed="genome-wide"
## 
## This is CoGAPS version 3.12.0 
## Running genome-wide CoGAPS on /tmp/Rtmpg3vvGB/Rinst2e5403707bc119/CoGAPS/extdata/GIST.csv (1363 genes and 9 samples) with parameters:
## 
## -- Standard Parameters --
## nPatterns            7 
## nIterations          1000 
## seed                 844 
## sparseOptimization   FALSE 
## distributed          genome-wide 
## 
## -- Sparsity Parameters --
## alpha          0.01 
## maxGibbsMass   100 
## 
## -- Distributed CoGAPS Parameters -- 
## nSets          4 
## cut            7 
## minNS          2 
## maxNS          6 
## 
## Creating subsets...using provided named subsets
## set sizes (min, mean, max): (340, 340.75, 343)
## Running Final Stage...
## [1] "CogapsResult object with 1363 features and 9 samples"
## [1] "7 patterns were learned"

6 sessionInfo()

sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] BiocParallel_1.26.0 CoGAPS_3.12.0       BiocStyle_2.20.0   
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.6                  highr_0.9                  
##  [3] bslib_0.2.5.1               compiler_4.1.0             
##  [5] RColorBrewer_1.1-2          BiocManager_1.30.15        
##  [7] jquerylib_0.1.4             GenomeInfoDb_1.28.0        
##  [9] XVector_0.32.0              rhdf5filters_1.4.0         
## [11] MatrixGenerics_1.4.0        bitops_1.0-7               
## [13] tools_4.1.0                 zlibbioc_1.38.0            
## [15] SingleCellExperiment_1.14.0 digest_0.6.27              
## [17] rhdf5_2.36.0                lattice_0.20-44            
## [19] jsonlite_1.7.2              evaluate_0.14              
## [21] rlang_0.4.11                Matrix_1.3-3               
## [23] DelayedArray_0.18.0         magick_2.7.2               
## [25] yaml_2.2.1                  parallel_4.1.0             
## [27] xfun_0.23                   GenomeInfoDbData_1.2.6     
## [29] cluster_2.1.2               stringr_1.4.0              
## [31] knitr_1.33                  caTools_1.18.2             
## [33] gtools_3.8.2                sass_0.4.0                 
## [35] S4Vectors_0.30.0            IRanges_2.26.0             
## [37] grid_4.1.0                  stats4_4.1.0               
## [39] Biobase_2.52.0              R6_2.5.0                   
## [41] rmarkdown_2.8               bookdown_0.22              
## [43] Rhdf5lib_1.14.0             magrittr_2.0.1             
## [45] gplots_3.1.1                htmltools_0.5.1.1          
## [47] matrixStats_0.58.0          BiocGenerics_0.38.0        
## [49] GenomicRanges_1.44.0        SummarizedExperiment_1.22.0
## [51] KernSmooth_2.23-20          stringi_1.6.2              
## [53] RCurl_1.98-1.3

7 Citing CoGAPS

If you use the CoGAPS package for your analysis, please cite Fertig et al. (2010)

If you use the gene set statistic, please cite Ochs et al. (2009)

References

Fertig, Elana J., Jie Ding, Alexander V. Favorov, Giovanni Parmigiani, and Michael F. Ochs. 2010. “CoGAPS: An R/C++ Package to Identify Patterns and Biological Process Activity in Transcriptomic Data.” Bioinformatics 26 (21): 2792–3. https://doi.org/10.1093/bioinformatics/btq503.

Ochs, Michael F., Lori Rink, Chi Tarn, Sarah Mburu, Takahiro Taguchi, Burton Eisenberg, and Andrew K. Godwin. 2009. “Detection of Treatment-Induced Changes in Signaling Pathways in Gastrointestinal Stromal Tumors Using Transcriptomic Data.” Cancer Research 69 (23): 9125–32. https://doi.org/10.1158/0008-5472.CAN-09-1709.

Seung, Sebastian, and Daniel D. Lee. 1999. “Learning the Parts of Objects by Non-Negative Matrix Factorization.” Nature 401 (6755): 788–91. https://doi.org/10.1038/44565.

Stein-O’Brien, Genevieve L., Jacob L. Carey, Wai S. Lee, Michael Considine, Alexander V. Favorov, Emily Flam, Theresa Guo, et al. 2017. “PatternMarkers & Gwcogaps for Novel Data-Driven Biomarkers via Whole Transcriptome Nmf.” Bioinformatics 33 (12): 1892–4. https://doi.org/10.1093/bioinformatics/btx058.