Contents

Introduction

This study explores expression profiles of basal stem-cell enriched cells (B) and committed luminal cells (L) of the mammary gland of virgin, pregnant and lactating mice [1]. Six groups are compared corresponding to a combination of cell type and mouse status. Each group contains two biological replicates. Sequences and counts datasets are publicly available from the Gene Expression Omnibus (GEO) with the serie accession number GSE60450. The RNA-Seq data is analysed by edgeR development team [2].

As described in edgeR user guide, we tested for significant differential expression in gene, using the QL F-test [3]. Here, we focused in gene expression analysis in only one type of cell, i.e. luminal cells for the three developmental status (virgin, pregnant and lactating mice). Among the 15,804 expressed genes, we obtained 7,302 significantly differentially expressed (DE) genes for the comparison virgin versus pregnant, 7,699 for the comparison pregnant versus lactate, and 9,583 for the comparison virgin versus lactate, with Benjamini-Hochberg correction to control the false discovery rate at 5%.

Data

Vignette build convenience (for less build time and package size) need that data were pre-calculated (provided by the package), and that illustrations were not interactive.

# load vignette data
data(
    myGOs,
    package="ViSEAGO"
)

1 Genes of interest

We load examples files from ViSEAGO package using system.file from the locally installed package. We read the gene identifiers for the background (= expressed genes) file and the three lists of DE genes of the study. Here, gene identifiers are GeneID from EntrezGene database.

# load genes identifiants (GeneID,ENS...) background (expressed genes) 
background<-scan(
    system.file(
        "extdata/data/input",
        "background_L.txt",
        package = "ViSEAGO"
    ),
    quiet=TRUE,
    what=""
)

# load Differentialy Expressed (DE) gene identifiants from lists
PregnantvsLactateDE<-scan(
    system.file(
        "extdata/data/input",
        "pregnantvslactateDE.txt",
        package = "ViSEAGO"
    ),
    quiet=TRUE,
    what=""
)

VirginvsLactateDE<-scan(
    system.file(
        "extdata/data/input",
        "virginvslactateDE.txt",
        package = "ViSEAGO"
    ),
    quiet=TRUE,
    what=""
)

VirginvsPregnantDE<-scan(
    system.file(
        "extdata/data/input",
        "virginvspregnantDE.txt",
        package = "ViSEAGO"
    ),
    quiet=TRUE,
    what=""
)

Here, we display the first 6 GeneID from the PregnantvsLactateDE list.

2 GO annotation of genes

In this study, we build a myGENE2GO object using the Bioconductor org.Mm.eg.db database package for the mouse species. This object contains all available GO annotations for categories Molecular Function (MF), Biological Process (BP), and Cellular Component (CC).

NB: Don’t forget to check if the last current annotation database version is installed in your R session! See ViSEAGO::available_organisms(Bioconductor).

# connect to Bioconductor
Bioconductor<-ViSEAGO::Bioconductor2GO()

# load GO annotations from Bioconductor
myGENE2GO<-ViSEAGO::annotate(
    "org.Mm.eg.db",
    Bioconductor
)


# display summary
myGENE2GO


- object class: gene2GO
- database: Bioconductor
- stamp/version: 2019-Jul10
- organism id: org.Mm.eg.db

GO annotations:
- Molecular Function (MF): 22707 annotated genes with 91986 terms (4121 unique terms)
- Biological Process (BP): 23210 annotated genes with 164825 terms (12224 unique terms)
- Cellular Component (CC): 23436 annotated genes with 107852 terms (1723 unique terms)

3 Functional GO enrichment

3.1 GO enrichment tests

We perform a functional Gene Ontology (GO) enrichment analysis from differentially expressed (DE) genes of luminal cells in the mammary gland. The enriched Biological process (BP) are obtained using a Fisher’s exact test with elim algorithm developped in topGO package.

First, we create three topGOdata objects, using ViSEAGO::create_topGOdata method, corresponding to the three DE genes lists for the comparison virgin versus pregnant, pregnant versus lactate, and virgin versus lactate. The gene background corresponding to expressed genes in luminal cells of the mammary gland and GO annotations are also provided.

# create topGOdata for BP for each list of DE genes
BP_PregnantvsLactate<-ViSEAGO::create_topGOdata(
    geneSel=PregnantvsLactateDE,
    allGenes=background,
    gene2GO=myGENE2GO, 
    ont="BP",
    nodeSize=5
)

BP_VirginvsLactate<-ViSEAGO::create_topGOdata(
    geneSel=VirginvsLactateDE,
    allGenes=background,
    gene2GO=myGENE2GO,
    ont="BP",
    nodeSize=5
)

BP_VirginvsPregnant<-ViSEAGO::create_topGOdata(
    geneSel=VirginvsPregnantDE,
    allGenes=background,
    gene2GO=myGENE2GO,
    ont="BP",
    nodeSize=5
)

Now, we perform the GO enrichment tests for BP category with Fisher’s exact test and elim algorithm using topGO::runTest method.

NB: p-values of enriched GO terms are not adjusted and considered significant if below 0.01.

# perform topGO tests
elim_BP_PregnantvsLactate<-topGO::runTest(
    BP_PregnantvsLactate,
    algorithm ="elim",
    statistic = "fisher",
    cutOff=0.01
)

elim_BP_VirginvsLactate<-topGO::runTest(
    BP_VirginvsLactate,
    algorithm ="elim",
    statistic = "fisher",
    cutOff=0.01
)

elim_BP_VirginvsPregnant<-topGO::runTest(
    BP_VirginvsPregnant,
    algorithm ="elim",
    statistic = "fisher",
    cutOff=0.01
)

3.2 Combine enriched GO terms

We combine the results of the three enrichment tests into an object using ViSEAGO::merge_enrich_terms method. A table of enriched GO terms in at least one comparison is displayed in interactive mode, or printed in a file using ViSEAGO::show_table method. The printed table contains for each enriched GO terms, additional columns including the list of significant genes and frequency (ratio between the number of significant genes and number of background genes in a specific GO tag) evaluated by comparison.

# merge topGO results
BP_sResults<-ViSEAGO::merge_enrich_terms(
    cutoff=0.01,
    Input=list(
        PregnantvsLactate=c(
            "BP_PregnantvsLactate",
            "elim_BP_PregnantvsLactate"
        ),
        VirginvsLactate=c(
            "BP_VirginvsLactate",
            "elim_BP_VirginvsLactate"
        ),
        VirginvsPregnant=c(
            "BP_VirginvsPregnant",
            "elim_BP_VirginvsPregnant"
        )
    )
)


# display a summary
BP_sResults


- object class: enrich_GO_terms
- ontology: BP
- method: topGO
- summary:PregnantvsLactate
      BP_PregnantvsLactate 
        description: Bioconductor org.Mm.eg.db 2019-Jul10
        available_genes: 15804
        available_genes_significant: 7699
        feasible_genes: 14091
        feasible_genes_significant: 7044
        genes_nodeSize: 5
        nodes_number: 8463
        edges_number: 19543
      elim_BP_PregnantvsLactate 
        description: Bioconductor org.Mm.eg.db 2019-Jul10 
        test_name: fisher p<0.01
        algorithm_name: elim
        GO_scored: 8463
        GO_significant: 199
        feasible_genes: 14091
        feasible_genes_significant: 7044
        genes_nodeSize: 5
        Nontrivial_nodes: 8433 
 VirginvsLactate
      BP_VirginvsLactate 
        description: Bioconductor org.Mm.eg.db 2019-Jul10
        available_genes: 15804
        available_genes_significant: 9583
        feasible_genes: 14091
        feasible_genes_significant: 8734
        genes_nodeSize: 5
        nodes_number: 8463
        edges_number: 19543
      elim_BP_VirginvsLactate 
        description: Bioconductor org.Mm.eg.db 2019-Jul10 
        test_name: fisher p<0.01
        algorithm_name: elim
        GO_scored: 8463
        GO_significant: 152
        feasible_genes: 14091
        feasible_genes_significant: 8734
        genes_nodeSize: 5
        Nontrivial_nodes: 8457 
 VirginvsPregnant
      BP_VirginvsPregnant 
        description: Bioconductor org.Mm.eg.db 2019-Jul10
        available_genes: 15804
        available_genes_significant: 7302
        feasible_genes: 14091
        feasible_genes_significant: 6733
        genes_nodeSize: 5
        nodes_number: 8463
        edges_number: 19543
      elim_BP_VirginvsPregnant 
        description: Bioconductor org.Mm.eg.db 2019-Jul10 
        test_name: fisher p<0.01
        algorithm_name: elim
        GO_scored: 8463
        GO_significant: 243
        feasible_genes: 14091
        feasible_genes_significant: 6733
        genes_nodeSize: 5
        Nontrivial_nodes: 8413 
 
- enrichment pvalue cutoff:
        PregnantvsLactate : 0.01
        VirginvsLactate : 0.01
        VirginvsPregnant : 0.01
- enrich GOs (in at least one list): 521 GO terms of 3 conditions.
        PregnantvsLactate : 199 terms
        VirginvsLactate : 152 terms
        VirginvsPregnant : 243 terms


# show table in interactive mode
ViSEAGO::show_table(BP_sResults)

bioconductor_table1

3.3 Graphs of GO enrichment tests

Several graphs summarize important features. An interactive barchart showing the number of GO terms enriched or not in each comparison, using ViSEAGO::GOcount method. Intersections of lists of enriched GO terms between comparisons are displayed in an Upset plot using ViSEAGO::Upset method.

# barchart of significant (or not) GO terms by comparison
ViSEAGO::GOcount(BP_sResults)

gocount

# display intersections
ViSEAGO::Upset(
    BP_sResults,
    file="upset.xls"
)

upset

4 GO terms Semantic Similarity

GO annotations of genes created at a previous step and enriched GO terms are combined using ViSEAGO::build_GO_SS method. The Semantic Similarity (SS) between enriched GO terms are calculated using ViSEAGO::compute_SS_distances method which is a wrapper of functions implemented in the GOSemSim package [4]. Here, we choose Wang method based on the topology of GO graph structure. The built object myGOs contains all informations of enriched GO terms and the SS distances between them.

# create GO_SS-class object
myGOs<-ViSEAGO::build_GO_SS(
    gene2GO=myGENE2GO,
    enrich_GO_terms=BP_sResults
)


# compute Semantic Similarity (SS)
myGOs<-ViSEAGO::compute_SS_distances(
    myGOs,
    distance="Wang"
)


# display a summary
myGOs


- object class: GO_SS
- ontology: BP
- method: topGO
- summary:
PregnantvsLactate
      BP_PregnantvsLactate 
        description: Bioconductor org.Mm.eg.db 2019-Jul10
        available_genes: 15804
        available_genes_significant: 7699
        feasible_genes: 14091
        feasible_genes_significant: 7044
        genes_nodeSize: 5
        nodes_number: 8463
        edges_number: 19543
      elim_BP_PregnantvsLactate 
        description: Bioconductor org.Mm.eg.db 2019-Jul10 
        test_name: fisher p<0.01
        algorithm_name: elim
        GO_scored: 8463
        GO_significant: 199
        feasible_genes: 14091
        feasible_genes_significant: 7044
        genes_nodeSize: 5
        Nontrivial_nodes: 8433 
 VirginvsLactate
      BP_VirginvsLactate 
        description: Bioconductor org.Mm.eg.db 2019-Jul10
        available_genes: 15804
        available_genes_significant: 9583
        feasible_genes: 14091
        feasible_genes_significant: 8734
        genes_nodeSize: 5
        nodes_number: 8463
        edges_number: 19543
      elim_BP_VirginvsLactate 
        description: Bioconductor org.Mm.eg.db 2019-Jul10 
        test_name: fisher p<0.01
        algorithm_name: elim
        GO_scored: 8463
        GO_significant: 152
        feasible_genes: 14091
        feasible_genes_significant: 8734
        genes_nodeSize: 5
        Nontrivial_nodes: 8457 
 VirginvsPregnant
      BP_VirginvsPregnant 
        description: Bioconductor org.Mm.eg.db 2019-Jul10
        available_genes: 15804
        available_genes_significant: 7302
        feasible_genes: 14091
        feasible_genes_significant: 6733
        genes_nodeSize: 5
        nodes_number: 8463
        edges_number: 19543
      elim_BP_VirginvsPregnant 
        description: Bioconductor org.Mm.eg.db 2019-Jul10 
        test_name: fisher p<0.01
        algorithm_name: elim
        GO_scored: 8463
        GO_significant: 243
        feasible_genes: 14091
        feasible_genes_significant: 6733
        genes_nodeSize: 5
        Nontrivial_nodes: 8413 
 - enrichment pvalue cutoff:
        PregnantvsLactate : 0.01
        VirginvsLactate : 0.01
        VirginvsPregnant : 0.01
- enrich GOs (in at least one list): 521 GO terms of 3 conditions.
        PregnantvsLactate : 199 terms
        VirginvsLactate : 152 terms
        VirginvsPregnant : 243 terms
- terms distances:  Wang

5 Visualization and interpretation of enriched GO terms

5.1 Multi Dimensional Scaling of GO terms - A preview

A Multi Dimensional Scale (MDS) plot with ViSEAGO::MDSplot method provides a representation of distances among a set of enriched GO terms on the two first dimensions. Some patterns could appear at this time and could be investigated in the interactive mode. The plot could also be printed in a png file.

# MDSplot
ViSEAGO::MDSplot(myGOs)

mds1

5.2 Clustering heatmap of GO terms

To fully explore the results of this functional analysis, a hierarchical clustering using ViSEAGO::GOterms_heatmap is performed based on Wang SS distance between enriched GO terms and ward.D2 aggregation criteria.

Clusters of enriched GO terms are obtained by cutting branches off the dendrogram. Here, we choose a dynamic branch cutting method based on the shape of clusters using dynamicTreeCut [5,6]. Here, we use parameters to detect small clusters in agreement with the dendrogram structure.

Enriched GO terms are ranked in a dendrogram and colored depending on their cluster assignation. Additional illustrations are displayed: the GO description of terms (trimmed if more than 50 characters), a heatmap of -log10(pvalue) of enrichment test for each comparison, and optionally the Information Content (IC). The IC of a GO term is computed by the negative log probability of the term occurring in the GO annotations database of the studied species. A rarely used term contains a greater amount of IC. The illustration is displayed with ViSEAGO::show_heatmap method.

# Create GOterms heatmap
Wang_clusters_wardD2<-ViSEAGO::GOterms_heatmap(
    myGOs,
    showIC=FALSE,
    showGOlabels =FALSE,
    GO.tree=list(
        tree=list(
            distance="Wang",
            aggreg.method="ward.D2"
        ),
        cut=list(
            dynamic=list(
                pamStage=TRUE,
                pamRespectsDendro=TRUE,
                deepSplit=2,
                minClusterSize =2
            )
        )
    ),
    samples.tree=NULL
)


# display the heatmap
ViSEAGO::show_heatmap(
    Wang_clusters_wardD2,
    "GOterms"
)

GOtermsheatmap

A table of enriched GO terms in at least one of the comparison is displayed in interactive mode, or printed in a file using ViSEAGO::show_table method. The columns GO cluster number and IC value are added to the previous table of enriched terms table. The printed table contains for each enriched GO terms, additional columns including the list of significant genes and frequency (ratio between the number of significant genes and number of background genes in a specific GO tag) evaluated by comparison.

# display table
ViSEAGO::show_table(
    Wang_clusters_wardD2
)