Run gene overrepresentation analysis with topGO

This function uses Fisher's exact test with the elim algorithm, as implemented in the topGO package, to test for overrepresentation in Gene Ontology gene sets.

Usage

apl_topGO(
  caobj,
  ontology,
  organism = "hs",
  ngenes = 1000,
  score_cutoff = 0,
  use_coords = FALSE,
  return_plot = FALSE,
  top_res = 15
)

Arguments

caobj: A "cacomp" object with principal row coordinates and standardized column coordinates calculated.
ontology: Character string. Chooses GO sets for 'BP' (biological process), 'CC' (cellular component) or 'MF' (molecular function).
organism: Character string. Either 'hs' (Homo sapiens), 'mm' (Mus musculus) or the name of the organism package such as 'org.*.eg.db'.
ngenes: Numeric. Number of top ranked genes to test for overrepresentation.
score_cutoff: Numeric. S-alpha score cutoff. Only genes with a score larger than the cutoff will be tested.
use_coords: Logical. Whether the x-coordinates of the row APL coordinates should be used for ranking. Only recommended when no S-alpha score (see apl_score()) can be calculated.
return_plot: Logical. Whether a plot of significant gene sets should be additionally returned.
top_res: Numeric. Number of top scoring gene sets to plot.

Value

A data.frame containing the gene sets with the highest overrepresentation.

Details

For a chosen group of cells/samples, the top 'ngenes' group-specific genes are used for gene overrepresentation analysis. The genes are ranked either by the precomputed APL score or, if no score is available, by their APL x-coordinates.
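
The selection step described above can be pictured with a small base R sketch. It is an illustration under assumptions, not the package's internal code: the data frame `scores`, its columns and the helper `select_top_genes()` are hypothetical, and only the parameter names `ngenes` and `score_cutoff` are taken from the usage section.

```r
# Hypothetical toy scores; in practice they would come from apl_score().
scores <- data.frame(
  gene  = c("CD3D", "MS4A1", "LYZ", "NKG7", "GNLY", "PPBP"),
  score = c(2.1, -0.3, 0.8, 1.5, 0.0, 3.2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep genes scoring above the cutoff, rank them in
# decreasing order of score and return at most `ngenes` gene names.
select_top_genes <- function(scores, ngenes = 1000, score_cutoff = 0) {
  kept <- scores[scores$score > score_cutoff, , drop = FALSE]
  kept <- kept[order(kept$score, decreasing = TRUE), , drop = FALSE]
  head(kept$gene, ngenes)
}

top_genes <- select_top_genes(scores, ngenes = 3, score_cutoff = 0)
print(top_genes)
```

Genes at or below the cutoff are dropped before ranking, so fewer than `ngenes` genes may remain when the cutoff is strict.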

References

Adrian Alexa and Jorg Rahnenfuhrer
topGO: Enrichment Analysis for Gene Ontology.
R package version 2.42.0.

Examples

library(APL) # provides cacomp(), apl_coords(), apl_score(), apl_topGO()
library(SeuratObject)
#> Loading required package: sp
#> ‘SeuratObject’ was built under R 4.4.0 but the current version is
#> 4.4.2; it is recomended that you reinstall ‘SeuratObject’ as the ABI
#> for R may have changed
#> 
#> Attaching package: ‘SeuratObject’
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, t
set.seed(1234)
cnts <- SeuratObject::LayerData(pbmc_small, assay = "RNA", layer = "counts")
cnts <- as.matrix(cnts)

# Run CA on example from Seurat

ca <- cacomp(pbmc_small,
             princ_coords = 3,
             return_input = FALSE,
             assay = "RNA",
             slot = "counts")
#> Warning: 
#> Parameter top is >nrow(obj) and therefore ignored.
#> No dimensions specified. Setting dimensions to: 15

grp <- which(Idents(pbmc_small) == 2)
ca <- apl_coords(ca, group = grp)
ca <- apl_score(ca,
                mat = cnts)

enr <- apl_topGO(ca,
                 ontology = "BP",
                 organism = "hs")
#> 
#> groupGOTerms: 	GOBPTerm, GOMFTerm, GOCCTerm environments built.
#> Loading required package: org.Hs.eg.db
#> Loading required package: AnnotationDbi
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> 
#> Attaching package: ‘BiocGenerics’
#> The following object is masked from ‘package:SeuratObject’:
#> 
#>     intersect
#> The following objects are masked from ‘package:stats’:
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from ‘package:base’:
#> 
#>     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#>     as.data.frame, basename, cbind, colnames, dirname, do.call,
#>     duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#>     lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#>     pmin.int, rank, rbind, rownames, sapply, saveRDS, setdiff, table,
#>     tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> Loading required package: IRanges
#> Loading required package: S4Vectors
#> 
#> Attaching package: ‘S4Vectors’
#> The following object is masked from ‘package:utils’:
#> 
#>     findMatches
#> The following objects are masked from ‘package:base’:
#> 
#>     I, expand.grid, unname
#> 
#> Attaching package: ‘IRanges’
#> The following object is masked from ‘package:sp’:
#> 
#>     %over%
#> 
#> Building most specific GOs .....
#> 	( 1348 GO terms found. )
#> 
#> Build GO DAG topology ..........
#> 	( 3594 GO terms and 7817 relations. )
#> 
#> Annotating nodes ...............
#> 	( 207 genes annotated to the GO terms. )
#> 
#> 			 -- Elim Algorithm -- 
#> 
#> 		 the algorithm is scoring 519 nontrivial nodes
#> 		 parameters: 
#> 			 test statistic: fisher
#> 			 cutOff: 0.01
#> 
#> 	 Level 12:	1 nodes to be scored	(0 eliminated genes)
#> 
#> 	 Level 11:	7 nodes to be scored	(0 eliminated genes)
#> 
#> 	 Level 10:	17 nodes to be scored	(8 eliminated genes)
#> 
#> 	 Level 9:	24 nodes to be scored	(11 eliminated genes)
#> 
#> 	 Level 8:	53 nodes to be scored	(17 eliminated genes)
#> 
#> 	 Level 7:	74 nodes to be scored	(19 eliminated genes)
#> 
#> 	 Level 6:	103 nodes to be scored	(27 eliminated genes)
#> 
#> 	 Level 5:	101 nodes to be scored	(27 eliminated genes)
#> 
#> 	 Level 4:	74 nodes to be scored	(27 eliminated genes)
#> 
#> 	 Level 3:	50 nodes to be scored	(27 eliminated genes)
#> 
#> 	 Level 2:	14 nodes to be scored	(27 eliminated genes)
#> 
#> 	 Level 1:	1 nodes to be scored	(27 eliminated genes)

plot_enrichment(enr)
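
The topGO log in the example reports `test statistic: fisher`. As a rough illustration of what a one-sided Fisher's exact test measures for a single gene set, the base R sketch below uses made-up counts that are not taken from the run above. It ignores the elim algorithm, which additionally removes genes belonging to already significant, more specific terms before scoring their ancestors.

```r
# Made-up counts for one hypothetical gene set.
universe_size <- 200  # all annotated genes
selected_size <- 40   # top-ranked genes that are tested
set_size      <- 20   # genes annotated to the gene set
overlap       <- 10   # selected genes that belong to the set

# 2x2 contingency table: gene set membership versus selection.
tab <- matrix(
  c(overlap,            selected_size - overlap,
    set_size - overlap, universe_size - selected_size - set_size + overlap),
  nrow = 2,
  dimnames = list(in_set = c("yes", "no"), selected = c("yes", "no"))
)

# One-sided test: is the set overrepresented among the selected genes?
p_value <- fisher.test(tab, alternative = "greater")$p.value

# The same p-value expressed as a hypergeometric tail probability.
p_hyper <- phyper(overlap - 1, set_size, universe_size - set_size,
                  selected_size, lower.tail = FALSE)

print(tab)
print(p_value)
```

With these counts only 4 of the 40 selected genes would be expected in the set by chance, so an overlap of 10 yields a small p-value.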