Find rows most highly associated with a condition

Ranks rows by a score that rewards strong association of a row with the given condition and penalizes association with other conditions.

Usage

apl_score(
  caobj,
  mat = NULL,
  dims = caobj@dims,
  group = caobj@group,
  reps = 10,
  quant = 0.99,
  python = FALSE,
  store_perm = TRUE,
  method = "permutation"
)

Arguments

caobj: A "cacomp" object with principal row coordinates and standardized column coordinates calculated.
mat: A numeric matrix. For sequencing data, a count matrix of gene expression values with genes in rows and samples/cells in columns. Should contain row and column names.
dims: Integer. Number of CA dimensions to retain. Needs to be the same as in caobj!
group: Vector of indices of the columns used to calculate the centroid, which defines the x-axis direction.
reps: Integer. Number of permutations to perform.
quant: Numeric. Single number between 0 and 1 indicating the quantile used to calculate the cutoff. Default 0.99.
python: DEPRECATED. A logical value indicating whether to use singular-value decomposition from the python package torch. This implementation dramatically speeds up computation compared to `svd()` in R.
store_perm: Logical. Whether permuted data should be stored in the CA object.
method: Method to calculate the cutoff. Either "random" for random direction method or "permutation" for the permutation method.

Value

Returns the input "cacomp" object with an "APL_score" component added. APL_score contains a data frame with the ranked rows, their scores and their original row numbers.
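To make the shape of such a ranking table concrete, here is a minimal base R sketch that builds one from made-up scores. The column names (`Rowname`, `Score`, `Row_num`, `Rank`) and all values are assumptions chosen for illustration and are not taken from the package's actual output.

```r
# Hypothetical scores for four rows; names and values are made up.
scores <- c(gene_1 = 0.2, gene_2 = 1.7, gene_3 = -0.4, gene_4 = 0.9)

# Order rows from highest to lowest score.
ord <- order(scores, decreasing = TRUE)

# Assumed layout: ranked rows, their score and their original row number.
ranked <- data.frame(
  Rowname = names(scores)[ord],
  Score   = unname(scores)[ord],
  Row_num = ord,
  Rank    = seq_along(ord),
  stringsAsFactors = FALSE
)
ranked
```

Keeping the original row number next to the rank makes it possible to index back into the input matrix after sorting.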

Details

The score is calculated by permuting the values of each row to determine the cutoff angle at the 99% quantile (the default of "quant"):

$$S_{\alpha}(x,y)=x-\frac{y}{\tan\alpha}$$

Here x and y are the AP-coordinates of a row and α is the cutoff angle. By default the permutation is repeated 10 times (for the random direction method, at least 300 repetitions are recommended), but for very large matrices this can be reduced. The "permutation" method permutes the columns in each row and calculates AP-coordinates for each such permutation. The cutoff is then taken at the quantile specified by "quant". The "random" method, in contrast, calculates AP-coordinates for the original data, but along random directions.
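The formula can be tried out directly in base R. The sketch below is not the package's implementation: the null AP-coordinates are simulated rather than derived from a permuted matrix, and it assumes that the cutoff is the `quant` quantile of the cotangent x/y over those null rows. The helper `s_alpha()` and all data are hypothetical.

```r
# S-alpha score for AP-coordinates (x, y) and cutoff angle alpha (radians).
s_alpha <- function(x, y, alpha) {
  x - y / tan(alpha)
}

set.seed(1)

# Hypothetical AP-coordinates from permuted (null) data.
# y is a distance, so it is kept strictly positive.
x_perm <- rnorm(1000, mean = 0, sd = 0.1)
y_perm <- abs(rnorm(1000, mean = 1, sd = 0.2)) + 0.01

# Assumption: the cutoff is the `quant` quantile of cot(angle) = x / y.
quant <- 0.99
cutoff_cotan <- unname(quantile(x_perm / y_perm, probs = quant))
alpha <- atan(1 / cutoff_cotan)  # cutoff angle, so that 1 / tan(alpha) == cutoff_cotan

# Score a few hypothetical observed rows and rank them.
x_obs <- c(gene_a = 1.5, gene_b = 0.1, gene_c = -0.2)
y_obs <- c(gene_a = 0.5, gene_b = 1.0, gene_c = 0.8)
scores <- s_alpha(x_obs, y_obs, alpha)
sort(scores, decreasing = TRUE)
```

A row far along the x-axis and close to it (large x, small y) scores highest, while rows whose angle exceeds the cutoff angle receive a negative score.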

If store_perm is TRUE the permuted data is stored in the cacomp object and can be used for future scoring.
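The row-wise shuffling used by the "permutation" method can be sketched in base R as follows. This only illustrates the idea of permuting the column values within each row; the matrix is simulated and the code is not taken from the package.

```r
set.seed(42)

# Small simulated count matrix: genes in rows, cells in columns.
mat <- matrix(rpois(20, lambda = 5), nrow = 4,
              dimnames = list(paste0("gene_", 1:4), paste0("cell_", 1:5)))

# Shuffle the values of each row independently across the columns.
# apply() returns the rows as columns, so transpose back.
perm <- t(apply(mat, 1, sample))
dimnames(perm) <- dimnames(mat)

perm
```

Each row keeps exactly the same values (and therefore the same row sum), but any association between a row and particular columns is destroyed, which is what makes the permuted data usable as a null reference.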

References

Association Plots: Visualizing associations in high-dimensional correspondence analysis biplots
Elzbieta Gralinska, Martin Vingron
bioRxiv 2020.10.23.352096; doi: https://doi.org/10.1101/2020.10.23.352096

Examples

set.seed(1234)

# Simulate counts
cnts <- mapply(function(x){rpois(n = 500, lambda = x)},
               x = sample(1:20, 50, replace = TRUE))
rownames(cnts) <- paste0("gene_", 1:nrow(cnts))
colnames(cnts) <- paste0("cell_", 1:ncol(cnts))

# Run correspondence analysis.
ca <- cacomp(obj = cnts, princ_coords = 3)
#> Warning: 
#> Parameter top is >nrow(obj) and therefore ignored.
#> No dimensions specified. Setting dimensions to: 9

# Calculate APL coordinates:
ca <- apl_coords(ca, group = 1:10)

# Rank genes by S-alpha score
ca <- apl_score(ca, mat = cnts)