Calculate ssgsea score by hand

614 Views Asked by At

I hope this is the right place for this question, if not, feel free to suggest more suitable sites.

I would like to calculate a single sample gene set enrichment analysis (ssGSEA) score step by step by using simplified test data. I'm able to do it in R by using the GSVA-package, but I don't get the same result when I use the equations provided in the original article.

Test data:

genetable <- matrix(c(5.5,6.3,5.4,7.3,5.9,6.8, 11.2, 6.4, 8.2,7.5), ncol=1)

colnames(genetable) <- paste0("Pas", c(1:ncol(genetable)))

rownames(genetable) <- paste0("Gene", c(1:nrow(genetable)))

Test genelists:

genelist <- list(c("Gene1", "Gene3", "Gene5", "Gene7"),
                 c("Gene2", "Gene4", "Gene8"))

names(genelist) <- c("GenesetA", "GenesetB")

Running it in GSVA, using "ssGSEA" as method:

GSVA::gsva(expr=genetable,
                        gset.idx.list= genelist,
                        mx.diff=FALSE, method="ssgsea", 
                        verbose=TRUE,
                        ssgsea.norm=F,
                        min.sz = 1,
                        tau=0.25)

Result

"Geneset"   "Pas1"
"GenesetA"  -1.748
"GenesetB"  -0.166

The calculation is based on the method described here: https://www.nature.com/articles/nature08460#online-methods

According to my understanding, the parameters will in the calculation of ssGSEA for GenesetA be $N= 10$ and $NG = 4$. However, I'm struggling with getting the other numbers in place.

It is quite a while since I had to deal with (slightly more complex) equations, so if someone can help me with this exact example, it would be highly appreciated! If needed, I can post what I've done so far, but it is quite messy, so I skip it right now :)


Screenshot of equations:
Screenshot of equations

1

There are 1 best solutions below

1
On

Here is one way in R.

library(dplyr)

S <- genetable[,1] %>%
  rank() %>%
  sort(decreasing = TRUE)
G <- names(S) %in% genelist[[1]]

ind_G <- which(G)
PG <- rep(0, length.out = length(S))
PG[ind_G] <- abs(S[ind_G]) ^ 0.25
PG <- PG / sum(PG)
PG <- cumsum(PG)

ind_NG <- which(!G)
PNG <- rep(0, length.out = length(S))
PNG[ind_NG] <- 1
PNG <- PNG / sum(PNG)
PNG <- cumsum(PNG)

sum(PG) - sum(PNG)
# -1.747631