I was reading the paper "Consistent Individualized Feature Attribution for Tree Ensembles" by Scott Lundberg et al. and cannot understand how the $R^2$ calculation works here; see the caption of Figure 6 on page 7.
The explanation for the $R^2$ values is:
Figure 6: A quantitative measure of supervised clustering performance. If all samples are placed in their own group, and each group predicts the mean value of the group, then the $R^2$ value (the proportion of model output variance explained) will be 1. If groups are then merged one-by-one the $R^2$ will decline until when there is only a single group it will be 0. Hierarchical clusterings that well separate the model output value will retain a high $R^2$ longer during the merging process. Here supervised clustering with SHAP values outperformed the Saabas method in both (A) the census data clustering shown in Figure 4, and (B) a clustering from gene-based predictions of Alzheimer’s cognitive scores.
The authors calculate an $R^2$ value described as the "proportion of model output variance explained". I understand how $R^2$ works in regression, but I'm not sure what the calculation looks like when it is applied to a clustering like this.
Could someone show and explain the calculation based on this description? The paper shows neither the calculation nor any code.
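One plausible reading of the caption, given as a minimal sketch and not necessarily the authors' method: treat the model output $f(x_i)$ as the target, let each sample be "predicted" by the mean model output of its group, and compute $R^2 = 1 - \sum_i (f(x_i) - \bar{f}_{g(i)})^2 / \sum_i (f(x_i) - \bar{f})^2$. The function name `grouped_r2`, the toy outputs, and the group labels below are all hypothetical; in the paper the groups would presumably come from cutting the hierarchical clustering at each merge step.

```python
import numpy as np


def grouped_r2(model_output, group_labels):
    """R^2 when each sample is predicted by its group's mean model output.

    Assumption: this is the standard 1 - SS_res / SS_tot definition,
    applied to the model output rather than to the true labels.
    """
    y = np.asarray(model_output, dtype=float)
    labels = np.asarray(group_labels)

    # Total variance of the model output around its global mean.
    ss_tot = np.sum((y - y.mean()) ** 2)

    # Replace every sample by the mean output of the group it belongs to.
    pred = np.empty_like(y)
    for g in np.unique(labels):
        mask = labels == g
        pred[mask] = y[mask].mean()

    # Variance left unexplained inside the groups.
    ss_res = np.sum((y - pred) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical model outputs for four samples.
f = np.array([1.0, 2.0, 3.0, 10.0])

print(grouped_r2(f, [0, 1, 2, 3]))  # every sample in its own group: 1.0
print(grouped_r2(f, [0, 0, 0, 1]))  # similar outputs merged: stays close to 1
print(grouped_r2(f, [0, 0, 0, 0]))  # one single group: 0.0
```

Under this reading, the curve in Figure 6 would be this value recomputed after each merge, so a clustering that keeps samples with similar model output together loses $R^2$ more slowly.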