I stumbled upon the following formula for the coefficient of determination:
$$1-R_{y(x_1,x_2...x_n)}^2=\left(1-\rho_{y,x_1}^2\right)\left(1-\rho_{y,x_2(x_1)}^2\right)\left(1-\rho_{y,x_3(x_1,x_2)}^2\right)\,\cdots\,\left(1-\rho_{y,x_n(x_1,x_2...x_{n-1})}^2\right).$$
where $R_{y(x_1,x_2,...,x_n)}$ is the coefficient of determination associated with the multiple linear regression of $y$ on $\{x_1,x_2,...,x_n\}$ and $\rho_{y,x_p(x_1,x_2,...,x_{p-1})}$ is the partial correlation between $y$ and $x_p$ controlling for $x_1,x_2,...,x_{p-1}$. Although this intuitively makes sense, would anyone have a proof of this formula?
I had a go by starting with the regression model $$y=\boldsymbol{\beta}^T\mathbf{x}+\epsilon,$$ where $\boldsymbol{\beta}$ is the vector of regression coefficients and $\epsilon$ is the error term. Then $$1-R_{y(x_1,x_2,...,x_n)}^2=E[\epsilon^2]/\sigma_y^2,$$ where $\sigma_y^2$ is the variance of $y$. One can then try to look at $$1-R_{y(x_1,x_2,...,x_n)}^2=\frac{1}{\sigma_y^2}E\left[\left(y-\boldsymbol{\beta}^T\mathbf{x}\right)^2\right]$$ by rewriting $\boldsymbol{\beta}$ in terms of the correlations and standard deviations between the explanatory variables $\mathbf{x}$ and the dependent variable $y$. However, this seems very long-winded, especially as the result would then have to be refactored in terms of the partial correlations appearing in the final formula. So I was wondering if anyone knows a better, perhaps recursive/inductive, approach, starting with one-variable regression and adding variables one at a time.
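In case it helps, the identity is easy to check numerically. Below is a quick NumPy sketch (the helper names `r2` and `partial_corr` are mine); the partial correlations are computed in the usual way, as correlations between OLS residuals:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: y depends linearly on three correlated predictors plus noise.
n, p = 20_000, 3
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))  # correlated columns
y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(n)

def r2(y, X):
    """Coefficient of determination of an OLS fit (with intercept) of y on X."""
    X1 = np.column_stack([np.ones(len(y)), X])
    resid = y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]
    return 1.0 - resid.var() / y.var()

def partial_corr(y, x, Z):
    """Partial correlation of y and x controlling for the columns of Z."""
    if Z.shape[1] == 0:
        return np.corrcoef(y, x)[0, 1]  # no controls: ordinary correlation
    Z1 = np.column_stack([np.ones(len(y)), Z])
    ry = y - Z1 @ np.linalg.lstsq(Z1, y, rcond=None)[0]
    rx = x - Z1 @ np.linalg.lstsq(Z1, x, rcond=None)[0]
    return np.corrcoef(ry, rx)[0, 1]

# Left side: 1 - R^2 from the full regression.
lhs = 1.0 - r2(y, X)
# Right side: product of (1 - partial correlation^2), adding one variable at a time.
rhs = np.prod([1.0 - partial_corr(y, X[:, j], X[:, :j]) ** 2 for j in range(p)])
print(lhs, rhs)
```

The two printed numbers agree to machine precision, since the identity holds exactly for the in-sample OLS quantities, not just in expectation.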
Thanks for the help!
I think I managed a proof by induction; let me know what you think.
Statement to prove for all positive integers $n$: $$1-R_{y(x_1,x_2...x_n)}^2=\left(1-\rho_{y,x_1}^2\right)\left(1-\rho_{y,x_2(x_1)}^2\right)\left(1-\rho_{y,x_3(x_1,x_2)}^2\right)\,\cdots\,\left(1-\rho_{y,x_n(x_1,x_2...x_{n-1})}^2\right).$$
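To make the induction self-contained, here is the base case spelled out (assuming centred variables, so the intercept can be dropped). For $n=1$, the simple regression $y=\beta x_1+\epsilon_1$ has least-squares coefficient $\beta=\operatorname{Cov}(y,x_1)/\sigma_{x_1}^2=\rho_{y,x_1}\sigma_y/\sigma_{x_1}$, so
$$E[\epsilon_1^2]=\sigma_y^2-2\beta\operatorname{Cov}(y,x_1)+\beta^2\sigma_{x_1}^2=\sigma_y^2-\beta^2\sigma_{x_1}^2=\sigma_y^2\left(1-\rho_{y,x_1}^2\right),$$
and therefore $1-R_{y(x_1)}^2=E[\epsilon_1^2]/\sigma_y^2=1-\rho_{y,x_1}^2$, as required.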
For the induction step, assume the statement holds for $n=k$. Write $\{x\}_k$ for the set $\{x_1,x_2,...,x_k\}$ and $\beta_{a,b(S)}$ for the partial regression coefficient of $a$ on $b$ controlling for the variables in $S$. Starting from the regression of $y$ on all $k+1$ variables, add and subtract the fitted regression of $x_{k+1}$ on the first $k$ variables:
$$y=\sum_i^k\left(\beta_{y,x_i(\{x\}_{k+1}\setminus \{x_i\})}x_i\right)+\beta_{y,x_{k+1}(\{x\}_k)}\left(x_{k+1}-\sum_i^k\beta_{x_{k+1}x_i(\{x\}_k\setminus \{x_i\})}x_i\right)+\beta_{y,x_{k+1}(\{x\}_k)}\sum_i^k\beta_{x_{k+1}x_i(\{x\}_k\setminus \{x_i\})}x_i+\epsilon_{k+1}$$
$$y-\sum_i^k\left(\beta_{y,x_i(\{x\}_{k+1}\setminus \{x_i\})}x_i\right)-\beta_{y,x_{k+1}(\{x\}_k)}\sum_i^k\beta_{x_{k+1}x_i(\{x\}_k\setminus \{x_i\})}x_i=\beta_{y,x_{k+1}(\{x\}_k)}\left(x_{k+1}-\sum_i^k\beta_{x_{k+1}x_i(\{x\}_k\setminus \{x_i\})}x_i\right)+\epsilon_{k+1}$$
$$\epsilon_k=\beta_{y,x_{k+1}(\{x\}_k)}\left(x_{k+1}-\sum_i^k\beta_{x_{k+1}x_i(\{x\}_k\setminus \{x_i\})}x_i\right)+\epsilon_{k+1}$$
where we have used
$$\sum_i^k\left(\beta_{y,x_i(\{x\}_{k+1}\setminus \{x_i\})}x_i\right)+\beta_{y,x_{k+1}(\{x\}_k)}\sum_i^k\beta_{x_{k+1}x_i(\{x\}_k\setminus x_i)}x_i=\sum_i^k\left(\beta_{y,x_i(\{x\}_{k}\setminus \{x_i\})}x_i\right)$$
This is an intuitive result, but it could presumably be proved by induction as well. We have essentially collapsed the $n=k+1$ regression model into a one-variable regression of $\epsilon_k$ on the residual of $x_{k+1}$ after regressing it on $x_1,x_2,...,x_k$. Applying the base-case ($n=1$) argument to this one-variable regression, we see that
$$E[\epsilon_{k+1}^2]=E[\epsilon_k^2]\left(1-\rho_{y,x_{k+1}(x_1,x_2,...,x_k)}^2\right)$$
Using the relation between the determination coefficient and the error terms we get:
$$1-R_{y(x_1,x_2...x_{k+1})}^2=\left(1-R_{y(x_1,x_2...x_{k})}^2\right)\left(1-\rho_{y,x_{k+1}(x_1,x_2,...,x_k)}^2\right)$$
Then, applying the induction hypothesis to the first factor on the RHS, we get the desired result:
$$1-R_{y(x_1,x_2...x_{k+1})}^2=\left(1-\rho_{y,x_1}^2\right)\left(1-\rho_{y,x_2(x_1)}^2\right)\left(1-\rho_{y,x_3(x_1,x_2)}^2\right)\,\cdots\,\left(1-\rho_{y,x_{k+1}(x_1,x_2,...,x_k)}^2\right)$$
Hence, if the statement is true for $n=k$, it is true for $n=k+1$; since it is true for $n=1$, it is true for all positive integers $n$.
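As a numerical sanity check of the key recursion $E[\epsilon_{k+1}^2]=E[\epsilon_k^2]\left(1-\rho_{y,x_{k+1}(x_1,...,x_k)}^2\right)$, here is a quick NumPy sketch for $k=2$ (the helper `ols_resid` is mine; the partial correlation is computed as the correlation between the two sets of OLS residuals):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Correlated regressors x1, x2, x3; y depends linearly on all three plus noise.
X = rng.standard_normal((n, 3))
X[:, 2] += 0.5 * X[:, 0]  # make x3 correlated with x1
y = X @ np.array([1.0, 0.5, -1.0]) + rng.standard_normal(n)

def ols_resid(t, Z):
    """Residual of an OLS fit (with intercept) of t on the columns of Z."""
    Z1 = np.column_stack([np.ones(len(t)), Z])
    return t - Z1 @ np.linalg.lstsq(Z1, t, rcond=None)[0]

eps_k = ols_resid(y, X[:, :2])    # epsilon_k: error from regressing y on x1, x2
eps_k1 = ols_resid(y, X[:, :3])   # epsilon_{k+1}: error after adding x3
# Partial correlation of y and x3 controlling for x1, x2 (residual on residual).
rho = np.corrcoef(eps_k, ols_resid(X[:, 2], X[:, :2]))[0, 1]
print(np.mean(eps_k1**2), np.mean(eps_k**2) * (1.0 - rho**2))
```

The two printed mean squared errors coincide, which is exactly the recursion used in the induction step (for the in-sample OLS quantities it is an algebraic identity, essentially the Frisch–Waugh–Lovell theorem).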