From a mathematical intuition perspective, how does OLS "hold everything constant?"
The formula for the beta coefficients of an OLS model is: $$\beta=(X'X)^{-1}X'Y$$
I see that the above formulation does isolate the impact of a single independent variable based on this post, but I don't see where in the above notation that isolation process occurs.
My attempt at intuitively understanding the steps of the formula is the following. Note that I use the letters A, B, C, D to avoid rewriting results from prior steps:
- $A = (X'X)$: create a covariance matrix for X
- $B = A^{-1}$: invert the matrix so that we "divide"
- $C = BX'$: "divide" X by its covariance
- $D = CY$: scale by Y
The final output seems a lot like $cov/var$ (which is another formulation of the beta coefficients), but I still don't see how we're holding anything constant. Are the above steps right? And, how does the formula account for other independent variables and "control for them"?
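As a numerical sanity check, the steps A through D can be chained together and compared against a library least-squares solver (a minimal sketch with NumPy; the data and true coefficients here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))                      # design matrix (no intercept, for simplicity)
Y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=n)

A = X.T @ X                # step A: X'X
B = np.linalg.inv(A)       # step B: invert
C = B @ X.T                # step C: (X'X)^{-1} X'
D = C @ Y                  # step D: multiply by Y -> the OLS coefficients

# Matches the least-squares solution computed directly by NumPy:
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(D, beta_lstsq))    # True
```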
PS I made a post about the subject and would love to be able to update it with this information.
I'm afraid I don't have a great answer, in that my answer will ultimately boil down to "matrix inverses are a black box for how to recover coefficients of an expansion," which is what finding the $\vec{\beta}$ in regression is.
But I do have some intuition for why the matrices $G=(X'X)^{-1}X'$ and $H=X(X'X)^{-1}X'$ are coefficient extractors. The matrix $H$ is the orthogonal projection onto the column space of $X$, and $G$ (which is one step before) extracts the coefficients which are then applied to the columns of $X$ to rebuild a vector from its coefficients. Let $\vec{x}_i$ be the $i^{th}$ column of $X$. Then $X(X'X)^{-1}X'\vec{x}_i$ is the $i^{th}$ column of $X(X'X)^{-1}X'X$, which is $\vec{x}_i$ again. Pulling back one step, $(X'X)^{-1}X'\vec{x}_i$ is the column vector with a $1$ in the $i^{th}$ row and $0$ elsewhere.
Using linearity, if $\vec{x}=\sum_{i=1}^p \beta_i \vec{x}_i$, then $$G\vec{x}=\begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p\end{pmatrix}.$$ So this matrix $G$ extracts the coefficients.
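This coefficient-extraction property is easy to verify numerically (a sketch with randomly generated data; any full-column-rank $X$ will do):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))           # columns x_1, ..., x_p (full column rank a.s.)
G = np.linalg.inv(X.T @ X) @ X.T       # G = (X'X)^{-1} X'

# Applying G to the i-th column of X yields the i-th standard basis vector,
# so G applied to all of X gives the identity:
print(np.allclose(G @ X, np.eye(4)))   # True

# By linearity, G recovers the coefficients of any vector in the column space:
beta = np.array([3.0, -2.0, 0.0, 1.5])
x = X @ beta
print(np.allclose(G @ x, beta))        # True
```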
Of course, if $\vec{x}_\perp$ is orthogonal to the column space of $X$, this is the same as saying that $\vec{x}_\perp \cdot \vec{x}_i=0$ for all $1\leqslant i\leqslant p$. Since these dot products are exactly the values in $X'\vec{x}_\perp$, $G$ and $H$ send $\vec{x}_\perp$ to $0$. So $H$ leaves all members of the column space of $X$ fixed, and sends everything in the orthogonal complement of the column space of $X$ to $0$, which means that $H$ is exactly the orthogonal projection onto the column space of $X$.
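The projection picture can also be checked directly: the residual of any $\vec{y}$ plays the role of $\vec{x}_\perp$, and $H$ annihilates it while fixing the column space (a sketch with made-up data):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix: orthogonal projection onto col(X)

y = rng.normal(size=50)
y_hat = H @ y                          # projection of y onto the column space
resid = y - y_hat                      # the orthogonal component x_perp

print(np.allclose(X.T @ resid, 0))     # resid is orthogonal to every column of X
print(np.allclose(H @ resid, 0))       # H sends the orthogonal complement to 0
print(np.allclose(H @ X, X))           # H fixes the column space
print(np.allclose(H @ H, H))           # orthogonal projections are idempotent
```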
https://rmcausey.files.wordpress.com/2019/10/the-beauty-of-squares2-1.pdf