I've seen two separate geometric linear algebra explanations for the normal equation:
The first time, I saw it as projecting $y$ onto the column space of $X$, which minimizes the magnitude of the residual $y - X\beta$.
This interpretation I understood geometrically.
The second I saw in these two videos (which I don't understand all that well, but that isn't the point of my question anyway; they're just here for reference in case anyone wants them):
https://www.lem.ma/content/DOgLK1Tw_H8nmjXCEdFbIQ?book_id=DDzhEUQ2gVQffuQD0Tewug
https://www.lem.ma/content/BzgQNTkBGNTQv9wa1PZ8EA?book_id=DDzhEUQ2gVQffuQD0Tewug
However, neither explanation I've seen addresses the fact that $X^TX$ is the covariance matrix...
Geometrically, why does the inverse of the covariance matrix show up in this equation? How does the inverse of the covariance matrix, or the covariance matrix itself, act as a transformation?
Thanks!

$\mathrm{(X'X)^{-1}}$ is the inverse covariance matrix of the data (up to a scaling factor) only if the data are centered. However, $\sigma^2(\mathrm{X'X})^{-1}$ is the covariance matrix of the OLS coefficient vector $\hat{\beta}$.
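To make the "only if centered" caveat concrete, here is a minimal numerical sketch (data and variable names are illustrative, not from the question): after subtracting the column means, $\mathrm{X'X}/(n-1)$ coincides with the sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

# Center each column of X.
Xc = X - X.mean(axis=0)

# For centered data, X'X / (n - 1) is exactly the sample covariance matrix.
S = Xc.T @ Xc / (Xc.shape[0] - 1)
print(np.allclose(S, np.cov(X, rowvar=False)))   # covariances of the columns

# Without centering, X'X / (n - 1) is a second-moment matrix, not the covariance.
S_raw = X.T @ X / (X.shape[0] - 1)
print(np.allclose(S_raw, np.cov(X, rowvar=False)))
```

The first check passes and the second generally fails, which is precisely the distinction the answer draws.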
"Geometrically, why does the inverse of the covariance matrix show up in this equation?"
You have stated that you understand the geometric perspective of $\hat{Y} = Hy = X(X'X)^{-1}X'y = X\hat{\beta}$ as an orthogonal projection onto the subspace spanned by the columns of $X$. Hence, I'll point out the statistical reasoning. Recall the simple linear model $y = \beta_0 + \beta_1x + \epsilon$. In this case, $\hat{\beta}_1$ is the estimator of $\frac{\operatorname{cov}(Y,X)}{\sigma_X^2}$. If you center the data, then the model is $y = \beta x + \epsilon$ and the estimator is $\hat{\beta} = \sum y_i x_i/\sum x_i^2$. Thus, for the multivariate case, $\mathrm{(X'X)}^{-1}$ is the generalization of $(\sum x_i^2)^{-1}$ and $X'y$ is the generalization of $\sum y_i x_i$.
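The correspondence above can be checked numerically. The sketch below (synthetic data; names are illustrative) fits a single centered regressor three ways: the scalar formula $\sum y_i x_i/\sum x_i^2$, the normal equations $(X'X)^{-1}X'y$, and the ratio $\widehat{\operatorname{cov}}(Y,X)/\hat{\sigma}_X^2$; all three agree.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)

# Center both variables so the model has no intercept: y = beta * x + eps.
xc, yc = x - x.mean(), y - y.mean()

# (1) Scalar estimator for the centered simple regression.
beta_scalar = (xc * yc).sum() / (xc ** 2).sum()

# (2) The same number via the normal equations (X'X)^{-1} X'y.
X = xc.reshape(-1, 1)
beta_normal = np.linalg.solve(X.T @ X, X.T @ yc)[0]

# (3) Sample cov(Y, X) divided by sample var(X); the (n-1) factors cancel.
beta_cov = np.cov(xc, yc)[0, 1] / np.var(xc, ddof=1)

print(np.allclose(beta_scalar, beta_normal), np.allclose(beta_scalar, beta_cov))
```

The $(n-1)$ normalizations in the covariance and variance cancel in the ratio, which is why $\mathrm{(X'X)}^{-1}X'y$ and the covariance-based formula give the same coefficient.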