I'm trying to get y on a side by itself from the following equation:
$ β = (X^TC^{-1}X)^{-1}X^TC^{-1}y $
where
- $y$ is an $n \times 1$ vector
- $X$ is an $n \times 2$ matrix whose first column contains 1's and whose second column contains the explanatory variable
- $β$ is a $2 \times 1$ parameter vector
- $C$ is an $n \times n$ diagonal matrix
- $^T$ indicates the transpose of a matrix
I'm using R, and have so far been able to establish that $ X^TC^{-1}y $ is equal to solve(solve(t(X)%*%solve(C)%*%X),B) (or, equivalently, t(X)%*%solve(C)%*%X%*%B), where B holds the values of $β$. The result is a $2 \times 1$ vector. However, I can't figure out how to isolate $y$.
For some context, this is the generalized least squares solution to a phylogenetic regression, where $x$ is the independent variable, $y$ is the dependent variable, $C$ is the variance-covariance matrix of species relatedness, and $β$ contains the parameters of the regression.
I apologize if I have butchered the terminology here. I have zero background in linear algebra. Hopefully this is a straightforward question. Let me know if anything needs clarification. Thanks!
UPDATE: I found the function Solve in the R package limSolve, which gives the generalized inverse solution:
Solve(solve(t(X)%*%solve(C)%*%X)%*%(t(X)%*%solve(C)),B)
The resulting calculated $y$ is not the original $y$, but it does yield the correct $β$ when plugged back into the original equation. This is certainly progress, but unless I'm missing something, there are many solutions to the problem, and this instance of matrix algebra is a one-way street, unless there is some way to impose additional constraints.
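That intuition is right: the map from $y$ to $β$ collapses an $n$-dimensional vector to a 2-dimensional one, so infinitely many $y$ vectors produce the same $β$. A quick sketch in R (the data below are made up purely for illustration; only the structure matters):

```r
# Two y vectors that differ by a vector in the null space of the
# estimator matrix A yield exactly the same beta.
x <- c(1, 2, 3, 4)
X <- cbind(1, x)                       # n x 2 design matrix
C <- diag(c(1, 2, 3, 4))               # diagonal covariance matrix
Cinv <- solve(C)
A <- solve(t(X) %*% Cinv %*% X) %*% t(X) %*% Cinv   # so that beta = A %*% y

y1 <- c(1, 3, 2, 5)
v  <- c(1, -4, 3, 0)                   # chosen so that A %*% v = 0
y2 <- y1 + v                           # a different y ...

A %*% y1                               # ... but the same beta:
A %*% y2
```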
We are faced with a general linear model with a covariance matrix $C$ that is not the identity matrix: $y_i=\beta_1 + \beta_2 x_i + \varepsilon_i, \ i=1, \dots, n$, where $\sigma^2 C$ is the covariance matrix of the multivariate random variable $\varepsilon$. Here, $\beta = (\beta_1, \beta_2)^T$.
The formula for $\beta$ you are using is only valid if $C$ is invertible (i.e. an inverse matrix $C^{-1}$ exists; $C$ needs to have full rank for that), and if the matrix $X^TC^{-1}X$ is invertible as well. A criterion for the latter is that $X$ has full rank, in this case $2$, which holds as long as not all explanatory values $x_i$ are the same. The formula yields an estimate $\hat \beta = (\hat \beta_1, \hat \beta_2)^T$ of $\beta$, and predicted values $$\hat y_i = \hat \beta_1 + \hat \beta_2 x_i.$$ (Note: only if $C$ is the identity matrix, the so-called homoskedastic case, do these predictions minimize the sum of squared errors $\sum_{i=1}^n (y_i - \hat y_i)^2$; in the heteroskedastic case, this estimate $\hat \beta$ is the best linear unbiased estimator, but does not necessarily minimize the squared errors.)
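In R, the estimate and the predicted values follow directly from the formula; a minimal sketch (the data here are illustrative assumptions, not from the question):

```r
set.seed(1)
x <- c(1, 2, 3, 4, 5)
X <- cbind(1, x)                          # design matrix with intercept column
C <- diag(c(1, 2, 1, 3, 1))               # diagonal covariance matrix
y <- 2 + 0.5 * x + rnorm(5, sd = 0.2)     # simulated response

Cinv <- solve(C)
# GLS estimate: solve the normal equations rather than inverting explicitly
beta_hat <- solve(t(X) %*% Cinv %*% X, t(X) %*% Cinv %*% y)
y_hat <- X %*% beta_hat                   # predicted values; in general != y
```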
Since the prediction equation is linear in $x$, the predictions $\hat y_i$ will only coincide with the observed values if there is a perfectly linear relationship between $y$ and $x$. This is in general not the case, so the predictions will not be equal to the actual values. It is usually neither desired nor possible to reconstruct the actual values of $y$ from the coefficients $\hat \beta$ and the values of $x$. However, if we have $\hat \beta$ and some $x$ value for which we can't measure the $y$, we can use the prediction equation above to get a linear prediction of the value of $y$.
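For example, predicting $y$ at an unmeasured $x$ value is just a matter of plugging into the fitted line (the numbers here are hypothetical):

```r
beta_hat <- c(2.0, 0.5)                 # hypothetical fitted (intercept, slope)
x_new <- 7                              # an x for which y was not measured
y_pred <- beta_hat[1] + beta_hat[2] * x_new
y_pred                                  # 5.5
```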