How to evaluate the MSE of linear regression with noise in every dimension


I am considering a linear regression problem, but in a slightly different setting from the one usually discussed. Suppose there are $m$ distinct points $\boldsymbol{x}_i\in R^n, i = 1, \cdots, m$, lying on the same hyperplane $\{\boldsymbol{y}\in R^{n}: \boldsymbol{p}^{T}\boldsymbol{y} = \alpha\}$, so that $\boldsymbol{X} = [\boldsymbol{x}_1, \cdots, \boldsymbol{x}_m]\in R^{n\times m}$ satisfies $\boldsymbol{X}^{T}\boldsymbol{p} = \alpha\boldsymbol{1}_{m}$. Clearly, as long as $m\ge n$, I can solve for $\boldsymbol{p}$ and $\alpha$ by least squares.
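To make the noiseless setup concrete, here is a small NumPy sketch (with made-up values of $n$, $m$, $\boldsymbol{p}$, $\alpha$) that constructs points on a hyperplane and recovers $(\boldsymbol{p}, \alpha)$, up to scale, as the null vector of the augmented matrix $[\boldsymbol{X}^{T}, -\boldsymbol{1}_m]$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 6  # made-up dimensions

# Construct points on a hyperplane p^T x = alpha (made-up p and alpha).
p = np.array([1.0, -2.0, 0.5])
alpha = 1.5
X = rng.standard_normal((n, m))
X += np.outer(p, (alpha - p @ X)) / (p @ p)  # project each column onto the hyperplane

# Recover (p, alpha) up to scale: [X^T, -1_m] [p; alpha] = 0,
# so take the right singular vector for the smallest singular value.
A = np.hstack([X.T, -np.ones((m, 1))])
v = np.linalg.svd(A)[2][-1]
p_hat, alpha_hat = v[:n], v[n]

# The recovered direction is proportional to (p, alpha).
scale = alpha / alpha_hat
print(np.allclose(p_hat * scale, p))  # True
```

Note that $(\boldsymbol{p}, \alpha)$ is only identifiable up to a common scale factor, which is why fixing $\alpha = 1$ (as done below) is a convenient normalization.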

Now I only have access to a noisy version of $\boldsymbol{X}$, i.e. I observe $\boldsymbol{Y} = \boldsymbol{X} + \boldsymbol{Z}$, where the entries $z_{ij}\sim\mathcal{N}(0, \sigma^2)$ are independent of each other. Assuming $\alpha = 1$, the least squares estimation can be formulated as $$\min_{\boldsymbol{p}}\|\boldsymbol{Y}^{T}\boldsymbol{p} - \boldsymbol{1}_m\|_2^2,$$ whose solution is $\hat{\boldsymbol{p}} = (\boldsymbol{Y}\boldsymbol{Y}^{T})^{-1}\boldsymbol{Y}\boldsymbol{1}_m$. The MSE of this estimation is obtained by taking the expectation over $\boldsymbol{Z}$: $$MSE = \mathbb{E}\left[\|\boldsymbol{Y}^{T}(\boldsymbol{Y}\boldsymbol{Y}^{T})^{-1}\boldsymbol{Y}\boldsymbol{1}_m - \boldsymbol{1}_m\|_2^2\right].$$ So here comes the problem: $\boldsymbol{Y}$ is a random matrix that depends entirely on $\boldsymbol{Z}$, so how can I take the expectation when $\boldsymbol{Y}$ appears inside the inverse $(\boldsymbol{Y}\boldsymbol{Y}^{T})^{-1}$?
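For reference, the MSE above can at least be estimated numerically by Monte Carlo. A minimal sketch, assuming made-up values for $n$, $m$, and $\sigma$, and a randomly generated $\boldsymbol{p}$ and $\boldsymbol{X}$ satisfying $\boldsymbol{X}^{T}\boldsymbol{p} = \boldsymbol{1}_m$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, sigma = 3, 10, 0.05  # made-up dimensions and noise level

# Build points on the hyperplane p^T x = 1.
p_true = rng.standard_normal(n)
X = rng.standard_normal((n, m))
X += np.outer(p_true, (1.0 - p_true @ X)) / (p_true @ p_true)  # force p^T x_i = 1

# Monte Carlo estimate of MSE = E ||Y^T (Y Y^T)^{-1} Y 1 - 1||^2.
trials = 2000
ones = np.ones(m)
errs = np.empty(trials)
for t in range(trials):
    Y = X + sigma * rng.standard_normal((n, m))
    # p_hat = (Y Y^T)^{-1} Y 1, computed via a linear solve instead of an explicit inverse
    p_hat = np.linalg.solve(Y @ Y.T, Y @ ones)
    errs[t] = np.sum((Y.T @ p_hat - ones) ** 2)

print("Monte Carlo MSE:", errs.mean())
```

This does not answer the analytical question, but it gives a ground-truth value to check any closed-form or approximate expression against.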

I have read through quite a few discussions on linear regression, and it seems that this situation is rarely considered. Normally, the linear regression model is $\boldsymbol{y} = \boldsymbol{W}\boldsymbol{\beta} + \boldsymbol{z}$ with $\hat{\boldsymbol{\beta}} = (\boldsymbol{W}^{T}\boldsymbol{W})^{-1}\boldsymbol{W}^{T}\boldsymbol{y}$, and the MSE analysis goes through quite naturally because $\boldsymbol{W}$ is fixed and the only randomness comes from $\boldsymbol{y}$; the MSE can then be worked out without evaluating a random inverse matrix. Examples of this model can be found in MSE of Ridge estimator, Linear Regression. or Optimality of the MSE in gaussian linear regression. In fact, this model can be formulated as $$ \left[\boldsymbol{W}, \boldsymbol{y}\right]\left[ \begin{array}{c} \boldsymbol{\beta} \\ -1 \end{array} \right] = -\boldsymbol{z}. $$ So the difference between that model and my problem is the noise term. In the usually discussed model, the noise is added only to the output value $\boldsymbol{y}$. But in my problem, the noise affects every dimension of the observed points, which yields a noisy inverse matrix that I do not know how to handle.
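To illustrate why the standard model is tractable: with fixed $\boldsymbol{W}$, one has $\hat{\boldsymbol{\beta}} - \boldsymbol{\beta} = (\boldsymbol{W}^{T}\boldsymbol{W})^{-1}\boldsymbol{W}^{T}\boldsymbol{z}$, so $\mathbb{E}\|\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\|_2^2 = \sigma^2\,\mathrm{tr}\left((\boldsymbol{W}^{T}\boldsymbol{W})^{-1}\right)$, where the inverse is deterministic. A quick NumPy check of this closed form against simulation (made-up sizes and noise level):

```python
import numpy as np

rng = np.random.default_rng(1)
m, k, sigma = 50, 4, 0.3  # made-up sample size, dimension, noise level
W = rng.standard_normal((m, k))   # fixed design matrix
beta = rng.standard_normal(k)     # true coefficients

# Closed form: E||beta_hat - beta||^2 = sigma^2 * tr((W^T W)^{-1}),
# valid because W is fixed and only z is random.
closed_form = sigma**2 * np.trace(np.linalg.inv(W.T @ W))

trials = 20000
se = np.empty(trials)
for t in range(trials):
    y = W @ beta + sigma * rng.standard_normal(m)
    beta_hat = np.linalg.lstsq(W, y, rcond=None)[0]
    se[t] = np.sum((beta_hat - beta) ** 2)

print(closed_form, se.mean())  # the two values should agree closely
```

In the noisy-$\boldsymbol{X}$ problem above, the analogous step fails precisely because the matrix being inverted, $\boldsymbol{Y}\boldsymbol{Y}^{T}$, is itself random.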

I am wondering whether such a problem has been discussed somewhere, or whether there are any insights on handling the noisy inverse matrix. Alternatively, any suggestions on evaluating the MSE would also be appreciated.