Let's say I am doing linear regression and I have a data matrix $A$ with rows $a_i^T$. I also know that the noise $e_i$ is zero mean (and perhaps its distribution is known, too):
$$y_i = a_i^T x + e_i$$
Obviously from the data, I want to learn the parameters, $x$:
$$y=Ax$$
But I'm wondering: is there any reason I cannot learn the noise, too, if I know it has a specific form like this?
In other words, I would create a matrix like this:
$$ \hat A=\begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} & 1 & 0 & \dots & 0 \\ a_{21} & a_{22} & \dots & a_{2n} & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} & 0 & 0 & \dots & 1 \\ 0 & 0 & \dots & 0 & 1 & 1 & \dots & 1 \\ \end{bmatrix} $$ and a solution vector like this
$$ \hat x = \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \\ w_{1} \\ w_{2} \\ \vdots \\ w_{m} \\ \end{bmatrix} $$ and a known vector like this: $$ \hat y = \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots\\ y_{m} \\ 0 \\ \end{bmatrix} $$
The last row is there to enforce that the estimated errors $w_i$ sum to zero.
I am wondering whether I will get much greater accuracy if I learn the regression this way. Thanks.
The noise has zero mean, but that is a property of its distribution.
It doesn't mean that any particular realization of it sums to zero (which is what you are trying to impose).
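A quick numerical check can make this concrete. The following is only a minimal sketch: the problem size, the true parameters `x_true`, the Gaussian noise with standard deviation 0.1, and the seed are all assumptions picked for illustration, not values from the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed for reproducibility

m, n = 50, 3                          # hypothetical number of samples / parameters
A = rng.normal(size=(m, n))           # hypothetical data matrix
x_true = np.array([1.0, -2.0, 0.5])   # hypothetical true parameters
e = rng.normal(scale=0.1, size=m)     # one realization of zero-mean noise
y = A @ x_true + e

# The distribution has zero mean, but this particular draw need not sum to zero.
noise_sum = e.sum()

# Ordinary least squares estimate of x; the residuals act as a noise estimate.
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
residuals = y - A @ x_hat

print("sum of the true noise realization:", noise_sum)
print("least squares estimate of x:", x_hat)
print("sum of the residuals:", residuals.sum())
```

Forcing $\sum_i w_i = 0$ would therefore impose a constraint that the actual noise realization does not satisfy in general.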
In fact, if you formulate the problem as a Maximum Likelihood problem, you will see that the resulting estimator is determined by the noise properties.
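As a sketch of that derivation, assume (this is an assumption; the question does not fix the distribution) that the $e_i$ are i.i.d. Gaussian with zero mean and variance $\sigma^2$. The likelihood of the data is then
$$ p(y \mid x) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_i - a_i^T x)^2}{2\sigma^2} \right) $$
and maximizing its logarithm over $x$ gives
$$ x_{ML} = \arg\max_x \log p(y \mid x) = \arg\min_x \sum_{i=1}^{m} (y_i - a_i^T x)^2 = \arg\min_x \| y - Ax \|_2^2 $$
Under this assumption the Maximum Likelihood estimate is the ordinary least squares solution, and the natural estimate of the noise is the residual $y - A x_{ML}$. A different noise distribution changes the objective: for Laplace noise, for example, the same steps lead to minimizing $\sum_i |y_i - a_i^T x|$ instead.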