If X and Z are uncorrelated and Z is normal with mean zero and constant variance, why can I assume Z is zero?


I have a data set that I have used to calculate the coefficients for a linear regression. The data set is of the form $\lbrace (x_i,y_i)\rbrace_{i=1}^{n}$.

Let $$Y = \alpha + \beta X + Z$$ where $\text{corr}(X,Z) = 0$ and $Z \sim N(0,\sigma_Z^2)$, with constant $\sigma_Z^2$.

To calculate $\alpha$ and $\beta$, I had to assume $Z$ is zero. I then could find them by

$$\beta = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}$$ and, assuming $Z=0$, $$\alpha = \bar{y}-\beta \bar{x}$$

I am fairly certain this assumption is correct, since the numbers I got match the rest of the problem. However, I don't understand why I can assume this.

Why can I assume $Z=0$?


There is 1 best solution below


In typical linear regression, our goal is to choose $\alpha,\beta$ to minimize the objective function $$J(\alpha,\beta) = \sum_{i=1}^n (y_i - (\alpha + \beta x_i))^2$$ (the ordinary least squares criterion). The minimizer corresponds to the line $y = \beta x + \alpha$ that best fits the data set.
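As a sketch of how this objective leads to the formulas in the question (a standard calculus argument, not something stated in the original post), set both partial derivatives of $J$ to zero:

$$\frac{\partial J}{\partial \alpha} = -2\sum_{i=1}^n \bigl(y_i - (\alpha + \beta x_i)\bigr) = 0 \quad\Longrightarrow\quad \alpha = \bar{y} - \beta \bar{x}$$

$$\frac{\partial J}{\partial \beta} = -2\sum_{i=1}^n x_i\bigl(y_i - (\alpha + \beta x_i)\bigr) = 0 \quad\Longrightarrow\quad \beta = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}$$

The second implication uses $\alpha = \bar{y} - \beta\bar{x}$ together with $\sum_{i=1}^n (x_i - \bar{x}) = 0$ and $\sum_{i=1}^n (y_i - \bar{y}) = 0$. Note that the first condition says the fitted residuals sum to zero, which is why no explicit noise term appears in $\alpha = \bar{y} - \beta \bar{x}$.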

Including the noise term $Z$ means we should replace $y_i$ by $y_i+Z$. Substituting this into each term of the objective above and expanding yields $$(y_i + Z- (\alpha + \beta x_i))^2 = (y_i - (\alpha + \beta x_i))^2 + Z^2 + 2 Z(y_i - (\alpha + \beta x_i))$$

Now $Z^2$ does not depend on $\alpha$ or $\beta$, so it drops out of the minimization. The cross term $2 Z(y_i - (\alpha + \beta x_i))$ has expected value $0$, because $Z$ has mean zero and is uncorrelated with $y_i - (\alpha + \beta x_i)$. Hence, in expectation, minimizing $$\min_{\alpha, \beta} \sum_{i=1}^n (y_i + Z- (\alpha + \beta x_i))^2$$ is equivalent to minimizing $$\min_{\alpha, \beta} \sum_{i=1}^n (y_i - (\alpha + \beta x_i))^2$$ and therefore the same $\alpha,\beta$ are chosen with or without $Z$ (provided $Z$ has zero mean and is uncorrelated with $y_i - (\alpha + \beta x_i)$).
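A quick numerical check can make this concrete. The following is a minimal sketch, assuming NumPy is available. The helper name `ols_fit`, the true parameter values, the sample size, and the uniform distribution for $x$ are illustrative choices and not part of the original problem. It simulates $Y = \alpha + \beta X + Z$ with nonzero noise and applies the closed-form formulas from the question without ever setting $Z$ to zero.

```python
import numpy as np


def ols_fit(x, y):
    """Closed-form simple linear regression; returns (alpha, beta)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: sample covariance of (x, y) divided by sample variance of x.
    beta = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: the fitted line passes through (x_bar, y_bar).
    alpha = y_bar - beta * x_bar
    return alpha, beta


# Illustrative (assumed) parameters, not taken from the question.
rng = np.random.default_rng(0)
n = 10_000
true_alpha, true_beta, sigma_z = 2.0, 0.5, 1.0

x = rng.uniform(0.0, 10.0, size=n)
z = rng.normal(0.0, sigma_z, size=n)  # zero-mean noise, drawn independently of x
y = true_alpha + true_beta * x + z

alpha_hat, beta_hat = ols_fit(x, y)
residuals = y - (alpha_hat + beta_hat * x)

print("alpha_hat:", alpha_hat)
print("beta_hat:", beta_hat)
print("mean residual:", residuals.mean())
print("residual std:", residuals.std())
```

Under these assumptions the estimates should land close to the true values even though the individual noise values are far from zero. Only the average of the fitted residuals is zero, and their spread should stay close to `sigma_z`.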