I am trying to understand linear regression. The typical model takes the form $$y_{i}=ax_{i} +b + \epsilon_{i}, \quad i=1,\dots,N,$$ where the $\epsilon_{i}$ are i.i.d. Gaussian random variables. The objective is to minimize $$\sum_{i=1}^{N} (y_{i} - ax_{i} - b - \epsilon_{i})^{2}.$$
Computing the gradient (dropping a common factor of $2$, which does not affect where it vanishes) gives: $$\frac{\partial}{\partial a} = -\sum_{i=1}^{N} y_{i} x_{i} + a\sum_{i=1}^{N} x_{i}^{2} +b \sum_{i=1}^{N} x_{i} + \sum_{i=1}^{N} x_{i}\epsilon_{i}$$ $$\frac{\partial}{\partial b} = -\sum_{i=1}^{N} y_{i} + a\sum_{i=1}^{N} x_{i} +bN + \sum_{i=1}^{N} \epsilon_{i}$$ My question concerns the terms involving $\epsilon_{i}$. What are the arguments that allow us to state that these terms are equal to zero?
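As a sanity check on the derivation, here is a minimal sketch (with made-up data; the slope $2.0$, intercept $0.5$, and evaluation point $(a,b)=(1.3,-0.7)$ are arbitrary choices) comparing the closed-form gradient above, restored with its factor of $2$, against a central finite-difference approximation of the sum of squares:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
x = rng.normal(size=N)
eps = rng.normal(size=N)
y = 2.0 * x + 0.5 + eps  # arbitrary "true" slope and intercept

def S(a, b):
    # The objective as written in the question, epsilon included
    return np.sum((y - a * x - b - eps) ** 2)

a, b = 1.3, -0.7  # arbitrary evaluation point

# Closed-form gradient from the derivation (times the factor of 2)
grad_a = 2 * (-np.sum(y * x) + a * np.sum(x**2) + b * np.sum(x) + np.sum(x * eps))
grad_b = 2 * (-np.sum(y) + a * np.sum(x) + b * N + np.sum(eps))

# Central finite differences as an independent check
h = 1e-6
fd_a = (S(a + h, b) - S(a - h, b)) / (2 * h)
fd_b = (S(a, b + h) - S(a, b - h)) / (2 * h)
```

The two gradients agree to numerical precision, so the algebra above checks out.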
Usually we assume the $\epsilon_i$ are i.i.d. with $\epsilon_i\sim N(0,\sigma^2)$, so $E[\epsilon_i]=0$. By the law of large numbers, $\frac{1}{N}\sum_{i=1}^N \epsilon_i \to E[\epsilon_1]=0$ as $N\to\infty$, so for a large enough sample the term $\sum_{i=1}^N \epsilon_i$ is negligible compared with the other sums, which grow like $N$. The same argument handles the term $\sum_{i=1}^N x_i\epsilon_i$: if the $x_i$ are fixed (or independent of the noise), then $E[x_i\epsilon_i]=0$, and $\frac{1}{N}\sum_{i=1}^N x_i\epsilon_i \to 0$ as well.
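A quick numerical illustration of this law-of-large-numbers argument (with an arbitrary choice of $N=100{,}000$ and uniform $x_i$, both my own assumptions for the sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
x = rng.uniform(-1, 1, N)          # regressors, independent of the noise
eps = rng.normal(0, 1.0, N)        # i.i.d. N(0, sigma^2) noise

# Per-observation averages of the two noise terms from the gradient:
mean_eps = eps.mean()              # (1/N) * sum eps_i
mean_xeps = (x * eps).mean()       # (1/N) * sum x_i * eps_i
```

Both averages come out close to $0$ (on the order of $1/\sqrt{N}$), so relative to the sums that grow like $N$, the $\epsilon$ terms vanish.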