I am again having trouble understanding the following problem (1b).

I understand linear least squares as follows: I have data points and I am trying to find the line that fits them best, so I want to minimize the distance between the model and the real system. I know the formula and the general method, but I am quite clueless about how to apply them here.
My idea is to set $e[k]=y-\hat{y}$, substitute that into the formula I listed above, use the system formula to rewrite $e$ as a function of $\theta$, and then take the partial derivative with respect to $\theta$ and set it to zero to obtain the minimum.
Can anyone please give me a hint whether I am thinking correctly and, if so, how to get rid of that norm? Thanks a lot.
Note that $$\hat{y}[k] = \theta u[k].$$ Consequently, $$\|y-\hat{y}\|^2 = \sum_{k=0}^{N-1}(y[k]-\theta u[k])^2.$$ In order to compute $\theta$, we take the derivative of the above expression w.r.t. $\theta$ and equate to $0$. That is, $$\frac{\partial }{\partial \theta}\sum_{k=0}^{N-1}(y[k]-\theta u[k])^2 = -2\sum_{k=0}^{N-1}(y[k]-\theta u[k])u[k] = 0 \implies \theta \sum_{k=0}^{N-1}(u[k])^2=\sum_{k=0}^{N-1}y[k]u[k].$$ Therefore, $$\hat{\theta} = \frac{\sum_{k=0}^{N-1}y[k]u[k]}{\sum_{k=0}^{N-1}(u[k])^2}.$$
Note that $\theta$ cannot be estimated when $u[k]=0$ for all $k$, since the denominator $\sum_{k=0}^{N-1}(u[k])^2$ is then zero.
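As a quick numerical sanity check of the closed-form estimate, here is a minimal sketch in plain Python. The function name `estimate_theta` and the sample values of `u` and `theta_true` are made up for illustration and are not part of the original problem.

```python
def estimate_theta(u, y):
    """Closed-form least-squares estimate for the model y[k] = theta * u[k].

    Computes sum(y[k] * u[k]) / sum(u[k]**2).
    """
    denom = sum(uk * uk for uk in u)
    if denom == 0:
        # Every u[k] is zero, so the data carry no information about theta.
        raise ValueError("all u[k] are zero; theta cannot be estimated")
    return sum(yk * uk for yk, uk in zip(y, u)) / denom


# Hypothetical noise-free data generated with theta_true = 2.0.
u = [1.0, 2.0, 3.0, 4.0]
theta_true = 2.0
y = [theta_true * uk for uk in u]

theta_hat = estimate_theta(u, y)
print(theta_hat)
```

With noise-free data the estimate should reproduce `theta_true` up to floating-point error. With noisy `y` it returns the value of $\theta$ that minimizes the sum of squared residuals.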
Lastly, use the fact that $E[y[k]]=E[\theta u[k] + e[k]] = \theta u[k] + 1$ to calculate the bias of $\hat{\theta}$.
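To sketch how that last step might go: assume the input $u[k]$ is deterministic and that $E[e[k]]=1$ for every $k$, which is what the expression above implies. Taking the expectation of the estimator term by term then gives $$E[\hat{\theta}] = \frac{\sum_{k=0}^{N-1}E[y[k]]\,u[k]}{\sum_{k=0}^{N-1}(u[k])^2} = \frac{\sum_{k=0}^{N-1}(\theta u[k]+1)\,u[k]}{\sum_{k=0}^{N-1}(u[k])^2} = \theta + \frac{\sum_{k=0}^{N-1}u[k]}{\sum_{k=0}^{N-1}(u[k])^2}.$$ Under these assumptions the bias is $$E[\hat{\theta}]-\theta = \frac{\sum_{k=0}^{N-1}u[k]}{\sum_{k=0}^{N-1}(u[k])^2},$$ which vanishes only if $\sum_{k=0}^{N-1}u[k]=0$.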