Concentration of Least Squares Estimate

We know that if the data satisfy a linear model, i.e. $y = x^T\theta + \epsilon$ where $\epsilon$ is Gaussian noise, then the least squares estimate $\hat{\theta}$ becomes a good approximation of $\theta$ as the number of data points increases.

Now suppose we know that the model is not exactly linear, i.e. there is a residual error $\Delta_i$ and the data satisfy $$y_i = x_i^T\theta + \Delta_i + \epsilon_i.$$ Can we still say that the least squares estimate $\hat{\theta}$ is a good approximation of $\theta$?
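
To make the question concrete, here is a sketch of how the estimation error splits up. The notation is introduced only for illustration: $X$ is the matrix whose rows are $x_i^T$, and $y$, $\Delta$, $\epsilon$ are the corresponding stacked vectors. I am also assuming that $X^TX$ is invertible. Substituting $y = X\theta + \Delta + \epsilon$ into $\hat{\theta} = (X^TX)^{-1}X^Ty$ gives

$$\hat{\theta} - \theta = (X^TX)^{-1}X^T\Delta + (X^TX)^{-1}X^T\epsilon.$$

The second term is the usual noise term from the well-specified case. The first term depends only on how the residuals $\Delta_i$ align with the regressors $x_i$, so the question amounts to asking when that term vanishes as the number of data points grows.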

By a good approximation I mean that $\|\theta - \hat{\theta}\|$ goes to zero in some norm as the number of data points increases.
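
A small numerical sketch of what I have in mind, in the scalar case with no intercept. Everything here is an assumption made for illustration and not part of the setup above: the regressors are standard normal, the residual is a deterministic function of the regressor, and the helper names `ls_estimate` and `simulate` are hypothetical. It compares a residual that is correlated with $x_i$ against one that is uncorrelated with it.

```python
import random

def ls_estimate(xs, ys):
    # Scalar least squares without intercept: argmin_t sum_i (y_i - x_i * t)^2.
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx

def simulate(n, theta, delta_fn, noise_sd=1.0, seed=0):
    # Assumed data model: x_i ~ N(0, 1), y_i = x_i * theta + Delta_i + eps_i,
    # with Delta_i = delta_fn(x_i) and eps_i ~ N(0, noise_sd^2).
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [x * theta + delta_fn(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return ls_estimate(xs, ys)

theta = 2.0
n = 200_000

# Delta_i = 0.5 * x_i is perfectly correlated with the regressor,
# so the estimate settles near theta + 0.5 rather than theta.
est_corr = simulate(n, theta, lambda x: 0.5 * x)

# Delta_i = 0.5 * (x_i^2 - 1) has E[x * Delta] = 0 for standard normal x
# (odd moments vanish), so the estimate still settles near theta.
est_orth = simulate(n, theta, lambda x: 0.5 * (x * x - 1.0))

print(abs(est_corr - theta), abs(est_orth - theta))
```

In this toy setting the first error stays near 0.5 however large `n` is, while the second shrinks with `n`. This suggests that some condition on how $\Delta_i$ relates to $x_i$ is needed before $\theta$ is even well defined, since $\Delta_i = 0.5\,x_i$ is just a linear model with a different parameter.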