How does the solution for polynomial regression depend on the number of data points and the error distribution?

Let the set of instances be generated by the function , where $ε$ is random, uniformly distributed noise. Also, assume that you are using a fifth-degree polynomial for regression.

BEST ANSWER

A fifth-degree polynomial is flexible enough to bend closely to the shape of the points, so with a lot of noise and few instances we can get overfitting. That is, the optimal hypothesis will have parameters $w$ that do not reflect the actual second-degree polynomial, with non-negligible higher-order terms (cubic, quartic, and quintic).

With many data points, the estimated parameters $w$ will generally be closer to the true quadratic coefficients, with small values for the third-degree and higher terms. This is especially true when there is little noise. When there is a lot of noise, more instances are needed to reach the same accuracy, but adding instances still mitigates its effect: provided the noise has zero mean, we expect $w$ to converge to the true parameters as the number of instances increases, though more slowly for higher noise (a nonzero noise mean would only shift the constant term). That is,

$$\lim_{n\rightarrow\infty} w_n=(0,0,0,4,-4,-2)$$
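
The convergence claim can be checked numerically. The following is a minimal sketch, not a definitive experiment. The generating function is missing from the question, so the code assumes it is $4x^2 - 4x - 2 + ε$, reading the limit vector above from the fifth-degree coefficient down to the constant. It also assumes that $ε$ is uniform on $[-a, a]$ (zero mean) and that $x$ is drawn uniformly from $[-1, 1]$. The helper names `fit_quintic` and `mean_coefficient_error` are made up for illustration.

```python
import numpy as np

# Assumed true coefficients, highest degree first (the ordering np.polyfit uses).
# This reading of the limit vector is an assumption, not stated in the question.
TRUE_W = np.array([0.0, 0.0, 0.0, 4.0, -4.0, -2.0])


def fit_quintic(n, noise_half_width, rng):
    """Least-squares fit of a degree-5 polynomial to n noisy samples."""
    x = rng.uniform(-1.0, 1.0, size=n)  # assumed input range
    # Zero-mean uniform noise on [-a, a] (assumed form of the noise).
    eps = rng.uniform(-noise_half_width, noise_half_width, size=n)
    y = np.polyval(TRUE_W, x) + eps
    return np.polyfit(x, y, deg=5)


def mean_coefficient_error(n, noise_half_width, trials=200, seed=0):
    """Average Euclidean distance between fitted and true coefficients."""
    rng = np.random.default_rng(seed)
    errors = [
        np.linalg.norm(fit_quintic(n, noise_half_width, rng) - TRUE_W)
        for _ in range(trials)
    ]
    return float(np.mean(errors))


if __name__ == "__main__":
    for a in (0.1, 1.0):
        for n in (20, 200, 2000):
            err = mean_coefficient_error(n, a)
            print(f"noise half-width {a:>4}, n = {n:>4}: mean error {err:.4f}")
```

Under these assumptions, the printed error should shrink as `n` grows at a fixed noise level and should be larger at the higher noise level for a fixed `n`, which is the behaviour described in the answer.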