I am learning the smoothing spline method. I saw that a smoothing spline adds a penalty term to reduce overfitting in linear regression. Given a dataset $\{(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)\}$, the objective is: $$RSS=\sum_{i=1}^{n}(y_i-f(x_i))^2+\lambda\int (f''(t))^2\,dt$$
Assume the linear case, so $$f(x_i)=ax_i+b$$ $$f''(x)=0$$
Is this correct? Could you explain how to find the second term in RSS ($\lambda\int (f''(t))^2\,dt$) in the linear regression case, or give me one example? Thank you so much.
The purpose of the smoothing term (the second term, the integral) is to reduce "wiggles" in the approximating function $f$. Of course, reducing wiggles often means that the function $f$ does not fit to your data as well as it might. That's the whole point -- you are making some compromise between wiggliness and closeness of fit.
This approach is typically used when fitting with spline curves or polynomials, which have a tendency to wiggle (or, at least, the ability to wiggle).
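As a concrete example of a nonzero penalty (using a made-up quadratic purely for illustration): take $f(x)=cx^2$ on the interval $[0,1]$. Then $f''(t)=2c$, so $$\lambda\int_0^1 (f''(t))^2\,dt = \lambda\int_0^1 4c^2\,dt = 4\lambda c^2.$$ The penalty grows with the curvature coefficient $c$, which is exactly how the objective discourages wiggles: a larger $\lambda$ pushes the minimizer toward smaller $|c|$, i.e. toward a straighter function.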
If you are using a linear function $f$ to do the fitting, then the smoothing technique does not apply. If $f$ is linear, then $f''=0$ everywhere, and so the smoothing term is zero regardless of $\lambda$. You are left minimizing only the first term, $\sum(y_i-f(x_i))^2$, which means you are just doing standard least-squares fitting, with no smoothing.
Saying it another way -- a linear function $f$ is perfectly free from wiggles, no matter what, so it doesn't make sense to do a closeness/wiggliness trade-off.
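To make this concrete, here is a minimal numerical sketch (with toy data invented for illustration) showing that for a linear fit the penalty term is identically zero, so minimizing the penalized objective is the same as ordinary least squares:

```python
import numpy as np

# Toy data (hypothetical values, for illustration only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.2, 1.9, 3.2, 3.8])

# Fit a linear function f(x) = a*x + b by ordinary least squares
a, b = np.polyfit(x, y, 1)

# For a linear f, f''(t) = 0 everywhere, so the penalty
# lambda * integral of (f''(t))^2 dt is exactly 0 for ANY lambda
penalty = 0.0

rss = np.sum((y - (a * x + b)) ** 2)
objective = rss + penalty  # identical to the plain least-squares RSS
```

Whatever value of $\lambda$ you pick, `objective == rss`, so the smoothing parameter has no effect on a linear fit.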