Understanding regularization in least squares


I have a question from Boyd and Vandenberghe's convex optimization book. The picture is shown below.

Looking at the first term in the equation, i.e. the sum of squared differences: regardless of whether $a_{i}^Tx$ is positive or negative, the squared term $(a_{i}^Tx - b_{i})^2$ is always nonnegative. And since we are trying to minimize $(a_{i}^Tx - b_{i})^2$, the solution $x$ will already have to be small in magnitude, right? If $x$ blows up, the whole first term blows up, and the minimization is trying to avoid this, right?

Then why do we need regularization?

Thanks.

[Image: the regularized least-squares objective from the book, $\sum_{i=1}^{k}(a_{i}^Tx - b_{i})^2 + \rho \sum_{i=1}^{n} x_{i}^2$ with $\rho > 0$]

BEST ANSWER

Minimizing only the first term (standard linear regression) does not control the length of $x$. $x$ could still be very large, as long as its dot product with $a$'s is small. Say, any $x$ perpendicular to all of the $(a_i)$ would make the first part of the sum equal zero. You really need the second term, which in Machine Learning, for example, controls the complexity of your linear model.