Bayes interpretation of regularization in linear regression


I am deriving L2 regularization from Bayes' theorem. In doing so I came across an article which assumed that the parameter $\theta$ follows a normal prior distribution with mean 0. Why is this assumption made, when a uniform distribution seems more natural?



On BEST ANSWER

One can argue that the uniform distribution seems more natural as a prior, since it may seem safest to assume that all possible values of the parameter are equally probable. However, a uniform prior also means you have no actual "prior" information about the parameter you want to estimate, because "not having any information about something" is precisely "all outcomes are equally probable". Moreover, with a uniform prior the term $\log P(\theta)$ is constant, so it has no effect on the optimization and can simply be dropped. In that case the maximum a posteriori (MAP) estimate coincides with the maximum likelihood estimate (MLE), and nothing new has been gained.
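
To make that concrete (my own notation, with $D$ denoting the data): if the prior is uniform, $P(\theta) = c$ for some constant $c$, then

$$\hat\theta_{\text{MAP}} = \arg\max_\theta \big[\log P(D \mid \theta) + \log P(\theta)\big] = \arg\max_\theta \big[\log P(D \mid \theta) + \log c\big] = \arg\max_\theta \log P(D \mid \theta) = \hat\theta_{\text{MLE}},$$

since adding the constant $\log c$ does not change where the maximum is attained.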

We use a non-uniform prior when we do have some information about the parameters, or when we simply want to "impose" some distribution on them. In the case of L2-regularized regression, we want the parameters to be "small", so we choose a normal prior centered at zero. This makes values close to zero the most probable (in a normal distribution the mean is the mode), and the estimated parameters are therefore pulled toward zero. Concretely, a Gaussian prior $\theta \sim \mathcal N(0, \tau^2 I)$ combined with Gaussian noise of variance $\sigma^2$ turns the negative log posterior into least squares plus the ridge penalty $\frac{\sigma^2}{\tau^2}\lVert\theta\rVert^2$.
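
A small numerical sketch of this equivalence (the variable names `sigma`, `tau`, and `lam` are this example's own, not from the question): the MAP estimate under a $\mathcal N(0, \tau^2 I)$ prior, found by minimizing the negative log posterior directly, should match the closed-form ridge solution with $\lambda = \sigma^2/\tau^2$.

```python
import numpy as np

# MAP with a N(0, tau^2) prior on theta vs. ridge regression with
# lam = sigma^2 / tau^2 -- the two should give the same estimate.
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
theta_true = np.array([2.0, -1.0, 0.5])
sigma, tau = 0.5, 1.0                      # noise std and prior std (assumed values)
y = X @ theta_true + sigma * rng.normal(size=n)

lam = sigma**2 / tau**2                    # penalty implied by the prior

# Route 1: closed-form ridge solution (X^T X + lam I)^{-1} X^T y
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Route 2: gradient descent on the negative log posterior
#   ||y - X theta||^2 / (2 sigma^2) + ||theta||^2 / (2 tau^2)
theta = np.zeros(d)
lr = 1e-3
for _ in range(20000):
    grad = -(X.T @ (y - X @ theta)) / sigma**2 + theta / tau**2
    theta -= lr * grad

print(np.allclose(theta, theta_ridge, atol=1e-5))  # the two routes agree
```

Note how the regularization strength is not arbitrary here: it is the ratio of the noise variance to the prior variance, so a tighter prior (smaller $\tau$) means stronger shrinkage.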