I'm at a loss trying to implement Bayesian model selection for standard least-squares polynomial fits.
I have three polynomials of order $1$, $2$, and $3$, and a sequence of $(x,y)$ data points. After performing least-squares fitting of the three polynomials, I want to select the one with the maximum evidence (marginal likelihood).
Given a model $H_{i\in\{1,2,3\}}$ with parameters $w_i$, a set of inputs $x$ and outputs $y$, I need to compute the evidence $$ p(y\mid x ,H_i) = \int p(y\mid x,w,H_i)p(w\mid H_i) \, \mathrm{d}w $$ For the cubic polynomial $H_3$, $w$ is a 4-element vector $w=[a,b,c,d]$, and $$ H_3:\quad y = a + bx + cx^2 + dx^3 $$
So far, so good.
I want my prior $p(w\mid H_i)$ to be as flat as possible, so my plan is to use an improper prior $p(w\mid H_i)=1$. Or do I have to use something more proper, like a Gaussian with a huge covariance? Because otherwise, the evidence would reduce to just the integral of the Gaussian PDF, $\int p(y\mid x,w,H_i)\,\mathrm{d}w$, which I figured equals $1$ for all three of my $H_i$, wouldn't it?
How do people normally do this thing?
Or perhaps I'm just confused about the form of the integral. $\int p(y\mid x,w,H) \, \mathrm{d}y=1$, of course, since it is a probability density in $y$, but the integral I want is over $w$: $\int p(y\mid x,w,H)\,\mathrm{d}w$, which need not equal $1$.
So my question is: how do I compute that integral?
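To make it concrete, here is the kind of computation I think I'm after, assuming (hypothetically) a proper Gaussian prior $w\sim\mathcal{N}(0,\tau^2 I)$ and a known noise variance $\sigma^2$. In that linear-Gaussian case the evidence has a closed form: $y\sim\mathcal{N}(0,\ \sigma^2 I + \tau^2\Phi\Phi^\top)$, where $\Phi$ is the polynomial design matrix. The values of $\tau^2$, $\sigma^2$, and the toy data are my own arbitrary choices:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
# toy data from a quadratic, so H_2 should win
y = 0.5 - 1.0 * x + 2.0 * x**2 + rng.normal(scale=0.1, size=x.size)

sigma2 = 0.1**2   # assumed-known noise variance (hypothetical)
tau2 = 10.0       # prior variance on the weights (hypothetical)

def log_evidence(degree):
    # design matrix with columns [1, x, x^2, ..., x^degree]
    Phi = np.vander(x, degree + 1, increasing=True)
    # marginal covariance of y after integrating out w analytically
    cov = sigma2 * np.eye(x.size) + tau2 * Phi @ Phi.T
    return multivariate_normal(mean=np.zeros(x.size), cov=cov).logpdf(y)

for d in (1, 2, 3):
    print(f"H_{d}: log evidence = {log_evidence(d):.2f}")
```

If this is the right idea, then the question becomes only how to justify the choice of $\tau^2$ (and whether some flat limit of it makes sense).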