What does it mean to integrate a parameter vector?


A naive question: I know about integrating a scalar function over values of $x$, $$\int f(x)\,dx,$$ and I'm now trying to learn machine learning. However, I keep running into integrals over a parameter vector $W$, $$\int f(W)\,dW.$$ What does this integral mean? For example, what is the Riemann sum that it is computing? Is it the same as a line integral?

(An example was given as an attached image, not reproduced here.)


It's usually just a regular multidimensional integral: think of parameter space as you would any other space. (It won't be a more complicated line or surface integral unless the parameters form some kind of manifold of positive codimension because constraint relations hold between the variables.)
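To make the Riemann sum in the question concrete, here is a minimal sketch: we tile a 2D parameter space $(w_1, w_2)$ with a grid and sum $f$ times the cell area $\Delta w_1 \Delta w_2$. The choice of $f$ (a standard bivariate Gaussian density, whose integral over all of $\mathbb{R}^2$ is 1) and the truncation box $[-5,5]^2$ are illustrative assumptions, not from the original post.

```python
import numpy as np

# Riemann-sum sketch of a multidimensional parameter integral.
# f is a standard bivariate normal density, so the integral over
# all of R^2 equals 1; we truncate to the box [-5, 5]^2.
def f(w1, w2):
    return np.exp(-0.5 * (w1**2 + w2**2)) / (2 * np.pi)

n = 400
grid = np.linspace(-5, 5, n)
dw = grid[1] - grid[0]          # cell side length
W1, W2 = np.meshgrid(grid, grid)

# Sum f over every grid cell, weighted by the cell "volume" dw * dw.
integral = np.sum(f(W1, W2)) * dw * dw
print(integral)  # ≈ 1
```

Exactly the same sum works in any number of dimensions; the cell "volume" just becomes $\Delta w_1 \cdots \Delta w_D$, which is what the shorthand $dW$ stands for.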

For instance, integrating some function $J$ of a 1D linear regressor $ g_{\alpha,\beta}(x)= \alpha x + \beta $ over its parameters: $$f(x) = \iint J(\alpha x + \beta)\, d\alpha\,d\beta $$
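This double integral can be approximated by Monte Carlo: sample $(\alpha, \beta)$ uniformly from a box and average $J$, scaled by the box area. Note that the concrete choices below are assumptions for illustration only: $J(u) = e^{-u^2}$ and a finite box $(\alpha,\beta) \in [-1,1]^2$ (over all of $\mathbb{R}^2$ this particular integral would diverge, since the integrand is constant along the lines $\alpha x + \beta = c$).

```python
import numpy as np

# Monte Carlo sketch of f(x) = ∬ J(αx + β) dα dβ for the 1D linear
# regressor g_{α,β}(x) = αx + β.  Illustrative assumptions:
# J(u) = exp(-u²) and the finite box (α, β) ∈ [-1, 1]².
rng = np.random.default_rng(0)

def J(u):
    return np.exp(-u**2)

def f(x, n_samples=400_000):
    alpha = rng.uniform(-1, 1, n_samples)
    beta = rng.uniform(-1, 1, n_samples)
    volume = 2.0 * 2.0  # area of the (α, β) box
    return volume * np.mean(J(alpha * x + beta))

# At x = 0 the integral factorises: 2 · ∫_{-1}^{1} e^{-β²} dβ ≈ 2.987
print(f(0.0))
```

The same estimator extends unchanged to higher-dimensional parameter spaces, where grid-based Riemann sums quickly become infeasible.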


For Bayesian models, it can be a bit confusing, so let me give an example using them. Suppose we have a 1D regression model $\widehat{y} = f_\theta(x)$ with parameters $\theta$. Given a dataset $ D=\{(x_i,y_i)\}_i$, we usually have a likelihood like $$ p(y_i|x_i,\theta)=\mathcal{N}(y_i|f_\theta(x_i),\sigma^2_\ell) $$ as well as a prior over the weights $$ p(\theta) = \mathcal{N}(\theta|0,\sigma^2_p I) = \prod_d \mathcal{N}(\theta_d|0,\sigma^2_p). $$

Now suppose we need our model to give a prediction at some new input $x_\text{new}$. This calls for the predictive distribution: \begin{align} p(y_\text{new}|x_\text{new},D) &= \int p(y_\text{new}|x_\text{new},D,\theta)\, p(\theta|x_\text{new},D)\, d\theta \\ &= \int p(y_\text{new}|x_\text{new},\theta)\, p(\theta|D)\, d\theta, \end{align} where the second line uses the fact that the parameters do not depend on the new input, and that given $\theta$ the prediction no longer depends on $D$.

The first term is the likelihood, which is not a big deal, but the second is the posterior over the parameters: $$ p(\theta|D)=\frac{p(D|\theta)p(\theta)}{p(D)}, $$ which involves the prior, the likelihood over the training set $$ p(D|\theta) = \prod_i p(y_i|x_i,\theta), $$ and the model evidence (marginal likelihood) $$ p(D) = \int p(D|\theta)\, p(\theta)\, d\theta, $$ which can (sometimes) be ignored, since it does not depend on the parameters.
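The predictive integral above can be sketched numerically by sampling $\theta$ from the posterior and averaging. The sketch below uses the simplest possible case, assumed for illustration: a single scalar parameter with $f_\theta(x) = \theta x$, where the Gaussian prior and likelihood are conjugate, so $p(\theta|D)$ is Gaussian in closed form and easy to sample. The toy data and the variances $\sigma_\ell, \sigma_p$ are likewise made up for the example.

```python
import numpy as np

# Monte Carlo sketch of the predictive integral
#   p(y_new | x_new, D) = ∫ p(y_new | x_new, θ) p(θ | D) dθ
# for the scalar model f_θ(x) = θ·x with likelihood N(y | θx, σ_ℓ²)
# and prior N(θ | 0, σ_p²).  With these conjugate choices the
# posterior p(θ | D) is Gaussian with a closed-form mean and variance.
rng = np.random.default_rng(0)

sigma_l, sigma_p = 0.5, 1.0
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 1.9, 4.2, 5.8])          # toy data, roughly y ≈ 2x

# Closed-form Gaussian posterior for this conjugate model.
post_var = 1.0 / (np.sum(x**2) / sigma_l**2 + 1.0 / sigma_p**2)
post_mean = post_var * np.sum(x * y) / sigma_l**2

# Approximate the predictive integral as a Monte Carlo average over
# posterior samples of θ: one draw of y_new per sampled θ.
theta = rng.normal(post_mean, np.sqrt(post_var), size=100_000)
x_new = 1.5
y_samples = rng.normal(theta * x_new, sigma_l)
print(y_samples.mean())   # ≈ post_mean * x_new
```

For models where the posterior is not available in closed form, the same integral is typically handled with MCMC or variational approximations, but the meaning of the $\int \cdots\, d\theta$ is unchanged: an ordinary integral over the whole parameter space.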

Check out this link too.