On integration over a distribution


When I read journal articles related to machine learning, I often come across integrals over a distribution.

In an article I am reading now, for example, a risk function associated with a distribution $\phi$ is defined by $$ R_i(\theta) = \int f_L(f_\theta(x),y) \, d\phi(x,y),$$ where

  • $\mathcal{X}$ and $\mathcal{Y}$ are a feature space and a label space, respectively,
  • $f_\theta:\mathcal{X}\to\mathcal{Y}$ is a given model parameterized by $\theta\in\Theta$,
  • $f_L:\mathcal{Y}\times\mathcal{Y}\to\mathbb{R}_{\ge0}$ is a loss function, and
  • $\phi$ is the data-generating distribution.

I have seen integrals over distributions like this many times, both in this field and in others. However, whenever I encounter one, I cannot grasp what it means.

I am more familiar with the following form: $$ \int f_L(x) p_X(x) \, dx, $$ where $f_L(x)$ is the cost (or reward) incurred by an outcome $x$, and $p_X(x)$ is the probability density of $x$.

Can someone please let me know what the integral over a distribution means?

Best answer

I think what you are missing is the Riemann–Stieltjes integral.

In probability theory, the expectation under a discrete distribution and under a continuous distribution can be written uniformly as a Riemann–Stieltjes integral, that is, $$ E[f(X)] = \int f(x)\,dF(x), $$

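where $F(x)$ is the cumulative distribution function of the random variable $X$. When $X$ has a density $p$, you can think of $dF(x)$ as $p(x)\,dx$, which recovers the familiar $\int f(x)p(x)\,dx$; when $X$ is discrete, the integral reduces to the sum $\sum_x f(x)P(X=x)$. In the same way, $d\phi(x,y)$ in your risk function means integrating with respect to the joint distribution $\phi$ of $(x,y)$: if $\phi$ has a density $p(x,y)$, then $d\phi(x,y) = p(x,y)\,dx\,dy$.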
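If a numerical picture helps, here is a minimal Python sketch (standard library only) of the three ways $\int f(x)\,dF(x)$ is usually evaluated: as a weighted sum for a discrete distribution, as $\int f(x)p(x)\,dx$ when a density exists, and as a sample average when you can only draw from the distribution. The choices $f(x)=x^2$, a fair die, a standard normal, and all function names are my own assumptions for illustration; none of them come from the article in the question.

```python
import math
import random


def f(x):
    # Illustrative integrand (an assumption): f(x) = x^2.
    return x * x


def expectation_discrete(f, pmf):
    # Discrete case: the integral against dF is the sum of f(x) * P(X = x).
    return sum(f(x) * p for x, p in pmf.items())


def expectation_density(f, pdf, lo, hi, n=100_000):
    # Continuous case: dF(x) = p(x) dx, approximated with the midpoint rule
    # on the truncated interval [lo, hi].
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += f(x) * pdf(x)
    return total * h


def expectation_monte_carlo(f, sampler, n=200_000):
    # Sampling case: average f over draws from the distribution.
    return sum(f(sampler()) for _ in range(n)) / n


def normal_pdf(x):
    # Density of the standard normal distribution.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


die = {k: 1.0 / 6.0 for k in range(1, 7)}  # fair six-sided die
rng = random.Random(0)                     # fixed seed for reproducibility

e_die = expectation_discrete(f, die)                      # exact value is 91/6
e_pdf = expectation_density(f, normal_pdf, -10.0, 10.0)   # exact value is 1
e_mc = expectation_monte_carlo(f, lambda: rng.gauss(0.0, 1.0))  # close to 1

print(e_die, e_pdf, e_mc)
```

The sampling version is the one that mirrors the risk function in the question: since the data-generating distribution is not known in closed form, averaging the loss over observed $(x,y)$ pairs is the usual stand-in for the integral with respect to $d\phi(x,y)$.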

I hope this is useful to you; my apologies if it does not help.