I have recently been reading some papers on optimization in infinite-dimensional spaces, and I am not familiar with functional analysis. In some papers, $\|f\|_2^2$ is written as $\int |f(x)|^2 p(x)dx$, where $p(x)$ may be the probability density function of $x$. Could anyone tell me why $\|f\|_2^2$ can be written as $\int |f(x)|^2 p(x)dx$?
By the way, could someone recommend some introductory material so that I can quickly learn the important definitions and calculation rules of basic functional analysis? In machine learning and optimization, I sometimes consider questions about kernel functions, which are built on infinite-dimensional spaces (e.g., Hilbert spaces).
The underlying idea is to have a space of functions instead of a space of points.
What do you need to make this work? It turns out that the most important properties (i.e., the topological properties) all follow from the notion of distance.
It is intuitive what the distance between two points is. But what is the distance between two functions? Once you define that, you can treat functions as points, because the definitions of limits, closure of sets and other related concepts are all founded on the notion of distance.
So, let's first define a norm on our space of functions $V$; this is a function $||\cdot|| : V \to \mathbb R$ with the following properties (for all $f, g \in V$ and every scalar $\lambda$):
$$||f|| = 0 \iff f = 0$$ $$||f|| \ge 0$$ $$||\lambda f|| = |\lambda| \, ||f||$$ $$|| f + g || \le ||f|| + ||g||$$
The last property is known as the triangle inequality. (You may verify that the usual norm of a vector in $\mathbb R^n$ satisfies these properties.)
Once you have defined a norm, you get (for free!) a notion of distance; simply set the distance between two functions $f, g$ as $$d(f, g) = ||f - g||$$
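As a quick sanity check in the finite-dimensional case, here is a minimal Python sketch that spot-checks non-negativity and the triangle inequality for the Euclidean norm on random vectors, and builds the induced distance. The helper names (`norm`, `dist`, `check_norm_properties`) and the tolerance are my own choices for illustration; a numerical check like this is evidence, not a proof.

```python
import math
import random


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def dist(u, v):
    """Distance induced by the norm: d(u, v) = ||u - v||."""
    return norm([a - b for a, b in zip(u, v)])


def check_norm_properties(trials=1000, dim=5, seed=0, tol=1e-12):
    """Spot-check non-negativity and the triangle inequality on random vectors."""
    rng = random.Random(seed)
    for _ in range(trials):
        u = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        if norm(u) < 0.0:
            return False
        # Triangle inequality, with a small tolerance for rounding error.
        if norm([a + b for a, b in zip(u, v)]) > norm(u) + norm(v) + tol:
            return False
        # A vector is at distance zero from itself.
        if dist(u, u) != 0.0:
            return False
    return True


if __name__ == "__main__":
    print(check_norm_properties())
    print(dist([0.0, 0.0], [3.0, 4.0]))
```

The same pattern carries over to functions once a norm on the function space has been fixed: the distance is always the norm of the difference.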
One may define many spaces and many norms on them. One such choice leads us to $L^p(\Omega)$ with $1 \le p < \infty$, the space of all functions for which $\int_\Omega |f|^p dq(x)$ is finite. (Note that the integral is taken with respect to a measure $q$. This is not the usual Riemann integral: although it uses the same symbol, it is defined in a different manner.)
On these $L^p$ spaces it is useful to define the norm as $$||f||_{L^p} = \left(\int_\Omega |f(x)|^p dq(x) \right) ^ \frac 1p$$
One should check that this is indeed a norm, and it is (the triangle inequality here is the Minkowski inequality). A common shorthand for $||f||_{L^p}$ is simply $||f||_p$.
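To make the $L^p$ norm concrete, here is a rough numerical sketch. It assumes $\Omega = [0, 1]$ with the ordinary Lebesgue measure and approximates the integral with a midpoint rule; the function name `lp_norm` and the test functions are hypothetical choices for illustration only.

```python
def lp_norm(f, p, a=0.0, b=1.0, n=100_000):
    """Approximate (integral over [a, b] of |f(x)|^p dx)^(1/p) by the midpoint rule."""
    h = (b - a) / n
    total = sum(abs(f(a + (i + 0.5) * h)) ** p for i in range(n))
    return (total * h) ** (1.0 / p)


def f(x):
    return x


def g(x):
    return 1.0 - x * x


if __name__ == "__main__":
    # For f(x) = x on [0, 1], the exact L^2 norm is sqrt(1/3).
    print(lp_norm(f, 2), (1.0 / 3.0) ** 0.5)

    # Minkowski inequality: ||f + g||_p <= ||f||_p + ||g||_p.
    for p in (1, 2, 3):
        lhs = lp_norm(lambda x: f(x) + g(x), p)
        rhs = lp_norm(f, p) + lp_norm(g, p)
        print(p, lhs <= rhs + 1e-12)
```

The discretized sum is itself a finite-dimensional $p$-norm, so the Minkowski check should hold up to rounding error.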
In the setting of random variables and probability measures, if the probability distribution $q$ of $x$ admits a density $g(x)$, then
$$||f||_{L^p} = \left(\int_\Omega |f(x)|^p dq(x) \right) ^ \frac 1p = \left(\int_\Omega |f(x)|^p g(x) dx \right) ^ \frac 1p$$
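Here the last integral is the Lebesgue integral (that is, the integral defined with respect to the Lebesgue measure), which can be computed with the usual rules. Taking the exponent equal to $2$, squaring both sides, and writing the density as $p(x)$ as in the question gives $||f||_2^2 = \int |f(x)|^2 p(x) dx$.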
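The identity can also be checked numerically. The sketch below assumes, purely for illustration, that the density is the standard normal one and that $f(x) = x$, so the squared norm $\int |f(x)|^2 g(x) dx$ should be close to $1$. It compares a midpoint-rule approximation of the integral against the density with a Monte Carlo average over samples drawn from the distribution. All function names are hypothetical.

```python
import math
import random


def gaussian_density(x):
    """Density of the standard normal distribution (an assumption for this example)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def sq_norm_quadrature(f, density, a=-10.0, b=10.0, n=200_000):
    """Midpoint-rule approximation of the integral of |f(x)|^2 * density(x) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += abs(f(x)) ** 2 * density(x)
    return total * h


def sq_norm_monte_carlo(f, n_samples=200_000, seed=0):
    """Average of |f(x)|^2 over samples x drawn from the standard normal distribution."""
    rng = random.Random(seed)
    return sum(abs(f(rng.gauss(0.0, 1.0))) ** 2 for _ in range(n_samples)) / n_samples


def identity(x):
    return x


if __name__ == "__main__":
    print(sq_norm_quadrature(identity, gaussian_density))
    print(sq_norm_monte_carlo(identity))
```

The Monte Carlo version corresponds to reading the squared norm as the expectation of $|f(x)|^2$ under the distribution, which is how it is often estimated in machine learning when the density is known only through samples.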