On page 5 of this paper, the authors write equation (7):
$$\partial_t \rho_t = 2\xi(t)\nabla_\theta\cdot(\rho_t\nabla_\theta\Psi(\theta;\rho_t))$$
where:
- $\rho_t$ for $t\in\mathbb R_{\geq 0}$ is a probability distribution on the "parameter space" $\mathbb R^d$.
- $\xi$ is some real-valued function.
- $\theta$ is an element of the parameter space $\mathbb R^d$.
- $\nabla\cdot$ refers to the divergence operator.
- $\Psi$ is a function accepting an element of $\mathbb R^d$ and a probability distribution as inputs and returning a real number.
First of all, I need help just parsing this equation. I would also appreciate references that give a foundation in the theory needed to understand what an equation like this even means. For example, I don't understand the expression
$$\rho_t\nabla_\theta\Psi(\theta;\rho_t)$$
I get that $\nabla_\theta\Psi(\theta;\rho_t)$ represents a vector field, since for any $\theta$ it evaluates to a vector in $\mathbb R^d$, but I don't get what the $\rho_t$ is doing there. How do you multiply a vector by a probability measure? Since we then take the divergence, I suppose the whole expression must be a vector field, but I just don't know how to read it.
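For what it's worth, here is my tentative reading. It rests on an assumption of mine that the paper does not state: that $\rho_t$ has a smooth density with respect to Lebesgue measure, which I will also write as $\rho_t(\theta)$. Under that assumption the product would simply be a scalar function times a vector field,
$$(\rho_t\nabla_\theta\Psi)(\theta) = \rho_t(\theta)\,\nabla_\theta\Psi(\theta;\rho_t)\in\mathbb R^d,$$
and the divergence would expand by the product rule (with $\Delta_\theta$ the Laplacian in $\theta$):
$$\nabla_\theta\cdot(\rho_t\nabla_\theta\Psi) = \nabla_\theta\rho_t\cdot\nabla_\theta\Psi + \rho_t\,\Delta_\theta\Psi.$$
If $\rho_t$ is only a measure with no density, my guess is that equation (7) is meant in the weak (distributional) sense: for every smooth, compactly supported test function $\varphi$ (a symbol I am introducing here), integrating by parts would give
$$\frac{d}{dt}\int\varphi\,d\rho_t = -2\xi(t)\int\nabla_\theta\varphi(\theta)\cdot\nabla_\theta\Psi(\theta;\rho_t)\,d\rho_t(\theta).$$
I am not sure whether this is the interpretation the authors intend.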