I'm working on exercise 1.4 in Bishop's Pattern Recognition & Machine Learning book.
This exercise is about probability densities, and I have two questions about it.
First, I don't understand equation 1.27. He writes: "Under a nonlinear change of variable, a probability density transforms differently from a simple function, due to the Jacobian factor."
I have never heard of the Jacobian factor. What is it?
"For instance, if we consider a change of variables $x = g(y)$, then a function $f(x)$ becomes $\tilde f(g(y))$. Now consider a probability density $p_x(x)$ that corresponds to a density $p_y(y)$ with respect to the new variable $y$, where the suffices denote the fact that $p_x(x)$ and $p_y(y)$ are different densities. Observations falling in the range $(x, x + \delta x)$ will, for small values of $\delta x$, be transformed into the range $(y, y + \delta y)$ where $p_x(x)\delta x \simeq p_y(y)\delta y$, [...]"
What does the relation $\simeq$ mean in this context?
"[...] and hence $$ \begin{align} p_y(y) &= p_x(x) \left| \frac{\text{d}x}{\text{d}y}\right|\\ &= p_x(g(y))\left|g'(y)\right|. \end{align} $$"
This is equation 1.27. I don't understand where this equation comes from. Why is there this absolute value?
"One consequence of this property is that the concept of the maximum of a probability density is dependent on the choice of variable."
And at this point the book refers to exercise 1.4:
"Consider a probability density $p_x(x)$ defined over a continuous variable $x$, and suppose that we make a nonlinear change of variable using $x = g(y)$, so that the density transforms according to (1.27). By differentiating (1.27), show that the location $\hat y$ of the maximum of the density in $y$ is not in general related to the location $\hat x$ of the maximum of the density over $x$ by the simple functional relation $\hat x = g(\hat y)$ as a consequence of the Jacobian factor. This shows that the maximum of a probability density (in contrast to a simple function) is dependent on the choice of variable. Verify that, in the case of a linear transformation, the location of the maximum transforms in the same way as the variable itself."
I don't understand what this exercise is asking me to do... :/
It would be great if someone could help me.
Equation 1.27 is an instance of the change-of-variables formula for probability densities. I will try to explain it briefly, using the same variables as the book: $y \sim p_y$ and $x \sim p_x$ are two random variables related by a function $g$, i.e.
$$ x = g(y) $$
The question is: given the density $p_x$ of $x$, how do we obtain the corresponding density $p_y$ of the transformed variable $y$?
We know that probability densities integrate to 1, and that probability mass must be conserved under the change of variable: the mass of a small interval $(y, y + \delta y)$ equals the mass of its image $(x, x + \delta x)$. That is exactly what $p_x(x)\delta x \simeq p_y(y)\delta y$ expresses; $\simeq$ here means "approximately equal", with the approximation becoming exact in the limit $\delta x \to 0$. Hence $$ \begin{aligned} \int p_y(y)\, dy &= \int p_x(x)\, dx = 1\\ p_y(y) &= p_x(x) \left|\frac{dx}{dy}\right|\\ &= p_x(g(y)) \left|\frac{dg(y)}{dy}\right|\\ &= p_x(g(y))\, |g^{\prime}(y)| \end{aligned} $$
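As a quick numerical sanity check (my own example, not from the book): take $x \sim \text{Exp}(1)$, i.e. $p_x(x) = e^{-x}$ for $x > 0$, and the change of variable $x = g(y) = y^2$ with $y > 0$, so $g'(y) = 2y$. The formula then gives $p_y(y) = 2y\,e^{-y^2}$, and we can verify that this transformed density still integrates to 1:

```python
import numpy as np

# Assumed example (not Bishop's): x ~ Exp(1) and x = g(y) = y^2 for y > 0.
def p_x(x):
    return np.exp(-x)

def g(y):
    return y ** 2

def g_prime(y):
    return 2.0 * y

# Change-of-variables formula (Bishop eq. 1.27): p_y(y) = p_x(g(y)) * |g'(y)|
def p_y(y):
    return p_x(g(y)) * np.abs(g_prime(y))

# Integrate p_y numerically over (0, 10]; the tail beyond 10 is negligible.
y = np.linspace(0.0, 10.0, 200001)
mass = np.trapz(p_y(y), y)
print(mass)  # ≈ 1.0: the transformed density is still normalized
```

Note that without the Jacobian factor, $p_x(g(y)) = e^{-y^2}$ alone would integrate to $\sqrt{\pi}/2 \ne 1$, so it is not a valid density in $y$.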
where the term $|g^{\prime}(y)|$ is the (one-dimensional) Jacobian factor of $g$ with respect to $y$. Here $|\cdot|$ really is the absolute value: it is needed because a density can never be negative, while $g^{\prime}(y)$ is negative wherever $g$ is decreasing; either way, an interval of length $\delta y$ maps to an interval of length $|g^{\prime}(y)|\,\delta y$. In higher dimensions this factor generalizes to the absolute value of the determinant of the Jacobian matrix of $g$. More intuitively, it measures the infinitesimal change in volume (in one dimension, length) that the function $g$ causes, which is exactly the factor by which the density must be rescaled so that the total probability remains 1.
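To connect this back to exercise 1.4, here is a sketch of the differentiation the exercise asks for (assuming for simplicity that $g^{\prime}(y) > 0$, so the absolute value can be dropped). Applying the product and chain rules to (1.27):

$$ \begin{aligned} p_y(y) &= p_x(g(y))\, g^{\prime}(y)\\ p_y^{\prime}(y) &= p_x^{\prime}(g(y))\, g^{\prime}(y)^2 + p_x(g(y))\, g^{\prime\prime}(y) \end{aligned} $$

At the maximum $\hat y$ of $p_y$ we have $p_y^{\prime}(\hat y) = 0$, so

$$ p_x^{\prime}(g(\hat y))\, g^{\prime}(\hat y)^2 = -\,p_x(g(\hat y))\, g^{\prime\prime}(\hat y). $$

For a nonlinear $g$, $g^{\prime\prime}(\hat y) \ne 0$ in general, so $p_x^{\prime}(g(\hat y)) \ne 0$: the point $g(\hat y)$ is not a stationary point of $p_x$, and hence $\hat x \ne g(\hat y)$. If $g$ is linear, however, $g^{\prime\prime} \equiv 0$, which forces $p_x^{\prime}(g(\hat y)) = 0$, i.e. $\hat x = g(\hat y)$: the location of the maximum transforms in the same way as the variable itself, which is what the exercise asks you to verify.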