First, let me quote a paragraph from Bishop's "Pattern Recognition and Machine Learning":
Under a nonlinear change of variable, a probability density transforms differently from a simple function, due to the Jacobian factor. For instance, if we consider a change of variables x = g(y), then a function f(x) becomes $\tilde{f}(y) = f(g(y))$. Now consider a probability density $p_x(x)$ that corresponds to a density $p_y(y)$ with respect to the new variable y, where the suffices denote the fact that $p_x(x)$ and $p_y(y)$ are different densities. Observations falling in the range (x, x + δx) will, for small values of δx, be transformed into the range (y, y + δy) where $p_x(x)\,\delta x \simeq p_y(y)\,\delta y$, and hence $p_y(y) = p_x(g(y)) |g'(y)|$.
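To make sure I am reading the formula correctly, here is a small numerical sketch I put together (my own example, not from the book): I take $p_x$ to be a standard normal and $x = g(y) = y^3$, a monotone bijection, so the formula predicts $p_y(y) = p_x(y^3)\cdot 3y^2$. A histogram of transformed samples should then match that expression.

```python
import numpy as np

# Concrete check of p_y(y) = p_x(g(y)) |g'(y)|
# with p_x = standard normal and x = g(y) = y**3 (monotone, one-to-one).
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = np.cbrt(x)                      # y = g^{-1}(x), so these are samples of y

def p_x(t):
    # standard normal density
    return np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)

def p_y(t):
    # change-of-variables formula: p_x(g(y)) * |g'(y)|, with g'(y) = 3 y^2
    return p_x(t**3) * 3 * t**2

# Compare a histogram of the transformed samples with the formula.
hist, edges = np.histogram(y, bins=120, range=(-3, 3), density=True)
centers = (edges[:-1] + edges[1:]) / 2
print(float(np.max(np.abs(hist - p_y(centers)))))  # small (statistical noise)
```

For this one-to-one $g$ the histogram and the formula agree closely, which is why my confusion below is specifically about the non-injective case.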
I have two questions here:
I can clearly understand the equation $p_x(x)\,\delta x \simeq p_y(y)\,\delta y$ when $x = g(y)$ is a one-to-one function. Is there any explanation for why it would also hold when $g$ is many-to-one? For example, consider $x = g(y) = \sin(y/10000000)$, so that $y \in (0, 0+\delta y)$ maps to $x \in (0, 0+\delta x)$ — but so do many other intervals of $y$. Isn't $p_x(x)\,\delta x > p_y(y)\,\delta y$ here?
And can someone explain where the $|g'(y)|$ in the last equation comes from?
PS: The second question has already been answered here, but I still don't understand where the negative sign comes from via the chain rule in the monotonically decreasing case (in that answer).
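For context, this is the CDF argument from that answer as I currently understand it (assuming $g$ is monotonically decreasing, so $Y \le y$ is equivalent to $X \ge g(y)$):

$$F_y(y) = P(Y \le y) = P(X \ge g(y)) = 1 - F_x(g(y)),$$
$$p_y(y) = \frac{d}{dy} F_y(y) = -\,p_x(g(y))\,g'(y) = p_x(g(y))\,\lvert g'(y)\rvert,$$

where the last equality is supposed to use $g'(y) < 0$. It is the step where differentiating $1 - F_x(g(y))$ produces the $-g'(y)$ factor that I would like spelled out.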