Bishop - Pattern Recognition & Machine Learning, Exercise 1.4


I'm working on exercise 1.4 in Bishop's Pattern Recognition & Machine Learning book.

This exercise is about probability densities, and I have two questions about it.

First, I don't understand equation (1.27). He writes: "Under a nonlinear change of variable, a probability density transforms differently from a simple function, due to the Jacobian factor."

I have never heard of the Jacobian factor before. What is that factor?

"For instance, if we consider a change of variables $x = g(y)$, then a function $f(x)$ becomes $\tilde f(g(y))$. Now consider a probability density $p_x(x)$ that corresponds to a density $p_y(y)$ with respect to the new variable $y$, where the suffices denote the fact that $p_x(x)$ and $p_y(y)$ are different densities. Observations falling in the range $(x, x + \delta x)$ will, for small values of $\delta x$, be transformed into the range $(y, y + \delta y)$ where $p_x(x)\delta x \simeq p_y(y)\delta y$, [...]"

What does the relation $\simeq$ mean in this context?

"[...] and hence $$ \begin{align} p_y(y) &= p_x(x) \left| \frac{\text{d}x}{\text{d}y}\right|\\ &= p_x(g(y))\left|g'(y)\right|. \end{align} $$"

This is equation (1.27). I don't understand where it comes from. And why is there an absolute value?

"One consequence of this property is that the concept of the maximum of a probability density is dependent on the choice of variable."

And at this point the book refers to exercise 1.4:

"Consider a probability density $p_x(x)$ defined over a continuous variable $x$, and suppose that we make a nonlinear change of variable using $x = g(y)$, so that the density transforms according to (1.27). By differentiating (1.27), show that the location $\hat y$ of the maximum of the density in $y$ is not in general related to the location $\hat x$ of the maximum of the density over $x$ by the simple functional relation $\hat x = g(\hat y)$ as a consequence of the Jacobian factor. This shows that the maximum of a probability density (in contrast to a simple function) is dependent on the choice of variable. Verify that, in the case of a linear transformation, the location of the maximum transforms in the same way as the variable itself."

I don't understand what this exercise asks me to do... :/

It would be great if someone could help me...


There are 2 best solutions below


Equation (1.27) is an instance of the change-of-variables theorem. I will try to explain it briefly here using the same variables as the book. Let $y \sim p_y$ and $x \sim p_x$ be two random variables related by a function $g$, i.e.

$$ \begin{equation} x = g(y) \end{equation} $$

The question here is: how are the two densities $p_y$ and $p_x$ related under this transformation?

We know that probability densities integrate to 1: $$ \int p_y(y)\, dy = \int p_x(x)\, dx = 1. $$ Substituting $x = g(y)$ in the right-hand integral (so that $dx$ becomes $|g'(y)|\,dy$) and matching the integrands over corresponding infinitesimal intervals gives $$ \begin{aligned} p_y(y) &= p_x(x) \left|\frac{dx}{dy}\right|\\ &= p_x(g(y)) \left|\frac{dg(y)}{dy}\right|\\ &= p_x(g(y))\, |g^{\prime}(y)| \end{aligned} $$
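As a concrete sanity check (my own example, not from the book): take $y$ standard normal and $x = g(y) = e^y$, so that $x$ is standard log-normal. The formula $p_y(y) = p_x(g(y))\,|g'(y)|$ can then be verified numerically:

```python
import math

def normal_pdf(y):
    """Standard normal density p_y(y)."""
    return math.exp(-y * y / 2) / math.sqrt(2 * math.pi)

def lognormal_pdf(x):
    """Standard log-normal density p_x(x), defined for x > 0."""
    return math.exp(-math.log(x) ** 2 / 2) / (x * math.sqrt(2 * math.pi))

# Check p_y(y) = p_x(g(y)) * |g'(y)| with g(y) = e^y, so g'(y) = e^y.
for y in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    lhs = normal_pdf(y)
    rhs = lognormal_pdf(math.exp(y)) * math.exp(y)
    assert abs(lhs - rhs) < 1e-12, (y, lhs, rhs)
print("change-of-variables formula verified")
```

Both sides agree to machine precision at every test point.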

where the term $|g^{\prime}(y)|$ is the Jacobian factor. In this one-dimensional case it is simply the absolute value of the derivative of $g$; in higher dimensions it generalizes to the absolute value of the determinant of the Jacobian matrix of $g$. More intuitively, this factor tells us about the infinitesimal change in volume that your function $g$ causes, which is exactly what is needed to keep the total probability mass equal to 1. The video will give you a better idea about this.
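Bishop's remark that the maximum of a density depends on the choice of variable can also be seen numerically. A quick sketch (my own example, not from the book): with $y \sim \mathcal{N}(0,1)$ and $x = g(y) = e^y$, the maximum of $p_y$ is at $\hat y = 0$, yet the maximum of $p_x$ (the standard log-normal density) is not at $g(0) = 1$:

```python
import math

def lognormal_pdf(x):
    """Standard log-normal density: the density of x = e^y, y ~ N(0, 1)."""
    return math.exp(-math.log(x) ** 2 / 2) / (x * math.sqrt(2 * math.pi))

# p_y is maximal at y_hat = 0, so g(y_hat) = e^0 = 1.
# Grid-search the maximum of the transformed density p_x instead.
xs = [i / 10000 for i in range(1, 100000)]
x_hat = max(xs, key=lognormal_pdf)

print(x_hat)  # close to e^{-1} ~ 0.3679, not 1
assert abs(x_hat - math.exp(-1)) < 1e-3
```

The mode lands at $e^{-1}$ rather than at $g(\hat y) = 1$, precisely because of the Jacobian factor.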


The reason for the term involving $|g'(y)|$ originates from the change of variables from $x$ to $y$. When we change from $p_X(x)$ to $p_Y(y)$, we have to ensure that the integral over the probability density is still equal to one.

Supposing the change of variables is monotonically increasing, let $F_X(x)$ be the cumulative distribution function of $X$; then the cumulative distribution function of $Y$ is $$ F_Y(y)=P(Y\leq y)=P(X\leq g(y))=F_X(g(y)) $$ and the probability density function of $Y$ is $$ p_Y(y)={d\over dy}F_Y(y)=p_X(g(y)){d g(y)\over dy}=p_X(g(y))\left|g'(y)\right|. $$ Since $g'(y)>0$ here, the absolute value is unnecessary. In the case that the transformation is monotonically decreasing, we instead have $F_Y(y)=P(X\geq g(y))=1-F_X(g(y))$, and differentiating yields a $-g'(y)$ term, which again equals $|g'(y)|$, so the absolute value sign covers both cases.
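To see the monotonically decreasing case concretely, here is a small numerical check (my own example): with $y \sim \mathcal{N}(0,1)$ and the decreasing map $x = g(y) = e^{-y}$, the variable $x$ is again standard log-normal (because $-y$ is also standard normal), and $p_X(g(y))\,|g'(y)|$ with $|g'(y)| = -g'(y) = e^{-y}$ recovers the normal density:

```python
import math

def normal_pdf(y):
    """Standard normal density p_Y(y)."""
    return math.exp(-y * y / 2) / math.sqrt(2 * math.pi)

def lognormal_pdf(x):
    # x = e^{-y} with y standard normal is again standard log-normal,
    # because -y is also standard normal.
    return math.exp(-math.log(x) ** 2 / 2) / (x * math.sqrt(2 * math.pi))

# g(y) = e^{-y} is monotonically decreasing: g'(y) = -e^{-y} < 0,
# so |g'(y)| = -g'(y) = e^{-y}.
for y in [-1.5, 0.0, 0.7, 2.0]:
    g = math.exp(-y)
    abs_gprime = math.exp(-y)
    assert abs(normal_pdf(y) - lognormal_pdf(g) * abs_gprime) < 1e-12
print("decreasing case verified")
```

Without the absolute value, the right-hand side would come out negative, which is impossible for a density.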

There is more information in statistics texts such as this.

"I don't understand what this exercise asks me to do..."

What the exercise is asking you to do is to differentiate equation (1.27) with respect to $y$ and set the derivative to zero to locate the maximum $\hat y$, and then show that this is not, in general, at the image of the maximum of $p_x$, i.e. that $g(\hat y) \neq \hat x$. It's a straightforward application of the product rule (together with the chain rule) for differentiation.
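Sketching the computation (assuming for simplicity that $g'(y) > 0$, so $|g'(y)| = g'(y)$), differentiating (1.27) gives $$ \frac{d}{dy}\,p_y(y) = \frac{d}{dy}\Big[p_x(g(y))\,g'(y)\Big] = p_x'(g(y))\,g'(y)^2 + p_x(g(y))\,g''(y). $$ Setting this to zero at the maximum $\hat y$ yields $$ p_x'(g(\hat y))\,g'(\hat y)^2 = -\,p_x(g(\hat y))\,g''(\hat y), $$ so unless $g''(\hat y) = 0$, in general $p_x'(g(\hat y)) \neq 0$, meaning $g(\hat y)$ is not the location $\hat x$ where $p_x$ is maximal. For a linear transformation $g(y) = ay + b$ we have $g'' = 0, and the condition reduces to $p_x'(a\hat y + b) = 0$, i.e. $\hat x = g(\hat y)$, which is the verification the exercise asks for.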