Linear/non-linear change of variables: $\tilde{f} \ ' (\tilde{y}) = f'(g(\tilde{y})) g'(\tilde{y}) = 0$ and assuming $g'(\tilde{y}) \not= 0$


I am currently studying the textbook Pattern Recognition and Machine Learning by Christopher Bishop.

The problem statement for exercise 1.4 of the textbook is as follows:

Consider a probability density $p_x(x)$ defined over a continuous variable $x$, and suppose that we make a nonlinear change of variable using $x = g(y)$, so that the density transforms according to (1.27). By differentiating (1.27), show that the location $\tilde{y}$ of the maximum of the density in $y$ is not in general related to the location $\tilde{x}$ of the maximum of the density over $x$ by the simple functional relation $\tilde{x} = g(\tilde{y})$ as a consequence of the Jacobian factor. This shows that the maximum of a probability density (in contrast to a simple function) is dependent on the choice of variable. Verify that, in the case of a linear transformation, the location of the maximum transforms in the same way as the variable itself.

Equation 1.27 referenced above is

$$\begin{align} p_y(y) &= p_x(x) \left| \dfrac{dx}{dy} \right| \\ &= p_x(g(y)) |g'(y)| \tag{1.27} \end{align}$$
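As a rough numerical sketch of what the exercise asks for (everything here is an assumption chosen for illustration, not something from the textbook: the Gamma(2, 1) density for $p_x$, the maps $g(y) = e^y$ and $g(y) = 3y + 1$, and the helper name `mode_of_p_y`), one can evaluate (1.27) on a grid and compare $g(\tilde{y})$ with $\tilde{x}$:

```python
import numpy as np


def p_x(x):
    # Illustrative density: Gamma(2, 1) on x > 0, zero elsewhere. Its mode is at x = 1.
    return np.where(x > 0, x * np.exp(-x), 0.0)


def mode_of_p_y(g, g_prime, y_grid):
    # Hypothetical helper: evaluate (1.27), p_y(y) = p_x(g(y)) |g'(y)|,
    # on a grid and return the grid point where it is largest.
    p_y = p_x(g(y_grid)) * np.abs(g_prime(y_grid))
    return y_grid[np.argmax(p_y)]


x_grid = np.linspace(1e-6, 20.0, 200_001)
y_grid = np.linspace(-10.0, 5.0, 200_001)

# Mode of the density over x, found by grid search.
x_mode = x_grid[np.argmax(p_x(x_grid))]

# Nonlinear change of variable x = exp(y).
y_mode_nonlinear = mode_of_p_y(np.exp, np.exp, y_grid)

# Linear change of variable x = 3y + 1, with constant derivative 3.
def g_linear(y):
    return 3.0 * y + 1.0

y_mode_linear = mode_of_p_y(g_linear, lambda y: np.full_like(y, 3.0), y_grid)

print("x_mode:", x_mode)
print("nonlinear g(y_mode):", np.exp(y_mode_nonlinear))
print("linear g(y_mode):", g_linear(y_mode_linear))
```

For this particular choice, $p_y(y) = e^{2y - e^y}$ under $x = e^y$, which peaks at $\tilde{y} = \ln 2$, so $g(\tilde{y}) = 2$ while $\tilde{x} = 1$. Under the linear map the Jacobian factor is constant and $g(\tilde{y})$ agrees with $\tilde{x}$ up to the grid resolution.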

The solution from the solutions manual begins as follows:

We are often interested in finding the most probable value for some quantity. In the case of probability distributions over discrete variables this poses little problem. However, for continuous variables there is a subtlety arising from the nature of probability densities and the way they transform under non-linear changes of variable.

Consider first the way a function $f(x)$ behaves when we change to a new variable $y$ where the two variables are related by $x = g(y)$. This defines a new function of $y$ given by

$$\tilde{f}(y) = f(g(y)) \tag{2}$$

Suppose $f(x)$ has a mode (i.e. a maximum) at $\tilde{x}$ so that $f'(\tilde{x}) = 0$. The corresponding mode of $\tilde{f}(y)$ will occur for a value $\tilde{y}$ obtained by differentiating both sides of (2) with respect to $y$

$$\tilde{f} \ ' (\tilde{y}) = f'(g(\tilde{y})) g'(\tilde{y}) = 0 \tag{3}$$

Assuming $g'(\tilde{y}) \not= 0$ at the mode, then $f'(g(\tilde{y})) = 0$. However, we know that $f'(\tilde{x}) = 0$, and so we see that the locations of the mode expressed in terms of each of the variables $x$ and $y$ are related by $\tilde{x} = g(\tilde{y})$, as one would expect. Thus, finding a mode with respect to the variable $x$ is completely equivalent to first transforming to the variable $y$, then finding a mode with respect to $y$, and then transforming back to $x$.
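The claim in this passage can be checked symbolically on a small case. The sketch below uses SymPy; the functions $f(x) = -(x - 2)^2$ and $g(y) = y^3 + y$ are hypothetical choices (not from the solutions manual), picked so that $g'(y) = 3y^2 + 1$ is never zero:

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)

f = -(x - 2) ** 2   # hypothetical f with a single mode at x = 2
g = y**3 + y        # hypothetical g; g'(y) = 3*y**2 + 1 has no real roots

# Mode of f in the original variable x.
x_modes = list(sp.solveset(sp.diff(f, x), x, domain=sp.S.Reals))

# Composite function f_tilde(y) = f(g(y)) and its real critical points.
f_tilde = f.subs(x, g)
y_modes = list(sp.solveset(sp.diff(f_tilde, y), y, domain=sp.S.Reals))

# Map each critical point in y back to x through g.
mapped_back = [g.subs(y, y0) for y0 in y_modes]

print("x modes:", x_modes)
print("y modes:", y_modes)
print("g(y modes):", mapped_back)
```

Because $g'$ never vanishes here, every critical point of $\tilde{f}$ must come from $f'(g(y)) = 0$, and mapping it back through $g$ recovers the mode of $f$.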

...

One of the main points that I am wondering about here is,

Why must we assume that $g'(\tilde{y}) \not= 0$? Why can we not instead assume that $f'(g(\tilde{y})) \not= 0$?

Using the notation from the textbook, let's use an example to illustrate. Let $g(\tilde{y}) = \tilde{y}^2 + 2\tilde{y} - 3$ and $f(x) = 4 - x^2$. Then we have that $f(g(\tilde{y})) = 4 - (\tilde{y}^2 + 2\tilde{y} - 3)^2$. Therefore,

$$\begin{align} &f'(g(\tilde{y})) = \dfrac{df}{d \tilde{y}} = \dfrac{df}{dg} \dfrac{dg}{d\tilde{y}} = -2(\tilde{y}^2 + 2\tilde{y} - 3)(2\tilde{y} + 2) = -4(\tilde{y}^2 + 2\tilde{y} - 3)(\tilde{y} + 1) = 0 \\ &\Rightarrow -4(\tilde{y} + 3)(\tilde{y} - 1)(\tilde{y} + 1) = 0 \\ &\therefore \tilde{y} = -3, 1, -1 \end{align}$$

Substituting these roots into $g'(\tilde{y}) = 2\tilde{y} + 2 = 2(\tilde{y} + 1)$, we can now calculate that $g'(-1) = 0$, $g'(1) = 4$, and $g'(-3) = -4$. This shows that the modes of the equation $f(g(\tilde{y})) = 4 - (\tilde{y}^2 + 2\tilde{y} - 3)^2$ do not necessarily correspond to the modes of the equation $g(\tilde{y}) = \tilde{y}^2 + 2\tilde{y} - 3$, since the roots of $f'(g(\tilde{y})) = -4(\tilde{y} + 3)(\tilde{y} - 1)(\tilde{y} + 1)$ are not necessarily the roots of $g'(\tilde{y}) = 2\tilde{y} + 2 = 2(\tilde{y} + 1)$.
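A SymPy sketch of this example may help keep the pieces apart. The variable names below (`outer`, `inner`, `d_f_tilde`) are mine, and the sketch deliberately separates the full derivative of the composite, $\frac{d}{dy} f(g(y))$, from the single factor $f'(g(y))$, meaning $f'$ evaluated at $g(y)$:

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)

f = 4 - x**2            # f from the example
g = y**2 + 2*y - 3      # g from the example

f_tilde = f.subs(x, g)                       # composite f(g(y))
d_f_tilde = sp.factor(sp.diff(f_tilde, y))   # full derivative, the product in (3)

outer = sp.diff(f, x).subs(x, g)             # the factor f'(g(y)) on its own
inner = sp.diff(g, y)                        # the factor g'(y) on its own

# Real roots of the full derivative.
critical_points = sorted(sp.solveset(d_f_tilde, y, domain=sp.S.Reals))

print("d/dy f(g(y)) =", d_f_tilde)
for y0 in critical_points:
    print(
        "y =", y0,
        "| f'(g(y)) =", outer.subs(y, y0),
        "| g'(y) =", inner.subs(y, y0),
        "| g(y) =", g.subs(y, y0),
    )
```

At $y = 1$ and $y = -3$ the factor $f'(g(y))$ vanishes and $g(y) = 0$, which is the mode of $f$. At $y = -1$ only $g'(y)$ vanishes, while $f'(g(-1)) = 8$, so that critical point says nothing about a mode of $f$.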

So, given all of this, I think I might have found the answer to my question as follows:

If we assume that $f'(g(\tilde{y})) \not= 0$, then we assume that $\tilde{y}$ cannot be any of the roots of $f'(g(\tilde{y}))$, because otherwise we would, obviously, have that $f'(g(\tilde{y})) = 0$. Using the example to illustrate, we assume that $\tilde{y} \not= -1, 1, -3$. But $-1$ is a (the?) root of $g'(\tilde{y})$, and so, assuming it is the only root of $g'(\tilde{y})$, this means that we cannot have that $g'(\tilde{y}) = 0$, which means that $\tilde{f} \ ' (\tilde{y}) = f'(g(\tilde{y})) g'(\tilde{y}) \not= 0$, contrary to what we require. The only question is, is it the case that the roots of $g'(\tilde{y})$ are always also roots of $f'(g(\tilde{y}))$? Because, if this is true, then it is also true that $\tilde{f} \ ' (\tilde{y}) = f'(g(\tilde{y})) g'(\tilde{y}) = 0$ implies (that is, forces us to assume) that $g'(\tilde{y}) \not= 0$ and $f'(g(\tilde{y})) = 0$. If, however, this is not true, then my idea falls apart.

If my idea/reasoning is incorrect, then we are led back to my original question:

Why must we assume that $g'(\tilde{y}) \not= 0$? Why can we not instead assume that $f'(g(\tilde{y})) \not= 0$?

I would greatly appreciate it if people would please take the time to review my reasoning.

Best answer

Hint: I think the author just wants to advise the reader to be careful with non-linear substitutions when looking for the maximum of a function. This is because we have to deal with the chain rule when calculating the derivative of a composite function.

Looking at (3) again, we have to consider \begin{align*} \tilde{f}^{\prime}(\tilde{y}) = f^{\prime}(g(\tilde{y})) g^{\prime}(\tilde{y}) = 0 \tag{3} \end{align*}

The author continues rather pragmatically. When solving (3), we regard a substitution as convenient when (3) has zeros at which $g^{\prime}(\tilde{y}) \ne 0$.

In this case we can easily continue and conclude that from $f^\prime(g(\tilde{y})) = 0$ we have a solution $\tilde{x}=g(\tilde{y})$.

The phrase "Assuming $g^{\prime}(\tilde{y}) \ne 0$ ..." is just a way of restricting attention to such a convenient substitution.
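As a worked check on the example from the question, with $f(x) = 4 - x^2$ and $g(y) = y^2 + 2y - 3$ (a sketch; the second-derivative test is an addition here and is not part of the solutions manual), the two ways in which (3) can vanish can be told apart by differentiating once more:

$$\begin{align} \tilde{f}^{\prime}(y) &= f^{\prime}(g(y)) \, g^{\prime}(y) = -2(y^2 + 2y - 3)(2y + 2) \\ \tilde{f}^{\prime\prime}(y) &= f^{\prime\prime}(g(y)) \, g^{\prime}(y)^2 + f^{\prime}(g(y)) \, g^{\prime\prime}(y) = -2(2y + 2)^2 - 4(y^2 + 2y - 3) \end{align}$$

At $y = 1$ and $y = -3$ we have $g(y) = 0$, so $f^{\prime}(g(y)) = 0$ while $g^{\prime}(y) = \pm 4 \ne 0$, and $\tilde{f}^{\prime\prime}(y) = -32 < 0$. These are maxima of $\tilde{f}$, and both map back to the mode $\tilde{x} = g(\tilde{y}) = 0$ of $f$. At $y = -1$ we have $g^{\prime}(-1) = 0$ but $f^{\prime}(g(-1)) = f^{\prime}(-4) = 8 \ne 0$, and $\tilde{f}^{\prime\prime}(-1) = 16 > 0$. This is a local minimum of $\tilde{f}$ created by the substitution itself, and $g(-1) = -4$ is not a stationary point of $f$. Assuming $f^{\prime}(g(\tilde{y})) \ne 0$ instead would select exactly this kind of point, which tells us nothing about the mode of $f$.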