approximating probability density in a shell for a high dimensional gaussian distribution: Bishops pattern recognition

370 Views Asked by At

This is my first post here, I'm hoping this isn't one of those things that'll be obvious in hindsight. Here goes:

In Bishop's pattern recognition, problem 1.20, there is the following problem:

Consider a Guassian distribution in D dimensions given by: $$ p(\boldsymbol{x}) = \frac{1}{(2\pi\sigma^2)^{D/2}}exp(-\frac{\lvert \lvert \boldsymbol{x} \rvert \rvert ^2}{2\sigma^2}) $$

You can get the probability density with respect to the radius by switching to polar coordinates and integrating out the directional variables to get the following:

$$ p(r) = \frac{S_Dr^{D-1}}{(2\pi\sigma^2)^{D/2}}exp(-\frac{r^2}{2\sigma^2}) $$

where S_D is the surface area of the sphere of dimension D.

Then the problem has you show that for large D, p(r) has a single stationary point located at:

$$ \hat r \simeq \sqrt{D}\sigma $$

I'm with it so far, here's where I'm having trouble: by considering: $$ p(\hat r + \epsilon) $$

where $$ \epsilon \ll \hat r $$

show that for large D: $$ p(\hat r + \epsilon) = p(\hat r)exp(-\frac{\epsilon^2}{\sigma^2})$$

embarrassingly enough, this problem is even in the solution manual and I'm still struggling. Here's what's said:

We note that:

$$p(\hat r + \epsilon) \propto (\hat r + \epsilon)^{D - 1}exp[-\frac{(\hat r + \epsilon)^2}{2\sigma^2}]$$ $$ = exp[-\frac{(\hat r + \epsilon)^2}{2\sigma^2} + (D - 1)ln(\hat r + \epsilon)]$$

I'm clear so far, here's where I'm getting stuck:

we now expand p(r) around r hat. Since this is a stationary point, we must keep terms up to the second order. Making use of the expansion:

$$ ln(1 + x) = x - x^2/2 + O(x^3)$$

together with $$ D \gg 1$$

we obtain the desired equation: $$ p(\hat r + \epsilon) \approx p(\hat r)exp(-\frac{\epsilon^2}{\sigma^2})$$

So. I know that I'm supposed to do a Taylor expansion around r hat, but after trying to do the expansion directly, I got lost in the algebra... I either made a mistake or there's some nuance I'm missing. Is it really that simple and I'm just making a mechanical mistake?

More generally, I covered Taylor expansion in my calc 3 class, but I don't feel like I have a good grounding for it. I've seen elsewhere for example the Lagrangian form of the remainder from the Taylor expansion, but it was just presented as hand waving. I'd like to learn more about the theory so I know what I'm working with, and so that I don't get stuck on problems like this in the future. I'm going to start real/complex analysis soon, is that something that I'll be getting a better sense of in those fields?

edit: the 'hint' was a little confusing too, since I don't know the rules around taking the Taylor expansion for a function composed of other functions, like f(g(r)). How can I use that hint about the expansion for the ln function, since it's in an exponential?

Sorry again if this is an obvious question, but I really appreciate any help that anyone can offer.

1

There are 1 best solutions below

1
On BEST ANSWER

Let's assume $\sigma=1$ and define: $$ p(r) = cr^{D-1}\exp(-r^2/2) = c\exp(-r^2/2 + (D-1)\log(r))$$ where $c$ is some positive constant. Then \begin{align} p(r+\epsilon) &= c\exp(-(r+\epsilon)^2/2 + (D-1)\log(r+\epsilon))\\ &=c\exp(\underbrace{-r^2/2} -\epsilon^2/2 - r\epsilon + \underbrace{(D-1)\log(r)} + (D-1)\log(1+\epsilon/r))\\ &=p(r)\exp(-\epsilon^2/2 - r\epsilon + (D-1)\log(1+\epsilon/r)) \end{align} where the two terms identified in the underbraces are gathered to form the $p(r)$ function again. Now using $\log(1+\epsilon/r) \approx \epsilon/r - \epsilon^2/(2r^2)$ gives \begin{align} p(r+\epsilon) \approx p(r)\exp(-\epsilon^2/2 - r\epsilon + (D-1)\epsilon/r - (D-1)\epsilon^2/(2r^2)) \end{align} Using $r=\sqrt{D}$ gives \begin{align} p(r+\epsilon) &\approx p(r) \exp(-\epsilon^2 - \epsilon/\sqrt{D} + \epsilon^2/(2D)) \end{align} and we would need to know more specifics of the "large $D$" assumption to determine the dominant term in the exponent, either $-\epsilon^2$ or $-\epsilon/\sqrt{D}$. Assuming $\sqrt{D}$ is much larger than $1/\epsilon$ then the above can be approximated by: $$ p(r+\epsilon) \approx p(r) \exp(-\epsilon^2)$$