Proof of $\ln(p_k(x))=\delta_k(x)$


In "An Introduction to Statistical Learning with Applications in R" by James, Witten, Hastie, and Tibshirani, on pages 139-140, in the section on linear discriminant analysis for $p=1$, it is assumed that $f_k(x)$ is Gaussian.

We are given

$p_k(x)=\frac{\pi_k\cdot\frac{1}{\sqrt{2\pi}\sigma_k}\exp\big(-\frac{1}{2\sigma_k^2}(x-\mu_k)^2\big)}{\sum_{l=1}^K\pi_l\cdot\frac{1}{\sqrt{2\pi}\sigma_l}\exp\big(-\frac{1}{2\sigma_l^2}(x-\mu_l)^2\big)} \qquad(A)$

and they say that "it is not hard to show that...taking the log of this and rearranging the terms" brings one to

$\delta_k(x)=x\cdot \frac{\mu_k}{\sigma^2}-\frac{\mu_k^2}{2\sigma^2}+\ln(\pi_k) \qquad (B)$

Just about the only part I understand is that the variances are assumed to be equal; beyond that, I cannot see how taking the log gets from (A) to (B).

Might someone be able to demonstrate this or point to a resource?

Best answer:

The authors are not claiming that expression (B) equals the logarithm of expression (A); it does not. What they are saying is that, under the given assumptions, the class $k$ that maximizes $\delta_k(x)$ is also the class that maximizes $p_k(x)$ (and vice versa). To see this, take the $\log$ as suggested: $$\begin{align}\log(p_k(x) ) &=\log\left(\frac{\pi_k\cdot\frac{1}{\sqrt{2\pi}\sigma_k}\exp\big(-\frac{1}{2\sigma_k^2}(x-\mu_k)^2\big)}{\sum_{l=1}^K\pi_l\cdot\frac{1}{\sqrt{2\pi}\sigma_l}\exp\big(-\frac{1}{2\sigma_l^2}(x-\mu_l)^2\big)}\right) \\ &= \log(\pi_k) -\frac{1}{2\sigma^2}(x-\mu_k)^2 +\log\left(\frac{1}{\sqrt{2\pi}\sigma}\right) - \log\left(\sum_{l=1}^K\pi_l\cdot\frac{1}{\sqrt{2\pi}\sigma}\exp\big(-\frac{1}{2\sigma^2}(x-\mu_l)^2\big)\right) \\ &= \log(\pi_k) -\frac{x^2}{2\sigma^2} + x \cdot\frac{\mu_k}{\sigma^2}-\frac{\mu_k^2}{2\sigma^2} -\log\left(\sum_{l=1}^K\pi_l\cdot\frac{1}{\sqrt{2\pi}\sigma}\exp\big(-\frac{1}{2\sigma^2}(x-\mu_l)^2\big)\right) + C \\ &=\underbrace{\log(\pi_k) + x \cdot\frac{\mu_k}{\sigma^2}-\frac{\mu_k^2}{2\sigma^2}}_{\delta_k(x)} + F(x) \end{align} $$ (Here the second line uses the equal-variance assumption $\sigma_1=\cdots=\sigma_K=\sigma$, the third line expands the square $(x-\mu_k)^2$ and absorbs the constant $\log\big(\frac{1}{\sqrt{2\pi}\sigma}\big)$ into $C$, and the last line collects every term that does not depend on $k$ into $F(x)$.) So we see where the expression for $\delta_k$ comes from.
Now notice that all $K$ classes are normally distributed with the same variance $\sigma^2$, so the term $F(x)$ above is identical for every $k \in \{1,\ldots,K\}$. It therefore suffices to find the $k$ for which $\delta_k(x)$ is maximal in order to predict the class of $x$.
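As a quick numerical sanity check (not from the book), the sketch below evaluates both $p_k(x)$ from equation (A) and $\delta_k(x)$ from equation (B) for arbitrary means and priors with a shared variance, and confirms that they pick the same class; the specific values of `mu`, `pi`, and `sigma` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

sigma = 1.3                      # shared standard deviation (equal-variance assumption)
mu = np.array([-1.0, 0.5, 2.0])  # class means (arbitrary illustrative values)
pi = np.array([0.2, 0.5, 0.3])   # class priors, summing to 1 (arbitrary)

def p(x):
    # Posterior p_k(x) from Bayes' theorem, equation (A)
    dens = pi / (np.sqrt(2 * np.pi) * sigma) * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
    return dens / dens.sum()

def delta(x):
    # Linear discriminant delta_k(x), equation (B)
    return x * mu / sigma ** 2 - mu ** 2 / (2 * sigma ** 2) + np.log(pi)

# The argmax over k agrees at every test point, even though
# p_k(x) and exp(delta_k(x)) are not equal as functions.
for x in 3 * rng.normal(size=1000):
    assert np.argmax(p(x)) == np.argmax(delta(x))
print("argmax agrees for all test points")
```

This works because $\log p_k(x) = \delta_k(x) + F(x)$ with $F(x)$ independent of $k$, so ranking the classes by $\delta_k(x)$ is the same as ranking them by $p_k(x)$.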