Deriving equation in vector notation


I had some trouble deriving an equation from the book *The Elements of Statistical Learning*, p. 108, equation (4.9). The derivation relies heavily on linear algebra, so I am wondering how the authors arrived at the final expression. Is there a simple way to work it out?

Given the multivariate normal distribution:

$$ f_k(x)=\frac{1}{(2\pi)^{p/2} |\Sigma_k|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu_k)^T\Sigma_k^{-1}(x-\mu_k)\right) $$

according to the book, assuming the classes share a common covariance matrix $\Sigma_k = \Sigma$, you can calculate the log-ratio as follows:

$$ \log\left(\frac{\Pr(G=k\mid X=x)}{\Pr(G=l\mid X=x)}\right) = \log\frac{f_k(x)}{f_l(x)} + \log\frac{\pi_k}{\pi_l} $$

$$ =\log\frac{\pi_k}{\pi_l} - \frac{1}{2}(\mu_k+\mu_l)^T{\Sigma}^{-1}(\mu_k-\mu_l)+x^T\Sigma^{-1}(\mu_k-\mu_l) $$

I tried expanding the terms myself but did not manage to reach the final result. Does anybody have an idea how this can be worked out?


Working through it step by step, we have:

$\frac{f_k(x)}{f_l(x)} = \frac{(2 \pi)^{p/2}|\Sigma|^{1/2}}{(2 \pi)^{p/2}|\Sigma|^{1/2}} \exp\big(-\frac{1}{2}(x-\mu_k)^T\Sigma^{-1}(x-\mu_k)+\frac{1}{2}(x-\mu_l)^T\Sigma^{-1}(x-\mu_l)\big)$

= $\exp\big(-\frac{1}{2}x^T\Sigma^{-1}x + \frac{1}{2}x^T\Sigma^{-1}\mu_k + \frac{1}{2}\mu_k^T\Sigma^{-1}x-\frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k + \frac{1}{2}x^T\Sigma^{-1}x - \frac{1}{2}x^T\Sigma^{-1}\mu_l - \frac{1}{2}\mu_l^T\Sigma^{-1}x+\frac{1}{2}\mu_l^T\Sigma^{-1}\mu_l\big)$

(I have multiplied everything out)

= $\exp\big(\frac{1}{2}x^T\Sigma^{-1}\mu_k + \frac{1}{2}\mu_k^T\Sigma^{-1}x-\frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k - \frac{1}{2}x^T\Sigma^{-1}\mu_l - \frac{1}{2}\mu_l^T\Sigma^{-1}x+\frac{1}{2}\mu_l^T\Sigma^{-1}\mu_l\big)$

(I have cancelled the terms involving $x^T\Sigma^{-1}x$)

= $\exp\big(\frac{1}{2}x^T\Sigma^{-1}\mu_k + \frac{1}{2}x^T\Sigma^{-1}\mu_k-\frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k - \frac{1}{2}x^T\Sigma^{-1}\mu_l - \frac{1}{2}x^T\Sigma^{-1}\mu_l+\frac{1}{2}\mu_l^T\Sigma^{-1}\mu_l\big)$

(I have taken the transpose of $\frac{1}{2}\mu_k^T\Sigma^{-1}x$ to get $\frac{1}{2}x^T\Sigma^{-1}\mu_k$, and similarly for $\frac{1}{2}\mu_l^T\Sigma^{-1}x$; this is valid because each term is a scalar, hence equal to its own transpose, and $\Sigma^{-1}$ is symmetric)
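The transpose step above can be sanity-checked numerically: since $\mu_k^T\Sigma^{-1}x$ is a scalar and $\Sigma^{-1}$ is symmetric, it must equal $x^T\Sigma^{-1}\mu_k$. A minimal NumPy sketch with random data (the dimension $p=4$ is an arbitrary choice for the check):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4  # arbitrary dimension for the check

# Build a symmetric positive-definite covariance matrix.
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)
Sigma_inv = np.linalg.inv(Sigma)

x = rng.standard_normal(p)
mu_k = rng.standard_normal(p)

# mu_k^T Sigma^{-1} x is a scalar, so it equals its own transpose;
# symmetry of Sigma^{-1} then gives x^T Sigma^{-1} mu_k.
lhs = mu_k @ Sigma_inv @ x
rhs = x @ Sigma_inv @ mu_k
print(np.isclose(lhs, rhs))
```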

= $\exp\big(x^T\Sigma^{-1}\mu_k - x^T\Sigma^{-1}\mu_l - \frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k+\frac{1}{2}\mu_l^T\Sigma^{-1}\mu_l\big)$

= $\exp\big(x^T\Sigma^{-1}(\mu_k - \mu_l) - \frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k+\frac{1}{2}\mu_l^T\Sigma^{-1}\mu_l\big)$

= $\exp\big(x^T\Sigma^{-1}(\mu_k - \mu_l) - \frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k - \frac{1}{2}\mu_l^T\Sigma^{-1}\mu_k+\frac{1}{2}\mu_k^T\Sigma^{-1}\mu_l+\frac{1}{2}\mu_l^T\Sigma^{-1}\mu_l\big)$

(I have added $-\frac{1}{2}\mu_l^T\Sigma^{-1}\mu_k+\frac{1}{2}\mu_k^T\Sigma^{-1}\mu_l$, which equals zero: $\Sigma^{-1}$ is symmetric, so $\mu_l^T\Sigma^{-1}\mu_k = \mu_k^T\Sigma^{-1}\mu_l$)

= $\exp\big(x^T\Sigma^{-1}(\mu_k - \mu_l) - \frac{1}{2}(\mu_k^T + \mu_l^T)\Sigma^{-1}\mu_k +\frac{1}{2}(\mu_k^T+\mu_l^T)\Sigma^{-1}\mu_l\big)$

= $\exp\big(x^T\Sigma^{-1}(\mu_k - \mu_l) - \frac{1}{2}(\mu_k^T + \mu_l^T)\Sigma^{-1}(\mu_k-\mu_l)\big)$

= $\exp\big(x^T\Sigma^{-1}(\mu_k - \mu_l) - \frac{1}{2}(\mu_k + \mu_l)^T\Sigma^{-1}(\mu_k-\mu_l)\big)$

Therefore

$\log \frac{f_k(x)}{f_l(x)} + \log \frac{\pi_k}{\pi_l}= \log \frac{\pi_k}{\pi_l} - \frac{1}{2}(\mu_k + \mu_l)^T\Sigma^{-1}(\mu_k-\mu_l) + x^T\Sigma^{-1}(\mu_k - \mu_l)$
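If you want to convince yourself without redoing the algebra, the identity can be checked numerically. The sketch below (NumPy, random data, with `pi_k` and `pi_l` as hypothetical class priors) computes the left side directly from the exponents of the two densities — the $(2\pi)^{p/2}|\Sigma|^{1/2}$ normalizers cancel because the covariance is shared — and compares it to the closed form:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 3
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)   # shared covariance (the LDA assumption)
Sigma_inv = np.linalg.inv(Sigma)

mu_k, mu_l = rng.standard_normal(p), rng.standard_normal(p)
pi_k, pi_l = 0.3, 0.7             # hypothetical class priors
x = rng.standard_normal(p)

# Left side: log(f_k/f_l) + log(pi_k/pi_l), with the shared
# normalizing constants already cancelled.
log_ratio = (-0.5 * (x - mu_k) @ Sigma_inv @ (x - mu_k)
             + 0.5 * (x - mu_l) @ Sigma_inv @ (x - mu_l)
             + np.log(pi_k / pi_l))

# Right side: the closed form of equation (4.9).
closed_form = (np.log(pi_k / pi_l)
               - 0.5 * (mu_k + mu_l) @ Sigma_inv @ (mu_k - mu_l)
               + x @ Sigma_inv @ (mu_k - mu_l))

print(np.isclose(log_ratio, closed_form))
```

Note that the result is linear in $x$, which is exactly why the decision boundaries in linear discriminant analysis are hyperplanes.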