How is the continuous-input probabilistic generative model derived from the single-class model?


So for the single-valued model we have:

$p(C_1|\textbf{x}) = \frac{p(\textbf{x}|C_1)p(C_1)}{p(\textbf{x}|C_1)p(C_1)+p(\textbf{x}|C_2)p(C_2)}$

If we rearrange the terms, we can write this as a sigmoid function:

$p(C_1|\textbf{x}) = \sigma(a) = \frac{1}{1+\exp(-a)}$ ...(4.57)

where $a=\ln \frac{p(\textbf{x}|C_1)p(C_1)}{p(\textbf{x}|C_2)p(C_2)}$ ...(4.58)

Then we moved on to the continuous-input case, where we assumed the class-conditional density $p(\textbf{x}|C_k)$ was Gaussian:

$p(\textbf{x}|C_k) = \frac{1}{(2\pi)^{D/2}}\frac{1}{|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(\textbf{x}-\mu_k)^T\Sigma^{-1}(\textbf{x}-\mu_k)\right)$ ...(4.64)

Then I became confused when the text said that, using (4.57) and (4.58), we have $p(C_1|\textbf{x}) = \sigma(\textbf{w}^T\textbf{x}+w_0)$

where:

$\textbf{w} = \Sigma ^{-1}(\mu_1-\mu_2)$

$w_0=-\frac{1}{2}\mu_1^T\Sigma^{-1}\mu_1+\frac{1}{2}\mu_2^T\Sigma^{-1}\mu_2+\ln\frac{p(C_1)}{p(C_2)}$

Is it saying that if I plug everything into the sigmoid, I will recover

$p(C_1|\textbf{x}) = \frac{p(\textbf{x}|C_1)p(C_1)}{p(\textbf{x}|C_1)p(C_1)+p(\textbf{x}|C_2)p(C_2)}$,

where $p(\textbf{x}|C_1)$ and $p(\textbf{x}|C_2)$ are normal densities as in (4.64)? Why can't we just use Bayes' theorem as is? Why do we have to create a sigmoid function out of seemingly nowhere?

Best answer:

"Can't we just use the Bayes theorem as it is?"

Yes we can. And yes, Bayes' theorem does indeed reproduce the formula you quoted:

\begin{align} p(C_1|\mathbf x) & = \frac{p(\mathbf x | C_1) p(C_1)}{p(\mathbf x | C_1) p(C_1) + p(\mathbf x|C_2) p(C_2)} \\ & = \frac{p(C_1)\exp\left( - \tfrac 1 2 (\mathbf x - \mu_1)^T\Sigma^{-1}(\mathbf x - \mu_1) \right)}{p(C_1) \exp\left( - \tfrac 1 2 (\mathbf x - \mu_1)^T\Sigma^{-1}(\mathbf x - \mu_1) \right)+ p(C_2) \exp\left( - \tfrac 1 2 (\mathbf x - \mu_2)^T\Sigma^{-1}(\mathbf x - \mu_2) \right)} \\ & = \frac{\exp\left( ( \mu_1 - \mu_2)^T\Sigma^{-1} \mathbf x - \tfrac 1 2 \mu_1^T\Sigma^{-1} \mu_1+ \tfrac 1 2 \mu_2^T\Sigma^{-1} \mu_2 + \ln \tfrac {p(C_1)}{p(C_2)} \right)}{\exp\left( ( \mu_1 - \mu_2)^T\Sigma^{-1} \mathbf x - \tfrac 1 2 \mu_1^T\Sigma^{-1} \mu_1+ \tfrac 1 2 \mu_2^T\Sigma^{-1} \mu_2 + \ln \tfrac {p(C_1)}{p(C_2)} \right) + 1} \\ & = \sigma(\mathbf w^T \mathbf x + w_0) \end{align} [NB in the second line, I omitted the common factor of $\frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}}$ from the numerator and denominator, since it cancels. In the third line, I divided the numerator and denominator by $p(C_2)\exp\left(-\tfrac 1 2 (\mathbf x - \mu_2)^T\Sigma^{-1}(\mathbf x - \mu_2)\right)$; note that the quadratic term $\mathbf x^T\Sigma^{-1}\mathbf x$ cancels because both classes share the same covariance $\Sigma$, which is exactly why the result is linear in $\mathbf x$.]
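One can also check this equivalence numerically. The sketch below (illustrative means, covariance, and priors of my own choosing, not from the text) computes the posterior both ways: directly from Bayes' theorem with Gaussian class-conditionals, and via $\sigma(\mathbf w^T\mathbf x + w_0)$ with $\mathbf w$ and $w_0$ as defined above.

```python
import numpy as np

# Illustrative parameters (my own, not from the text): two 2-D Gaussian
# classes sharing one covariance matrix, with priors p(C1) and p(C2).
mu1 = np.array([1.0, 0.0])
mu2 = np.array([-1.0, 0.5])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
p1, p2 = 0.6, 0.4

Sigma_inv = np.linalg.inv(Sigma)

def gaussian_density(x, mu):
    """Multivariate normal density with shared covariance Sigma (eq. 4.64)."""
    d = x - mu
    norm = 1.0 / ((2 * np.pi) ** (len(x) / 2) * np.sqrt(np.linalg.det(Sigma)))
    return norm * np.exp(-0.5 * d @ Sigma_inv @ d)

def posterior_bayes(x):
    """p(C1|x) computed directly from Bayes' theorem."""
    n1 = gaussian_density(x, mu1) * p1
    n2 = gaussian_density(x, mu2) * p2
    return n1 / (n1 + n2)

# Sigmoid form: w = Sigma^{-1}(mu1 - mu2) and w0 as in the text.
w = Sigma_inv @ (mu1 - mu2)
w0 = (-0.5 * mu1 @ Sigma_inv @ mu1
      + 0.5 * mu2 @ Sigma_inv @ mu2
      + np.log(p1 / p2))

def posterior_sigmoid(x):
    """p(C1|x) = sigma(w^T x + w0)."""
    return 1.0 / (1.0 + np.exp(-(w @ x + w0)))

for x in [np.array([0.2, -0.7]), np.array([3.0, 1.0]), np.array([-2.0, 0.0])]:
    assert np.isclose(posterior_bayes(x), posterior_sigmoid(x))
```

The assertions pass because the two expressions are algebraically identical, as the derivation above shows.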

"What is the benefit in writing it in this way?"

Writing the result in this way makes it clear that the decision as to which class $\mathbf x$ is most likely to belong to is given in terms of a linear function of $\mathbf x$:

$$ p(C_1|\mathbf x ) > \tfrac 1 2 \ \iff \ \sigma(\mathbf w^T \mathbf x + w_0) > \tfrac 1 2 \ \iff \ \mathbf w^T \mathbf x + w_0 > 0.$$

In other words, the decision boundary is the hyperplane $\mathbf w^T \mathbf x + w_0 = 0$.