From $P(x;W) = \frac{1}{Z(W)} \exp \bigl[ \frac{1}{2} x^T W x \bigr]$ to Sigmoid


In a book chapter on the Boltzmann distribution, the joint distribution is given as

$$ P(x;W) = \frac{1}{Z(W)} \exp \bigg[ \frac{1}{2} x^T W x \bigg] $$

where $W$ is symmetric with zero diagonal. The chapter then segues into a conditional probability definition where

$$ P(x_i = 1 | x_j, j \ne i) = \frac{1}{1 + e^{-a_i}} $$

$$ a_i = \sum_j w_{ij}x_j $$

How did the author make this jump from the original distribution to one that uses the sigmoid (logistic) function? My guess is to write the conditional distribution in this form:

$$ P(x_i|x_j,j \ne i) = \frac{\frac{1}{Z} P(x_1,...,x_N)}{\frac{1}{Z} \sum_{x_i}P(x_1,...,x_N)} = \frac{ P(x_1,...,x_N)}{\sum_{x_i}P(x_1,...,x_N)} $$

The $1/Z$'s would cancel out; could this then lead to the sigmoid?
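As a sanity check (my own numerical experiment, not from the book), the claimed sigmoid identity can be verified by brute force on a small random $W$. A minimal sketch in Python with NumPy, assuming binary units $x_i \in \{0,1\}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random symmetric weight matrix with zero diagonal, as the book assumes.
N = 4
W = rng.normal(size=(N, N))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)

def p_unnorm(x):
    """Unnormalized P(x; W) = exp[(1/2) x^T W x]; the 1/Z cancels in conditionals."""
    return np.exp(0.5 * x @ W @ x)

# Exact conditional P(x_i = 1 | x_j, j != i): sum over the two values of x_i.
i = 2
x1 = np.array([1.0, 0.0, 1.0, 1.0])  # some state with x_i = 1
x0 = x1.copy()
x0[i] = 0.0
conditional = p_unnorm(x1) / (p_unnorm(x0) + p_unnorm(x1))

# Sigmoid with a_i = sum_j w_ij x_j (the j = i term vanishes since w_ii = 0).
a_i = W[i] @ x1
sigmoid = 1.0 / (1.0 + np.exp(-a_i))

print(abs(conditional - sigmoid) < 1e-12)  # True
```

The two numbers agree to machine precision, so the question is only how to derive the identity algebraically.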

Any ideas?

There are 2 solutions below.

BEST ANSWER

From @Haderlump

Let $x^b$ denote the state $x$ with its $i$-th component fixed to $x_i = b$. Since

$$ P(x_j,j \ne i) = P(x^0) + P(x^1) $$

we can write

$$ P(x_i = 1 | x_j,j \ne i) = \frac{P(x^1)}{P(x^0) + P(x^1)} $$

It is easier to work with the reciprocal, which lets us get rid of the $+1$ in $1/(1 + e^{-a_i})$, so we flip the fraction and will then subtract one:

$$ 1 / P(x_i = 1 | x_j,j \ne i) = \frac{P(x^0) + P(x^1)}{P(x^1)} $$

$$= 1 + \frac{ P(x^0)}{P(x^1)} $$

Subtracting one leaves $\frac{ P(x^0)}{P(x^1)}$; on the sigmoid side, subtracting one from $1 + e^{-a_i}$ leaves $e^{-a_i}$. So it is enough to show that $\frac{ P(x^0)}{P(x^1)}$ equals $e^{-a_i}$.

$$ \frac{ P(x^0)}{P(x^1)} = \exp \Big( \tfrac{1}{2} x^{0^T}Wx^0 - \tfrac{1}{2} x^{1^T}Wx^1 \Big) $$

To lighten the notation, drop the factor $\tfrac{1}{2}$ for now (it is restored at the end) and work with

$$ \exp \big( x^{0^T}Wx^0 - x^{1^T}Wx^1 \big) $$

Using this equality

$$ x^TWx = \sum_{k,j} x_kx_jw_{kj} $$

If we define

$$ \sum_{k,j} \underbrace{x_kx_jw_{kj}}_{Y_{kj}} = \sum_{k,j}Y_{kj} $$

$$ = \sum_{k \ne i}\sum_j Y_{kj} + \sum_{j} Y_{ij} $$

$$ = \sum_{k \ne i}( \sum_{j \ne i} Y_{kj} + Y_{ki}) + \sum_{j} Y_{ij} $$

$$ = \sum_{k \ne i,j \ne i} Y_{kj} + \sum_{k \ne i} Y_{ki} + \sum_{j} Y_{ij} $$

$$ = \sum_{k \ne i,j \ne i} Y_{kj} + \sum_{k} Y_{ki} + \sum_{j} Y_{ij} - Y_{ii} $$

Let's use this decomposition in $ \exp( x^{0^T}Wx^0 - x^{1^T}Wx^1 )$. The terms with $k \ne i$ and $j \ne i$ are identical for $x^0$ and $x^1$, so they cancel:

$$ \exp \big( \sum_{k} Y_{ki}^0 + \sum_{j} Y_{ij}^0 - Y_{ii}^0 - ( \sum_{k} Y_{ki}^1 + \sum_{j} Y_{ij}^1 - Y_{ii}^1 ) \big) $$

Every $Y^0$ term above contains the factor $x_i = 0$, so the first group vanishes:

$$ = \exp \big( 0 - ( \sum_{k} Y_{ki}^1 + \sum_{j} Y_{ij}^1 - Y_{ii}^1 ) \big) $$

$W$ is symmetric, so $\sum_{k} Y_{ki}^1$ is the same as $\sum_{j}Y_{ij}^1$:

$$ = \exp \big( - ( 2 \sum_{j} Y_{ij}^1 - Y_{ii}^1 ) \big) $$

$W$ has zero diagonal, so $Y_{ii}^1=0$:

$$ = \exp \big( -2 \sum_{j} Y_{ij}^1 \big) = \exp (- 2 a_i ) $$

where the last step uses $\sum_j Y_{ij}^1 = \sum_j x_j w_{ij} = a_i$, since $x_i = 1$.

The original exponent carries a factor $\frac{1}{2}$ that was not included in the derivation; restoring it halves the exponent and gives $\exp(-a_i)$, as required.
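To double-check the bookkeeping of the factor $\frac{1}{2}$ (my own verification, not part of the answer), one can confirm numerically that the quadratic forms differ by $2a_i$, so that the odds ratio with the $\frac{1}{2}$ restored is exactly $e^{-a_i}$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Symmetric, zero-diagonal W and a binary state with x_i = 1.
N = 5
W = rng.normal(size=(N, N))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)

i = 0
x1 = np.array(rng.integers(0, 2, size=N), dtype=float)
x1[i] = 1.0
x0 = x1.copy()
x0[i] = 0.0

a_i = W[i] @ x1  # a_i = sum_j w_ij x_j

# Without the 1/2, the exponents differ by -2*a_i ...
assert np.isclose(x0 @ W @ x0 - x1 @ W @ x1, -2 * a_i)

# ... so with the 1/2 restored, P(x^0)/P(x^1) = exp(-a_i).
ratio = np.exp(0.5 * x0 @ W @ x0) / np.exp(0.5 * x1 @ W @ x1)
print(np.isclose(ratio, np.exp(-a_i)))  # True
```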

Another answer

Start from the fact that, for each $i$, the density of the full random vector $(X_j)_j$ can be factored as
$$P(X_i=x_i,\hat X_i=\hat x_i)=h(\hat x_i)\cdot\mathrm e^{a_i(x_i,\hat x_i)\,x_i},$$
for some positive function $h$ whose exact value is irrelevant, where
$$\hat X_i=(X_j)_{j\ne i},\qquad\hat x_i=(x_j)_{j\ne i},\qquad a_i(x_i,\hat x_i)=\tfrac12w_{ii}x_i+\tfrac12\sum\limits_{j\ne i}(w_{ij}+w_{ji})x_j.$$
Hence, for every $\hat x_i$,
$$P(X_i=1\mid\hat X_i=\hat x_i)=\frac{P(X_i=1,\hat X_i=\hat x_i)}{P(X_i=1,\hat X_i=\hat x_i)+P(X_i=0,\hat X_i=\hat x_i)}=\frac{\mathrm e^{a_i(1,\hat x_i)}}{\mathrm e^{a_i(1,\hat x_i)}+1}.$$
The formula for $a_i$ given in the question holds if $W=(w_{ij})_{ij}$ is symmetric ($w_{ij}=w_{ji}$) with zero diagonal ($w_{ii}=0$).
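The factorization at the start of this answer holds for a general $W$, with no symmetry or zero-diagonal assumption. A quick numerical check of the exponent split (my own sketch, not part of the answer), using $x_i^2 = x_i$ for binary $x_i$:

```python
import numpy as np

rng = np.random.default_rng(2)

# General W: not necessarily symmetric, diagonal not necessarily zero.
N = 4
W = rng.normal(size=(N, N))
i = 1
x = np.array(rng.integers(0, 2, size=N), dtype=float)

def exponent(x):
    """The exponent (1/2) x^T W x of the Boltzmann density."""
    return 0.5 * x @ W @ x

# a_i(x_i, x-hat_i) exactly as defined in the answer.
a_i = 0.5 * W[i, i] * x[i] + 0.5 * sum(
    (W[i, j] + W[j, i]) * x[j] for j in range(N) if j != i
)

# Setting x_i = 0 isolates the part of the exponent independent of x_i,
# so the exponent splits as (x_i-free part) + a_i * x_i.
x_off = x.copy()
x_off[i] = 0.0
print(np.isclose(exponent(x), exponent(x_off) + a_i * x[i]))  # True
```

Exponentiating this split gives exactly $h(\hat x_i)\cdot\mathrm e^{a_i x_i}$ up to the constant $1/Z$, which is absorbed into $h$.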