In "Probabilistic Graphical Models" book by Daphne Koller and Nir Friedman they have the following approximation of probability of r successful outcomes of N Bernoulli trials:
$P(S_N=r)\approx \exp(-N \cdot \mathbb{D}((p,1-p) \parallel (r/N,1-r/N)))$
Here $p$ is the probability of success, $N$ is the total number of trials, $r$ is the number of successful trials, and $\mathbb{D}$ is the Kullback–Leibler divergence (relative entropy). Throughout I will write $\log$ for the natural logarithm, so that the exponential and the divergence use the same base.
I'm struggling with the proof of this approximation. Here is what I have so far:
First, we know that $P(S_N=r)$ is equal to $A=\frac{N!}{r! \cdot (N-r)!} \cdot p^r \cdot (1-p)^{N-r}$
Using Stirling's approximation we get $\log(A)\approx N\log(N)-r\log(r)-(N-r)\log(N-r)+r\log(p)+(N-r)\log(1-p)$.
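In more detail, Stirling's formula $\log(n!)\approx n\log(n)-n$ applied to the binomial coefficient gives (the linear terms cancel, since $-N+r+(N-r)=0$)

$$\log\binom{N}{r}\approx\big(N\log N-N\big)-\big(r\log r-r\big)-\big((N-r)\log(N-r)-(N-r)\big)=N\log N-r\log r-(N-r)\log(N-r),$$

and adding $\log\big(p^r(1-p)^{N-r}\big)=r\log p+(N-r)\log(1-p)$ yields the expression above.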
The right-hand side of the approximation is equal to $B=\exp\Big(-N \cdot \Big(p \log\frac{p}{r/N}+(1-p)\log\frac{1-p}{1-r/N}\Big)\Big)$, so
$\log(B)=-N \cdot \Big(p \log\frac{p}{r/N}+(1-p)\log\frac{1-p}{1-r/N}\Big)$
I do not see any connection between $\log(A)$ and $\log(B)$. Should I use a different approximation of the binomial coefficient? Any other ideas for proving the approximation?
Did pointed out the main ideas above. I will develop them here and write up a proof as an answer to my own question.
Let $r=x \cdot N$, so that $x=r/N$ and $N=r/x$.
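One small identity that is used repeatedly in the expansion below:

$$\frac{r}{x}-r=\frac{r(1-x)}{x},\qquad\text{so}\qquad\log\Big(\frac{r}{x}-r\Big)=\log r+\log(1-x)-\log x.$$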
$$\begin{aligned}
\log(A) &\approx \frac{r}{x}\log\frac{r}{x} - r\log r - \Big(\frac{r}{x}-r\Big)\log\Big(\frac{r}{x}-r\Big) + r\log p + \Big(\frac{r}{x}-r\Big)\log(1-p) \\
&= \frac{r}{x}\log r - \frac{r}{x}\log x - r\log r - \frac{r}{x}\log r - \frac{r}{x}\log(1-x) + \frac{r}{x}\log x + r\log r + r\log(1-x) - r\log x \\
&\qquad + r\log p + \frac{r}{x}\log(1-p) - r\log(1-p) \\
&= -\frac{r}{x}\log(1-x) + r\log(1-x) - r\log x + r\log p + \frac{r}{x}\log(1-p) - r\log(1-p).
\end{aligned}$$
For the right-hand side, in terms of $x=r/N$ (note the order of the arguments here: the divergence is $\mathbb{D}\big((x,1-x)\parallel(p,1-p)\big)$, with the empirical frequency as the first argument, which is the form that drops out of Stirling's approximation):

$$\begin{aligned}
\log(B) &= -\frac{r}{x}\Big(x\log\frac{x}{p} + (1-x)\log\frac{1-x}{1-p}\Big) \\
&= -r\log x + r\log p - \frac{r}{x}\log(1-x) + r\log(1-x) + \frac{r}{x}\log(1-p) - r\log(1-p).
\end{aligned}$$
It is easy to see that the last expression obtained for $\log(A)$ is identical to the last expression for $\log(B)$, which establishes the approximation.
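As a quick numerical sanity check, here is a minimal sketch (using numpy and scipy; the helper name `kl2` is mine, just for illustration) comparing the exact $\log P(S_N=r)$ with $-N\cdot\mathbb{D}\big((x,1-x)\parallel(p,1-p)\big)$:

```python
import numpy as np
from scipy.stats import binom

def kl2(a, b):
    """KL divergence D((a,1-a) || (b,1-b)) between Bernoulli distributions, in nats."""
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

N, p = 1000, 0.3
for r in (250, 300, 350):
    x = r / N
    exact = binom.logpmf(r, N, p)   # exact log P(S_N = r)
    approx = -N * kl2(x, p)         # -N * D((x, 1-x) || (p, 1-p))
    print(f"r={r}: exact log-prob = {exact:.2f}, -N*D = {approx:.2f}")
```

The two values agree up to an additive term of order $\log N$ (the polynomial prefactor discarded by Stirling's approximation), so the exponential rate is the same.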
Thank you, Did. I have finally solved the question that had bothered me for a whole week.