Tractability of Expectations


I'm working my way through a paper about bounds on the mutual information [1]. However, I have some issues understanding the claims it makes about the tractability of the different bounds.

Given:

$ q ( x | y ) = \frac { p ( x ) } { Z ( y ) } e ^ { f ( x , y ) } , \text { where } Z ( y ) = \mathbb { E } _ { p ( x ) } \left[ e ^ { f ( x , y ) } \right], $

they derive the Donsker-Varadhan bound $I _ { \mathrm { DV } }$ on the Mutual Information:

$ \mathbb { E } _ { p ( x , y ) } [ f ( x , y ) ] - \log \mathbb { E } _ { p ( y ) } [ Z ( y ) ] \triangleq I _ { \mathrm { DV } }. $

And claim that the bound is intractable. They then state that tractability can be achieved by applying Jensen's inequality, replacing $\log Z(y) = \log \mathbb{E}_{p(x)}\left[e^{f(x,y)}\right]$ with $\mathbb{E}_{p(x)}[f(x,y)]$, which results in:

$ \mathbb { E } _ { p ( x , y ) } [ f ( x , y ) ] - \mathbb { E } _ { p ( y ) } [f(x,y) ] . $

So, as I understand it, the main difference between the tractable and intractable bounds is the absence of the exponential inside $\mathbb{E}_{p(x)}$. Why is this now considered tractable? Is it because computing the expectation requires summing over $e^{f(x,y)}$, which produces an "intractably" large number, compared to summing just over $f(x,y)$?

Similarly

$\mathbb{E}_{p(x,y)}[f(x,y)] - \mathbb{E}_{p(y)}\left[\frac{\mathbb{E}_{p(x)}\left[e^{f(x,y)}\right]}{a(y)} + \log(a(y)) - 1\right],$

is introduced as a tractable bound for any choice of $a(y) > 0$. What makes this tractable compared to $I_{\mathrm{DV}}$?
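For concreteness, here is a toy Monte Carlo estimate of all three bounds (with a made-up critic $f(x,y) = -|x-y|$ and the choice $a(y) = 1$, both just placeholders of mine, not from the paper). All of them are computable from samples, which is why the "intractable" label confuses me:

```python
import numpy as np

# Toy correlated joint p(x, y); f is an arbitrary stand-in critic.
rng = np.random.default_rng(0)
N = 1000
x = rng.normal(size=N)
y = x + 0.5 * rng.normal(size=N)

F = -np.abs(x[:, None] - y[None, :])  # F[i, j] = f(x_i, y_j) for all pairs

joint = np.mean(np.diag(F))           # diagonal pairs ~ p(x, y)
Z_hat = np.mean(np.exp(F), axis=0)    # per-y_j estimate of Z(y_j) = E_{p(x)}[e^f]

i_dv    = joint - np.log(np.mean(Z_hat))              # log of an expectation
relaxed = joint - np.mean(F)                          # plain expectations only
a       = np.ones(N)                                  # placeholder a(y) = 1
i_a     = joint - np.mean(Z_hat / a + np.log(a) - 1)  # Z(y) enters linearly
```

All three evaluate to finite numbers on samples, so "tractable" presumably cannot simply mean "evaluable".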

[1] https://arxiv.org/abs/1905.06922 (relevant parts in section 2.2)

1 Answer

I assume what the authors want to say is that an unbiased stochastic-gradient (SGD) estimator of these bounds is what is (in)tractable.

A Monte Carlo estimate of a plain expectation is unbiased, but the gradient of a Monte Carlo estimate of a log-of-expectation is biased: for a sample average $\hat{Z}$ of $e^{f(x,y)}$, Jensen's inequality gives $\mathbb{E}[\log \hat{Z}] \leq \log \mathbb{E}[\hat{Z}] = \log Z(y)$. The bound with $a(y)$ avoids this because $Z(y)$ enters the outer expectation linearly, so a minibatch estimate remains unbiased. As discussed here: https://stats.stackexchange.com/questions/364293/is-stochastic-gradient-descent-biased
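A quick numerical sketch of that bias (my own toy example, not from the paper): with $f \sim \mathcal{N}(0,1)$ the true value of $\log \mathbb{E}[e^f]$ is $1/2$, but the log of a small-batch Monte Carlo mean sits systematically below it:

```python
import numpy as np

rng = np.random.default_rng(1)

# True quantity: log E[e^f] with f ~ N(0, 1), i.e. log e^{1/2} = 0.5.
true_log_Z = 0.5

# Plug-in estimator: log of the sample mean of e^f over a small batch.
# Averaging the estimator itself over many independent trials exposes
# its bias (each trial is one "minibatch" estimate of log Z).
batch, trials = 5, 200_000
f = rng.normal(size=(trials, batch))
log_Z_hat = np.log(np.mean(np.exp(f), axis=1))

bias = np.mean(log_Z_hat) - true_log_Z  # negative, by Jensen's inequality
```

The bias shrinks as the batch grows, but for any fixed batch size the estimator (and hence its gradient) is not unbiased, which is exactly the problem with $\log \mathbb{E}_{p(y)}[Z(y)]$ in $I_{\mathrm{DV}}$.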