Approximating the expectation of a function with sampling


I'm reading a paper (section 5.1) that approximated the expected value of a function $f(X,Y)$ of two random variables using Gibbs sampling.

As far as I know the expectation of $f(X,Y)$ is defined to be:

$E[f(X,Y)] = \sum_{x}\sum_{y}p(X=x,Y=y) f(x,y)$

What the paper says is that this sum is very expensive to compute, so instead they approximate the expectation by drawing a sample from the joint distribution using Gibbs sampling and then calculating the expectation as follows:

1- Draw a sample $x$ using Gibbs sampling.

2- Then $E[f(X,Y)] = \sum_{y}p(Y=y|x) f(x,y)$.
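In code, step 2 as I read it would look something like this (the joint distribution and the function $f$ here are toy examples I made up to illustrate, not from the paper):

```python
# Toy discrete joint p(X, Y) over {0,1} x {0,1}, and an arbitrary f -- my own example.
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
f = lambda x, y: x + 2 * y

def p_y_given_x(y, x):
    px = joint[(x, 0)] + joint[(x, 1)]  # marginal p(X = x)
    return joint[(x, y)] / px

def step2(x):
    # E[f(X,Y)] "approx=" sum_y p(Y=y | x) * f(x, y), for one Gibbs-drawn x
    return sum(p_y_given_x(y, x) * f(x, y) for y in (0, 1))
```

So for a single drawn $x$, step 2 collapses the double sum to a single sum over $y$, which is what confuses me.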

What I don't understand is step 2. I don't understand why we removed the summation over all values of $x$, why we are using the conditional probability, and what the idea is behind how a sample of $x$ from the joint distribution leads to the equation in step 2. How and why is that equivalent to the expectation?!

p.s. I don't have excellent skills in probability theory so please elaborate your answers :)



I'll address your questions in order:

  1. The summation over $x$ disappears from the second equation because $x$ is being provided by the Gibbs sampler, hence it is not an explicit variable of summation.

  2. The second equation is indeed confusing, because the equality does not actually hold mathematically for a single draw of $x$! They are relying on the behavior of the Gibbs sampler to make the second equation work out correctly. Specifically:

The RHS of the equation is actually giving you the conditional expected value of the function, given a value for $X$: $E[f(X,Y)|X=x] = \sum_{y}p(Y=y|x) f(x,y)$.

Now, the unconditional expected value of the function is as you have written:

$E[f(X,Y)] = \sum_{x}\sum_{y}p(X=x,Y=y) f(x,y)$

I think the missing link, so to speak, is that the above equation can be re-written in terms of conditional expectations (the law of total expectation):

$E[f(X,Y)] = \sum_{x}P(X=x)E[f(X,Y)|X=x]$
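You can check this identity numerically on a small toy joint (the joint and $f$ below are my own illustrative choices, not from the paper):

```python
# Verify: sum_x sum_y p(x,y) f(x,y)  ==  sum_x p(x) * E[f(X,Y) | X=x]
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
f = lambda x, y: x + 2 * y

# Direct double sum over the joint
direct = sum(p * f(x, y) for (x, y), p in joint.items())

def E_f_given_x(x):
    # E[f(X,Y) | X=x] = sum_y p(y|x) * f(x, y)
    px = joint[(x, 0)] + joint[(x, 1)]
    return sum(joint[(x, y)] / px * f(x, y) for y in (0, 1))

# Weighted average of conditional expectations over the marginal of X
via_conditionals = sum(
    (joint[(x, 0)] + joint[(x, 1)]) * E_f_given_x(x) for x in (0, 1)
)
```

Both computations give the same number (1.9 for this particular joint), confirming that the double sum and the conditional-expectation form are the same quantity.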

At first glance, this re-formulation looks just as useless as the original for complex functions. However, the key is that the Gibbs sampler will tend to visit each value of $X$ in proportion to its marginal probability $P(X=x)$. So when you average the conditional expectations $E[f(X,Y)|X=x]$ over a large number of draws of $X$, you end up with the unconditional expected value $E[f(X,Y)]$.

Basically, instead of analytically averaging out X and Y, you are solving the (usually simpler) problem of analytically averaging out Y given X, then using the Gibbs sampler to perform the "averaging" over X (think Monte Carlo integration, essentially).
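The whole scheme can be demonstrated on a small discrete example (the joint, $f$, and all names below are my own toy choices, not the paper's model): run a Gibbs sampler that alternates between drawing $y \mid x$ and $x \mid y$, and at each step add in the analytically computed $E[f(X,Y)\mid X=x]$; the running average approaches the exact $E[f(X,Y)]$.

```python
import random

# Toy joint p(X, Y) over {0,1} x {0,1} and an arbitrary f -- illustrative only.
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
f = lambda x, y: x + 2 * y

def p_y_given_x(y, x):
    px = joint[(x, 0)] + joint[(x, 1)]
    return joint[(x, y)] / px

def p_x_given_y(x, y):
    py = joint[(0, y)] + joint[(1, y)]
    return joint[(x, y)] / py

def cond_expectation(x):
    # The analytic inner sum: E[f(X,Y) | X=x] = sum_y p(y|x) * f(x, y)
    return sum(p_y_given_x(y, x) * f(x, y) for y in (0, 1))

random.seed(0)
x, y = 0, 0          # arbitrary starting state
total, n = 0.0, 20000
for _ in range(n):
    # One Gibbs sweep: refresh y given x, then x given y
    y = 0 if random.random() < p_y_given_x(0, x) else 1
    x = 0 if random.random() < p_x_given_y(0, y) else 1
    total += cond_expectation(x)   # "averaging over X" via Monte Carlo

estimate = total / n  # approaches the exact E[f(X,Y)] = 1.9 for this joint
```

Because the sampler visits $x=0$ roughly 30% of the time and $x=1$ roughly 70% of the time (the marginal of $X$ here), the average of the conditional expectations lands near the exact value 1.9 computed from the double sum.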