Can MCMC methods be used to calculate the mean and variance of the distribution of a function of random variables?


I am not a professional in probability and statistics, so in order to describe my problem clearly, please be patient with the long introduction. Thanks!

Background of my question

Assume I have several independent random variables, say $X$ and $Y$, whose distributions are already known, for example $X \sim N(0,1)$ and $Y \sim N(0,1)$. Now let $$g(X,Y)=X^2+Y^2$$ be a function of $X$ and $Y$; $g(X,Y)$ is a random variable too. Let $Z=g(X,Y)$. The distribution of $Z$ is unknown; denote its density by $f_Z(z)$. My goal is to get the expectation of $Z$'s distribution, which is $E[Z]$.

In this example $g(X,Y)$ has an easy form, so I can tell that $g(X,Y)$'s distribution is the chi-square distribution: $Z=g(X,Y) \sim \chi^2_2$. But when the form of $g(\cdot)$ is more complex, I cannot state the distribution of $Z$ directly, and it is even harder to get the expectation from $Z$'s distribution. However, I found a wiki page about the "law of the unconscious statistician". Then I realized I can get the expectation of $g(X,Y)$ by $$E[Z]=\int g(x,y)\,f_{X,Y}(x,y)\,dx\,dy \tag{1}$$ where $f_{X,Y}(x,y)$ is the joint density of $X$ and $Y$, and $g(x,y)$ is the function of $X$ and $Y$. For equation $(1)$, I think the Monte Carlo (MC) method is a suitable way to deal with the integral. Then I rewrite equation $(1)$ based on importance sampling: $$ \begin{split} E[Z] &= \int g(x,y)\,f_{X,Y}(x,y)\,dx\,dy \\ &= \int \frac{g(x,y)\,f_{X,Y}(x,y)}{h(x,y)}\,h(x,y)\,dx\,dy \\ &\approx \frac{1}{n}\sum_{i=1}^n \frac{g(x_i,y_i)\,f_{X,Y}(x_i,y_i)}{h(x_i,y_i)}\end{split} \tag{2} $$ Here $h(x,y)$ is the proposal distribution, and its form is known; the samples $\{(x_i,y_i)\mid i=1,2,3,\dots,n\}$ in the sum are drawn from $h$. The form of $f_{X,Y}(x,y)$ is explicit, so putting every pair of samples into equation $(2)$ gives an approximation of $E[Z]$. As for the way to generate samples, the Metropolis–Hastings algorithm, Gibbs sampling, and so on are available. Up to this point, the expectation of $Z$ can be calculated as stated above, if I'm not wrong.
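To make equation $(2)$ concrete, here is a minimal importance-sampling sketch for this example; the proposal $h$ is an arbitrary wider Gaussian chosen just for illustration, and `random_state=0` is only for reproducibility:

```python
import numpy as np
import scipy.stats as st

n = 100000

# Target joint density f_{X,Y}: two independent standard normals.
f = st.multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]])
# Proposal density h: a wider Gaussian (an arbitrary choice here).
h = st.multivariate_normal(mean=[0, 0], cov=[[4, 0], [0, 4]])

# Importance sampling draws from the PROPOSAL h, not from f.
xy = h.rvs(size=n, random_state=0)
g_vals = xy[:, 0]**2 + xy[:, 1]**2
weights = f.pdf(xy) / h.pdf(xy)

# (1/n) * sum of g * f / h, as in equation (2)
estimate = np.mean(g_vals * weights)
print(estimate)  # close to E[Z] = 2
```

Note that every sample contributes here with a weight $f/h$; nothing is accepted or rejected.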

What I want to test is: whether the way I mentioned above to calculate the expectation by sampling is correct.

So I did some experiments to verify it. I just use the example from the beginning of this introduction: $$\begin{split} X \sim N(0,1) \quad \quad Y \sim N(0,1) \\ Z = g(X,Y) = X^2 + Y^2 \end{split}$$ I know the exact distribution of $Z$ is $\chi^2(2)$, so I use two ways to calculate the expectation of $Z$; the two results should be equal.

  1. First way: generate samples directly from $\chi^2(2)$: $\{z_i \mid i=1,2,3,\dots,n\}$, then: $$E[Z] \approx \frac{1}{n}\sum^n_{i=1}z_i$$ The result should be about 2, and the variance should be about $2 \times 2=4$; this follows from the properties of the $\chi^2$ distribution (with $k$ degrees of freedom, the mean is $k$ and the variance is $2k$).
  2. The second way is the sampling method I mentioned before.
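The first way is easy to check on its own; a minimal sketch (`random_state=0` is just for reproducibility):

```python
import numpy as np
from scipy.stats import chi2

# Sample directly from a chi-square distribution with 2 degrees of freedom.
z = chi2.rvs(df=2, size=100000, random_state=0)

print(np.mean(z))  # about 2 (= k)
print(np.var(z))   # about 4 (= 2k)
```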

I implemented the two ways in Python as shown below:

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as st
from scipy.stats import chi2
%matplotlib inline
# np.random.seed(123)

def sampler(mean=[0,0], cov=[[1,0],[0,1]], n=1):
    '''generate samples from the joint distribution;
       the random variables are independent by default
    '''
    return st.multivariate_normal(mean, cov).rvs(n) 

def g(point):
    return point[0]**2+point[1]**2

iterN = 50000
k = 2 # the degrees of freedom of the chi-square distribution
samples_from_chi2 = chi2.rvs(k, loc=0, scale=1, size=iterN)
mean_chi2 = np.mean(samples_from_chi2)

begin = np.array([5, 5]) # the starting point of the chain
samples = np.array(begin)

mulN_mean = np.array([0,0])
mulN_cov = np.array([[1,0],[0,1]])
multiNormal = st.multivariate_normal
p = multiNormal(mulN_mean, mulN_cov)

for i in range(iterN):
    father = begin
    pdf_father = p.pdf(father)
    son = multiNormal(father, mulN_cov).rvs(1)
    pdf_son = p.pdf(son)
    # print("father={a:%f}, son={b:%f}".format(a=father, b=son))

    '''
    in this example the proposal is symmetric, so
    q(father|son) == q(son|father) and the proposal densities
    cancel in the Metropolis–Hastings acceptance ratio
    '''
    # q_son_father = multiNormal(father, mulN_cov).pdf(son)
    # q_father_son = multiNormal(son, mulN_cov).pdf(father)
    # print("q(son|father)=", q_son_father)
    # print("q(father|son)=", q_father_son)

    g_father = g(father)
    g_son = g(son)

    r_up = g_son * pdf_son
    r_down = g_father * pdf_father
    r = r_up / r_down

    u = np.random.uniform(0, 1)  # uniform draw for the accept/reject step
    if r >= u:
        samples = np.vstack((samples, son))
    else:
        samples = np.vstack((samples, father))
    begin = samples[-1]

# print("samples=\n", samples)

G_samples = samples[:,0]**2 + samples[:,1]**2
mean_from_sample = np.mean(G_samples)
print("chi2 mean:{a}    sample_mean:{b}".format(a=mean_chi2, 
b=mean_from_sample))

x = range(iterN+1)
y_max = np.max(G_samples)
y = np.linspace(0, y_max, int(y_max*40))
fig, (ax1, ax2) = plt.subplots(2,1)
ax1.plot(x,G_samples,'-')
ax1.set_title('Trace plot')
ax1.set_xlabel('Number of points')
ax1.set_ylabel('Points location')
ax2.hist(G_samples, int(iterN/100), density=True)
ax2.set_title('Histogram plot')
ax2.set_xlabel('Points location')
ax2.set_ylabel('Probability')
ax2.plot(y,st.chi2.pdf(x=y, df=2), 'r')
fig.subplots_adjust(left=None, bottom=None, right=None, top=None,
            wspace=None, hspace=0.8)

The result of the implementation:

chi2 mean:2.178781136596056    sample_mean:3.96530490041925

Sorry, I don't have 10 reputation to post the picture here directly: picture of sample trace and histogram.

Here my question comes

Whether the number of sample points is 400 or 10000 or anywhere in between, the mean of the samples drawn directly from the $\chi^2_2$ distribution is about 2 (I am confident this is right, because if a chi-square distribution has $k$ degrees of freedom, then its expectation is $k$ and its variance is $2k$), but the expectation calculated by the MC method is about 4 (I have tested many times with different sample counts; it is always about 2 times the result from the direct $\chi^2_2$ samples), just as shown above.

I want to know why there is a factor-of-2 relationship between the two results. If this constant really exists, how can I find its value when I don't know the distribution of the function of the random variables (in this example, the chi-square distribution)? Or is there a way to remove the constant, so that the sampling-method result closely approximates the true expectation of the random function?

Thanks for reading this far :). If there are mistakes in the above statement, please tell me. If you can explain the difference, that would be best for me; please give me some advice about my question. Thank you very much, and have a good day :)


Update

I tried sampling $X$ and $Y$ directly and independently, without the Metropolis–Hastings method. It is indeed correct: the expectations are equal, and the shape of the histogram matches the $\chi^2(2)$ pdf curve.
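This check (plain Monte Carlo: sample $X$ and $Y$ directly from their known distributions, no M-H) can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100000

# Sample X and Y directly from their known N(0,1) distributions.
x = rng.standard_normal(n)
y = rng.standard_normal(n)
z = x**2 + y**2  # plain Monte Carlo samples of Z = g(X, Y)

print(np.mean(z))  # about 2, matching the chi-square(2) mean
print(np.var(z))   # about 4, matching the chi-square(2) variance
```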

Here is the result of my check: result picture without the M-H method