How to sample random variables X and Y from a joint distribution.

2.1k Views Asked by At

I have the following joint distribution function $f(x,y)$:

$$f(x,y)=\begin{cases} \frac{1}{30}xy+\frac{x}{y^2} & \text{ for } 1\le y\le 4,\ 1/2\le x\le 3/2\\ 0 & \text{ otherwise.} \end{cases}$$

The task is to simulate $N$ samples programatically, of $X$ and $Y$ to compute $E(X)$, $E(Y)$, $Var(X)$ and $Var(Y)$.

So far, I have tried the following steps:

  1. Calculate the joint CDF from the above PDF.
  2. Use some multi-variate Inverse transform sampling
  3. And then from the samples of X an Y, calculate sample mean and variance.

I need to know what inverse transform I can use for this PDF and how I can go about calculating sample mean and variance from the samples of X and Y. That is to say, if my approach is not entirely wrong.

3

There are 3 best solutions below

0
On BEST ANSWER

Let the pdfs of $X$ and $Y$ be $$f(x)=x, 1/2\le x\le3/2\\h(y)=\frac y{30}+\frac1{y^2}, 1\le y\le4$$

Acceptance-rejection is my favorite for univariate simulations, so I will use it here. We use a uniform distribution to sample from for both $X$ and $Y$ (though over their respective intervals) and notice that $(\frac 32 +.1)g_1(x)\ge f(x)$ for $1/2<x<3/2$ with $f(x)$ the pdf of $X$ and $g_1(x)$ the pdf of the uniform on $[1/2, 3/2]$.

Similarly to simulate from $Y$ we notice that $(\frac{31}{10}+.1)g_2(y)\ge h(y)$ for $1<y<4$. The algorithm is implemented below.

y=Vectorize(function(y) {
  return (y/30+1/y^2)
})

pdf_f=Vectorize(function(x) {
  return (x)
})

pdf_g=function(x) {
  return(1)
}

y=c()
for (i in 1:10000) {
  while (1>0) {
    x=runif(1, 1/2, 3/2)
    u=runif(1)
    if (pdf_f(x) / pdf_g(x) >= (3/2 + .1) * u) {
      y[i]=x
      break
    }
  }
}

par(mfrow=c(1,2))

xx=seq(1/2, 3/2, length=1000)

ys=pdf_f(xx)

hist(y, freq=FALSE)
lines(xx, ys)

pdf_h=Vectorize(function(x) {
  return(x/30+1/x^2)
})

z=c()
for (i in 1:10000) {
  while (1>0) {
    x=runif(1, 1, 4)
    u=runif(1)
    if (pdf_h(x) / pdf_g(x) >= (31/10 + .1) * u) {
      z[i]=x
      break
    }
  }
}

hist(z, freq=FALSE)
w=seq(1, 4, length=1000)
zs=pdf_h(w)
lines(w, zs)

u=runif(10000)
v=sqrt(2*(u+1/8))
hist(v, freq=FALSE)
lines(xx, ys)

The true expected values are variances are $E(X)=\frac{13}{12}\approx 1.0833, Var(X)\approx .07639, E(Y)=\log 4+\frac 7{10}\approx 2.086, Var(Y)=.7724$. Unbiased estimates of these are given by

$$\hat \mu_1=\bar X, \hat \mu_2=\bar Y, \hat {\sigma_1^2}=\frac 1 {n-1}\sum_{i=1}^n (X_i-\bar X)^2, \hat {\sigma_2^2}=\frac 1 {n-1}\sum_{i=1}^n (Y_i-\bar Y)^2$$

The simulated values are c(mean(y), var(y), mean(z), var(z)) which gives 1.07988980 0.07734089 2.07852920 0.76948369.

A histogram of the simulated values with density overlaid:

enter image description here

We can also use the probability integral transform to sample from $X$. The cdf of $X$ is $F(x)=\int_{1/2}^x tdt=\frac{x^2}2-\frac 18, \frac 12 < x < \frac 32$. The inverse of this is given by $F^{-1}(y)=\sqrt {2\left(y+\frac 18\right)}, 0<y<1$. Therefore if $y$ is selected uniformly on $[0,1]$ then $F^{-1}(y)$ will be a single sample from the pdf of $X$.

u=runif(10000)
v=sqrt(2*(u+1/8))
hist(v, freq=FALSE)
lines(x, ys)

enter image description here

The cdf of $Y$ is not easily inverted so the probability integral transform sampling won't work.

Come back with an example that does not factor into pdfs of $X$ and $Y$ and we can try something less dull.

0
On

Update: I just noticed that your $X$ and $Y$ are independent, so you may very well generate them separately.

Hint: In the general case, I would use conditional formula:

$$f_{X,Y}(x,y)=f_{X|Y}(x|y)\cdot f_Y(y)$$

So I would generate first a random variable distributed as one of the marginals, and then generate the other one with conditional distribution.

0
On

You know that $f(x, y) \leq C$ on the domain where $C = (3/2)(31/30)$, so a rough and ready acceptance-rejection method would be:

  1. Generate $X$ uniformly on $[\frac{1}{2}, \frac{3}{2}]$, $Y$ on $[1, 4]$, and $Z$ on $[0, C]$.
  2. If $Z \leq f(X, Y)$ accept $X$ and $Y$, otherwise repeat at 1.