Simulating Random Vectors Based on Conditioning


I'm working on a project where I need to simulate random vectors $(Y, X_1,\dots,X_n)$ in order to understand the joint distribution $f(y,x_1,\dots,x_n)$. I wish to simulate enough random vectors that I can empirically estimate the marginal distribution $F_Y(y)$ to some specified confidence. For example, I may want to continue simulating until I am 99% confident that the estimated median is within 5% of its true value, or until the change in $F_Y(y)$ between two successive simulations is small in some sense.

I am given the following:

  • The random vector $(X_1,\dots,X_n)$ is jointly continuous. The specific form of its pdf $f(x_1,\dots,x_n)$ or cdf $F_{(X_1,\dots,X_n)}(x_1,\dots,x_n)$ is known so that I can simulate random vectors from it.
  • The random variable $Y\mid(X_1,\dots,X_n)$ is continuous, and the specific form of its pdf $f_{Y\mid(X_1,\dots,X_n)}(y\mid x_1,\dots,x_n)$ or cdf $F_{Y\mid(X_1,\dots,X_n)}(y\mid x_1,\dots,x_n)$ is known, so that I can also simulate random variables from it.

It seems intuitive that I can do the following:

  1. Simulate a random vector $(X_1,\dots,X_n)$.
  2. Use the specific $(X_1,\dots,X_n)$ from step 1 to simulate a specific $Y$.
  3. Repeat steps 1–2 many times, say $1$ million.
  4. I should then have a random sample of vectors from the desired pdf $f(y,x_1,\dots,x_n)$.
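The steps above can be sketched as follows. The specific distributions here (a bivariate normal for $(X_1, X_2)$ and $Y \mid (X_1, X_2) \sim N(x_1 + x_2, 1)$) are placeholder assumptions standing in for whatever known pdfs you actually have:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_joint(n_sims):
    """Draw samples from the joint f(y, x1, x2) via the factorization
    f(y, x) = f_{Y|X}(y | x) * f_X(x).

    Illustrative assumption: (X1, X2) is bivariate normal with
    correlation 0.5, and Y | (X1, X2) ~ Normal(X1 + X2, 1).
    """
    cov = np.array([[1.0, 0.5], [0.5, 1.0]])
    # Step 1: simulate the vector (X1, X2) from its known joint distribution.
    x = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n_sims)
    # Step 2: simulate Y from its conditional distribution given that draw.
    y = rng.normal(loc=x[:, 0] + x[:, 1], scale=1.0)
    # Each row is one draw of (Y, X1, X2) from the joint distribution.
    return np.column_stack([y, x])

# Step 3: repeat 1 million times (vectorized here).
sample = simulate_joint(1_000_000)

# Step 4: sample[:, 0] is a draw from the marginal of Y, so e.g. its
# empirical median estimates the median of F_Y.
median_y = np.median(sample[:, 0])
```

The stopping rules mentioned above (confidence interval for the median, or change in the empirical cdf between batches) would be checked on `sample[:, 0]` between repetitions.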

Is this approach mathematically sound? How could I write this out formally to prove that the algorithm does indeed work?

Thanks, Gelfan


There is 1 solution below.


Explanatory comment, not an answer.

I believe this is a plan to justify a method widely used in simulation. And I believe the comment by @AlexR. goes toward a solution.
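A sketch of the formal justification, which I take to be the standard factorization argument, runs as follows:

```latex
f_{Y, X_1,\dots,X_n}(y, x_1,\dots,x_n)
  = f_{Y \mid X_1,\dots,X_n}(y \mid x_1,\dots,x_n)\,
    f_{X_1,\dots,X_n}(x_1,\dots,x_n).
```

So if $(x_1,\dots,x_n)$ is drawn from $f_{X_1,\dots,X_n}$ and then $Y$ is drawn from $f_{Y\mid X_1,\dots,X_n}(\cdot \mid x_1,\dots,x_n)$, the resulting vector has exactly the joint density on the left-hand side, and independent repetitions yield an i.i.d. sample from the joint distribution.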

However, the process is not as trivial in practice as the comment may suggest. Knowing how to simulate $(X_1, \dots, X_n)$ does not necessarily mean that you can write down their joint pdf or cdf. Perhaps the $X_i$ are a disagreeable collection of continuous and discrete random variables, each of which can be simulated, possibly with some interdependencies among them.

Then suppose $Y$ is a project bid that is a messy function of the $X_i$. It is often useful to be able to find probabilities involving the $X_i$ conditional on some specific information about $Y$: for example, the conditional mean of $X_3$ given that the bid was rejected, or the probability that $X_5 > 1000$ given that the bid was rejected and $X_3$ is below its mean.
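With a trusted simulated sample of the joint distribution, such conditional quantities reduce to filtering rows and averaging. A minimal sketch, in which the columns, the "rejected" rule, and all distributions are placeholder assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder joint simulation: (y, x3, x5), where a bid is "rejected"
# when y lands in its top 20%. Replace with your real simulated sample.
n = 200_000
x3 = rng.normal(50.0, 10.0, size=n)
x5 = rng.normal(900.0, 200.0, size=n)
y = x3 + 0.1 * x5 + rng.normal(0.0, 5.0, size=n)
rejected = y > np.quantile(y, 0.8)

# E[X3 | bid rejected]: average X3 over the rejected draws only.
cond_mean_x3 = x3[rejected].mean()

# P(X5 > 1000 | rejected and X3 below its mean): restrict to draws
# satisfying the conditioning event, then take the fraction where
# the event of interest occurs.
mask = rejected & (x3 < x3.mean())
p_x5_high = (x5[mask] > 1000.0).mean()
```

The same pattern (boolean mask for the conditioning event, then an empirical mean or proportion on the surviving rows) handles any conditional query against the simulated joint sample.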

This is why it matters that you can trust the resulting joint distribution of $(Y, X_1, \dots, X_n)$. Maybe not the best example, but that is the flavor of a lot of practical simulations, and it is nonproprietary and sufficiently vague to avoid dealing with lawyers.