Sample $x$ from $g(x)$

169 Views Asked by At

I got confused with all this randomness and probability functions. I was trying to implement the rejection sampling method which (apparently) is really simple. I was reading from Rejection Sampling in Wikipedia and the first step says sample $x$ from $g(x)$ and $u$ from $U(0,1)$ what does this mean?

If I have a uniform distribution, when someone says "Generate/Sample/Draw a sample from $U(0,1)$" means that I have to take a random value from the uniform distribution right? In matlab terms, using the function rand will generate a sample from $U(0,1)$ right?

Now, for any pdf $g(x)$ defined over $(a, b)$. If I want to sample from $g(x)$ I have to pick a value of $x$ between $a$ and $b$ and evaluate the function $g(x)$ ? Is this correct ? Then to pick the value of $x$ I can use a uniform distribution $U(a,b)$ to sample from, is this true?

I got confused easily. Thanks in advance.

UPDATE

I appreciate your help. But this is still not clear to me. Say I have a probability density function $g(x)=3x+1$. What would be the result of draw a sample from g(x) or pick any $x_0$ distributed according to $g(x)$?

UPDATE

I think I got it. I was confused with the graph of the PDF. When someone asks "draw a sample from a PDF, say $g(x)$" it should return a value for $x$ and this value has to be distributed as the function, that means that more samples will be drawn where the value of $g(x)$ is bigger. Does this sounds about right?

Finally, if I want to use the function I said before $g(x)=3x+1$ I have to use for instance the inverse method to draw samples form that distribution properly.

I think the confusion arose because I thought of the value of $g(x)$ as the probability but this was wrong, the value of $g(x)$ is some kind of density (is this correct?). Hope you can correct me or confirm my hypotheses.

2

There are 2 best solutions below

0
On BEST ANSWER

I think the Wikipedia article is misleading here.

Rather than saying sample from $g(x)$ which doesn't strictly speaking make sense it should read sample from a distribution whose probability density function is $g$.

A distribution is a way of describing a random number. To specify the distribution of $X$ I need to be able to calculate the probability $\mathbb P(a < X \leq b)$ for every pair $a \le b$. For a continuous random variable it's enough to specify its probability density function $g$.

So if $X$ has probability density function $g$ then $$\mathbb P(a < x \leq b) = \int_a^b g(x) dx$$

Alternatively I can specify the cumulative distribution function $$G(x) = \mathbb P(X\leq x) = \int_{-\infty}^x g(y)dy.$$

Now I can specify $\mathbb P(a < x \leq b) = G(b) - G(a)$.

The reason we use PDF's is that sometimes we can't write down the cumulative distribution function, because not all integrals have nice results.

Now if $X$ has cumulative distribution function $G$ I can sample from $X$ simply by taking a uniform $(0,1)$ random variable $U$ and setting $X = G^{-1}(U)$. Because $G$ must be increasing we have $$\begin{array}{rl}\mathbb P(X<x) &= \mathbb P (G(X) <G(x)) \\&=\mathbb P(U < G(x)) \\&=G(x)\end{array}$$

So if $X$ has a probability density function $g$ we need to integrate $g$ and find the inverse. Then I can sample using the uniform distribution.

1
On

In its simplest form, rejection goes like this: Assume $g\colon [a,b]\to [0,M]$ is the pdf of the desired distribution. Then generate a uniformly distributed $x\in[a,b]$ (that is $x=a+b\cdot{\mathbf {rand}}()$) and independently $y\in[0,M]$ (that is $y=M\cdot{\mathbf {rand}}()$). Accept $x$ as result if $y\le g(x)$, otherwise repeat.

That is: We pick a random point in $[a,b]\times[0,M]$ and accept its $x$-value if it is below the graph of $g$. Thus $x$ with $g(x)$ small are less likely to be picked. Unfortunately, the probability of success is $\frac1{(b-a)M}$, which may be small, i.e. lots of resampling takes place. To remedy this, one starts with a different distribution for $x$ (of course a distribution that can be simulated faster than by rejection) and replaces the constant $M$ by a suitable multiple $Mf$ of the pdf $f$ for this $x$ (we need the scaling to ensure $MF\ge g$). The easiest way to grasp may be if you use a staircase function.