How can we approximate a function by sampling a distribution proportial to it and making a histogram of samples?

263 Views Asked by At

I've read the following (here on page 2):

Suppose that you want to approximate a function $f$. One way to do this is to produce a sampling distribution proportional to $f$ and then make a histogram of samples taken from the distribution. The resulting histogram will be proportional to $f$ (obviously), so it only needs to be scaled to approximate $f$.

The procedure can be summarized as follows:

  • Create a sampling distribution proportial to $f$
  • Make a histogram of samples taken from the sampling distribution
  • Scale the histogram to approximate $f$

The sacle factor $s$ needed to make the histogram approximate $f$ is the ratio of the average value $v$ of $f$ over the sampling domain to the average number $h$ of samples per bin in the histogram, i.e. $s=v/h$.

I'm not sure how seriously this has to be taken, but could anybody explain to me (in a more formal way) what the author is meaning to say?

Let's consider a example: Assume $f$ is the density of the standard normal distribution $\mathcal N_{0,\:1}$. We could divide an interval $[a,b]$ into $C$ "bins" of size $\delta$. Now we could draw $n$ samples from $\mathcal N_{0,\:1}$ and record for each bin $i$ the number $B(i)$ of samples falling into that bin (if $x\in[a,b)$ is a sample, it lies in the $\lfloor\frac{x-a}\delta\rfloor$-th bin).

Clearly, $$[a,b)\ni x\mapsto B\left(\lfloor\frac{x-a}\delta\rfloor\right)\tag1$$ is an approximation of the shape of $f$.

Now, let $v$ be the average value of $f$ on $[a,b]$, $h$ be the average number of samples per bin and $s:=v/h$. If I got it right, the desired approximation would be $$\tilde f(x):=sB\left(\lfloor\frac{x-a}\delta\rfloor\right)\;\;\;\text{for }x\in[a,b).$$ Here's a plot of the result for $a=-5$, $b=5$, $C=2000$, $\delta=(b-a)/C$ and $n=1000000$:

plot

Obviously, the scale is not correct. Did I made any mistake or is there something wrong with the description in the paper?

1

There are 1 best solutions below

5
On BEST ANSWER

It looks like you're taking $v$ to be the sample average of $f(x)$ where $x$ is drawn from $N_{0,1}$ (perhaps conditioned to $x\in[a,b]$ - this wouldn't make much difference). Instead, $v$ should be the average of $f(x)$ for the uniform distribution on $[a,b]$ - this is what they mean by "selected at random from the sampling domain."

If you add up $f(x)$ at the points $a,a+\delta,\dots,a+(C-1)\delta,$ you should expect to get about $vC$ - the average of $f$ multiplied by the number of points. If you add up $\overline f(x)$ at these points, you get exactly $sn.$ So it makes sense to take $s=vC/n=v/h.$