Is this a theorem in Statistics? If not, can anyone explain why this seems to be true?


It seems to me that the following is true:

If $c$ is a random variable with probability density $p(c)$ and cumulative distribution function $P(c)$, then $P(c)$ is uniformly distributed (its density is constant).

I have drawn $N$ random values consistent with a probability density (in my test case a simple Gaussian, $p(x) = \frac{2}{\sqrt{\pi}}e^{-x^2}$), for which I took the cumulative distribution function to be $P(x) = \operatorname{erf}(x)$. After generating the $N$ random values, I plot their histogram and recover the Gaussian, which tells me that $p(x)$ is correct. Then, for each generated value $c$, I evaluate $P(c)$ and plot the histogram of the $P(c)$ values, which appears to be flat (each bin holds roughly the same count). The more points I generate, the closer it comes to a perfectly straight (less noisy) horizontal line, and when the sampler deviates from the true $p(c)$, the histogram deviates from a straight line as well.

I have included the MATLAB code below. Note that as $N$ grows (line 1), the resulting graph in figure 2 becomes straighter and straighter. If, however, we change line 13 to use rv = 0.5*rand(), the resulting $p(c)$ deviates from the true Gaussian, and that deviation is reflected in figure 2 as well.

 1 N = 50000;
 2
 3 cs = zeros(N,1);
 4 maxPx = 0.0;                       % largest density value seen (diagnostic only)
 5 for i = 1:N
 6   while(cs(i) == 0)                % rejection sampling: retry until accepted
 7     x = 10*(2*rand()-1);           % uniform proposal on [-10, 10]
 8     Px = (2/sqrt(pi))*exp(-x^2);   % target density at x
 9     if(Px > maxPx)
10       maxPx = Px;
11     end
12     rv = (2/sqrt(pi))*rand();      % uniform on [0, max of the density]
13 %    rv = 0.5*rand();              % uncomment to bias the sampler (see text)
14     if(rv < Px)                    % accept with probability Px/(2/sqrt(pi))
15       cs(i) = x;
16     end
17   end
18 end
19
20 figure(1);                         % histogram of the samples: the Gaussian
21 [ns, xs] = hist(cs, 100);
22 plot(xs, ns, 'k.');
23 axis([min(xs) max(xs) 0 max(ns)]);
24
25 Ps = erf(cs);                      % evaluate P(c) for every sample
26
27 figure(2);                         % histogram of P(c): approximately flat
28 [ns, xs] = hist(Ps, 100);
29 plot(xs, ns, 'k.');
30 axis([min(xs) max(xs) 0 max(ns)]);
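For readers without MATLAB, here is a rough Python translation of the same experiment (standard library only; the function and variable names are mine). It rejection-samples the Gaussian exactly as the loop above does, then bins the erf values, which come out roughly uniform on $(-1, 1)$:

```python
import math
import random

random.seed(0)

def sample_rejection(n, half_width=10.0):
    """Rejection-sample from the density proportional to exp(-x^2)
    on [-half_width, half_width], mirroring the MATLAB loop above."""
    peak = 2.0 / math.sqrt(math.pi)  # maximum of (2/sqrt(pi)) * exp(-x^2)
    out = []
    while len(out) < n:
        x = half_width * (2.0 * random.random() - 1.0)  # uniform proposal
        px = peak * math.exp(-x * x)                    # density at x
        if peak * random.random() < px:                 # accept w.p. px/peak
            out.append(x)
    return out

cs = sample_rejection(50_000)
ps = [math.erf(x) for x in cs]  # erf(c) lies in (-1, 1)

# Bin the erf values into 10 equal bins on (-1, 1); if they are uniform,
# every count should be close to 5000.
counts = [0] * 10
for p in ps:
    counts[min(int((p + 1.0) / 0.2), 9)] += 1
print(counts)
```

The printed counts fluctuate around $N/10$ with sampling noise of order $\sqrt{N/10}$, which is the "less noisy with larger $N$" effect described above.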

EDIT: included images.

(1) Taking N = 50,000: the Gaussian distribution [image] and the related probability histogram [image].

(2) Letting N grow to 5,000,000: the Gaussian distribution [image] and the related probability histogram [image].

(3) Finally, changing the original function away from the Gaussian distribution: the distribution [image] and the probability histogram [image].

1 answer below.

BEST ANSWER

The error function is not a cumulative distribution function: it takes negative values, while a probability cannot. Correcting this requires only a linear adjustment to your data: the actual CDF of the distribution you are sampling (density proportional to $e^{-x^2}$) is $F(x) = \tfrac{1}{2}\left(1+\operatorname{erf}(x)\right)$, which maps your erf values from $(-1,1)$ onto $(0,1)$.
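As a concrete check of that adjustment (a Python sketch of mine, not part of the original answer), $F(x)=\tfrac{1}{2}(1+\operatorname{erf}(x))$ behaves like a CDF while erf alone does not:

```python
import math

def F(x):
    # CDF of the sampled distribution (density proportional to exp(-x^2)):
    # a linear adjustment of erf that is 0 at -inf, 1 at +inf, nondecreasing.
    return 0.5 * (1.0 + math.erf(x))

print(math.erf(-1.0))           # negative, so erf alone is not a CDF
print(F(-1.0), F(0.0), F(1.0))  # all strictly between 0 and 1
```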

For a sufficiently well-behaved random variable (for example, one with a strictly increasing continuous cumulative distribution function), your statement is true.

If the cumulative distribution function of random variable $X$ is $F(x) = \Pr (X \le x)$ and has an inverse $G(y) = F^{-1}(y)$ for $0 \lt y \lt 1$, then define a new random variable $Y=F(X)$ and note $G(Y)=X$.

Now consider the cumulative distribution function of $Y$: for $0 \lt y \lt 1$, $\Pr(Y\le y) = \Pr(F(X)\le y) = \Pr(X \le G(y))$ (since $F$ is increasing), and this last probability is $F(G(y))$.

$F(G(y)) = F(F^{-1}(y)) = y$, which is the cumulative distribution function of a standard uniform random variable. So $Y = F(X)$ is uniformly distributed.
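A quick numerical check of this conclusion (a Python sketch of mine, using an Exponential(1) variable instead of the Gaussian because its CDF $F(x)=1-e^{-x}$ is continuous, strictly increasing, and has a simple closed form):

```python
import math
import random

random.seed(1)

# Probability integral transform: if X has a continuous, strictly increasing
# CDF F, then Y = F(X) is Uniform(0, 1).  Checked with an Exponential(1)
# variable, whose CDF is F(x) = 1 - exp(-x).
n = 100_000
xs = [random.expovariate(1.0) for _ in range(n)]
ys = [1.0 - math.exp(-x) for x in xs]

# If Y is uniform on (0, 1), its empirical CDF at y should be close to y.
for y in (0.1, 0.25, 0.5, 0.75, 0.9):
    frac = sum(1 for v in ys if v <= y) / n
    print(y, round(frac, 3))
```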