Simulating expectation of continuous random variable


Given the probability density function

\begin{equation} f(x)=\begin{cases} kx, & \text{$0 \leq x \leq 1$}.\\ 0, & \text{otherwise}. \end{cases} \end{equation}

I've found that $k = 2$, so $$E(X) = \int_{-\infty}^{\infty} xf(x)\,dx = \int_{0}^{1} 2x^2\,dx = \frac{2}{3}$$

$$E(X^2) = \int_{-\infty}^{\infty} x^2f(x)\,dx = \int_{0}^{1} 2x^3\,dx = \frac{1}{2}$$

$$Var(X) = E(X^2) - E(X)^2 = \frac{1}{2} - \left(\frac{2}{3}\right)^2 = \frac{1}{18}$$
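These closed-form values can be double-checked with exact arithmetic (a quick sanity check, not part of the original question):

```python
from fractions import Fraction

# Exact-arithmetic check of the variance computation above.
E_X = Fraction(2, 3)    # E(X)   = integral of 2x^2 from 0 to 1
E_X2 = Fraction(1, 2)   # E(X^2) = integral of 2x^3 from 0 to 1
print(E_X2 - E_X**2)    # 1/18
```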

But when I generated some random variables using Python, this is the expectation I get:

import numpy as np
N = 100_000
X = np.random.uniform(size=N, low=0, high=1)
Y = [2*x for x in X]
np.mean(Y) # 1.00221 <- not equal to 2/3
np.var(Y) # 0.3323 <- not equal to 1/18

What am I doing wrong here?


Monte Carlo Simulation


To approximate the integral of some function of $x$, say, $g(x)$, over $S = [0, 1]$, using Monte Carlo simulation, you

  1. generate $N$ random numbers in $[0, 1]$ (i.e. draw from the uniform distribution $U[0, 1]$)
  2. calculate the arithmetic mean of $g(x_{i})$ over $i = 1$ to $i = N$ where $x_{i}$ is the $i$th random number: $$\frac{1}{N}\sum_{i = 1}^{N}g(x_{i})$$

The result of step 2 is the approximation of the integral.
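The two steps above can be sketched as follows, using a test case whose answer is known: $g(x) = x^2$, whose integral over $[0, 1]$ is $1/3$.

```python
import numpy as np

# Monte Carlo approximation of the integral of g(x) = x^2 over [0, 1].
rng = np.random.default_rng(0)
N = 100_000
x = rng.uniform(0, 1, size=N)   # step 1: N draws from U[0, 1]
estimate = np.mean(x**2)        # step 2: arithmetic mean of g(x_i)
print(estimate)                 # close to the exact value 1/3
```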

Definitions of Expected Value and Variance


Given a continuous random variable $X$ with pdf $f(x)$ and set of possible values $S$,

  • The expected value of $X$ is the integral of $xf(x)$ over $S$: $$E(X) \equiv \int_{x \in S}xf(x)dx$$
  • The variance of $X$ is $$V(X) \equiv E(X^{2}) - E(X)^{2} = \int_{x \in S}x^{2}f(x)\,dx - E(X)^{2}$$

Approximating Expected Value and Variance using Monte Carlo Simulation


  • Expected value: to approximate the integral of $xf(x)$ over $S = [0, 1]$ (i.e. the expected value of $X$), set $g(x) = xf(x)$ and apply the method outlined above.
  • Variance: to approximate the integral of $x^{2}f(x)$ over $S = [0, 1]$ (i.e. the expected value of $X^{2}$), set $g(x) = x^{2}f(x)$ and apply the method outlined above. Then subtract the square of the estimate of $E(X)$ from this result to obtain an estimate of the variance of $X$.

Code


Adapting your method:

import numpy as np
N = 100_000
X = np.random.uniform(size=N, low=0, high=1)

Y = [x * (2 * x) for x in X]        # g(x) = x f(x), for E(X)
E = [(x * x) * (2 * x) for x in X]  # g(x) = x^2 f(x), for E(X^2)

# mean
print((a := np.mean(Y)))
# variance
print(np.mean(E) - a * a)

Output

0.6662016482614397
0.05554821798023696

Instead of making Y and E lists, a much better approach is

Y = X * (2 * X)
E = (X * X) * (2 * X)

Here Y and E are NumPy arrays, and the arithmetic is vectorized instead of looping in Python, which is much more efficient. Try making N = 100_000_000 and compare the execution times of both methods. The second should be much faster.
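A minimal timing sketch of the comparison (with $N$ reduced to 1,000,000 so it finishes quickly; exact timings will vary by machine):

```python
import time
import numpy as np

N = 1_000_000
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=N)

t0 = time.perf_counter()
Y_list = [x * (2 * x) for x in X]   # Python-level loop over the array
t_list = time.perf_counter() - t0

t0 = time.perf_counter()
Y_arr = X * (2 * X)                 # vectorized NumPy arithmetic
t_arr = time.perf_counter() - t0

print(f"list comprehension: {t_list:.4f}s, numpy: {t_arr:.4f}s")
```

Both compute the same values; only the execution model differs.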