Plot of ideal PDF not close to histogram?

57 Views Asked by At

Given $X \sim N(0, 1)$ and $Y = e^X$ I have calculated the PDF of $Y$ to be $f_Y(y) = \frac{1}{\sqrt{2\pi}}y^{-\frac{3}{2}\ln(y)}$. But when I plot it against the values I get in practice from a random number generator the curves don't look similar. What am I doing wrong?

Derivation of the PDF:

It is known that if $Y = r(X)$ where $r$ is strictly monotone increasing that $r$ has an inverse $s$ and that:

$$f_Y(y) = f_X(s(y))\frac{ds(y)}{dy}$$

Separately for standard normal distributions we know that:

$$f_X(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}$$

in this case $s(y) = \ln(y)$

$$f_Y(y) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}(\ln(y))^2}\frac{1}{y}$$ $$f_Y(y) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}\ln(y)\ln(y)}\frac{1}{y}$$ $$f_Y(y) = \frac{1}{\sqrt{2\pi}}(e^{\ln(y)})^{-\frac{1}{2}\ln(y)}\frac{1}{y}$$ $$f_Y(y) = \frac{1}{\sqrt{2\pi}}y^{-\frac{1}{2}\ln(y)}\frac{1}{y}$$ $$f_Y(y) = \frac{1}{\sqrt{2\pi}}y^{-\frac{1}{2}\ln(y)}y^{-1}$$ $$f_Y(y) = \frac{1}{\sqrt{2\pi}}y^{-\frac{3}{2}\ln(y)}$$

enter image description here

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import random
import matplotlib
import math
matplotlib.use("TkAgg") # without this nothing appears?!
import matplotlib.pyplot as plt
import numpy as np
import sys


def pdf_ch2e13(y):
    return (1/math.sqrt(2 * math.pi) * (y**(-3 * math.log(y, math.e) / 2)))

def pdf_ch2e13_hist():
    x = [np.random.normal() for i in range(100000)]
    y = [math.e**i for i in x]

    fig = plt.figure()
    ax2 = fig.add_subplot(111)

    r = np.linspace(0.0001, 10)
    hist, bins = np.histogram(y, list(r) + [float("inf")], density=True)
    ax2.plot(r, [pdf_ch2e13(x) for x in r], color="red", lw=3)
    ax2.plot(r, list(hist), color="blue", lw=3)
    ax2.set_ylabel('e^(random_normal)')

    plt.show()


pdf_ch2e13_hist()
2

There are 2 best solutions below

0
On BEST ANSWER

I made a simple algebra mistake in the last step of the PDF derivation:

$$f_Y(y) = \frac{1}{\sqrt{2\pi}}y^{-\frac{1}{2}\ln(y)}y^{-1} \neq \frac{1}{\sqrt{2\pi}}y^{-\frac{3}{2}\ln(y)}$$

Instead I should have done:

$$f_Y(y) = \frac{1}{\sqrt{2\pi}}y^{-\frac{1}{2}\ln(y)}y^{-1} = \frac{1}{\sqrt{2\pi}}y^{-\frac{1}{2}\ln(y) - 1}$$

2
On

According to Wikipedia, the pdf of the lognormal random variable $Y$ (with parameters $\mu = 0$ and $\sigma = 1)$ is $$f_y(y) = \frac{1}{y\sqrt{2\pi}}e^{-\frac 1 2(\ln(y))^2},$$ for $y > 0.$

In the figure below, the solid line uses the lognormal density function dlnorm from R and the heavy dotted line is based on algebraic code for the expression above. They match.

curve(dlnorm(x, 0, 1), 0, 10, ylab="Density", n=10001, xlab="y",
   main="Standard Lognormal Density")
abline(v=0, col="green3"); abline(h=0, col="green3")
curve((1/(x*sqrt(2*pi))*exp(-.5*(log(x))^2)), add=T, lwd=3, 
   n=10001, col="blue", lty="dotted")  # 'curve' requires argument 'x'

enter image description here

Starting from that and leaving it to others to correct the errors in draft derivations, we have the following demonstration based on sampling. (@StubbornAtom has it right.)

We use R to sample a million observations from $X \sim \mathsf{Norm}(0,1),$ find corresponding lognormal values $Y = e^X,$ make a histogram of them, and plot the lognormal density for comparison. [Because the lognormal distribution has such a long tail to the right, it is necessary to truncate $Y$-values at $20$ in order to get a useful illustration. (This includes 99.86% of the million observations.) The lognormal density is very slightly inflated to account for truncation.]

x = rnorm(10^6, 0, 1);  y = exp(x);  yt = y[y <= 20]
hist(yt, prob=T, br=100, col="skyblue2", main="Histogram of Truncated Lognormal Sample")
curve(dlnorm(x)/plnorm(20), add=T, n=10001, col="red")

enter image description here