Slow convergence when simulating a log-normal sample from the normal


I am trying to simulate a log-normal random variable $Y$ with mean $m = \mathbb{E}[Y] = 0.001$ and standard deviation $s = 0.094$ by simulating a normal sample instead, and then exponentiating it.

From the definition of the log-normal distribution, if we let $X = \ln Y$, then $X \sim \mathcal{N}(\mu, \sigma^2)$, where $$ \sigma^2 = \ln \left(\frac{s^2}{m^2}+1\right) \text{ and } \mu = \ln(m) - \frac{\sigma^2}{2}, $$ which in my case gives $$ \mu \approx -11.4511, \quad \sigma \approx 3.0144. $$
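In Python, for instance, this conversion can be sketched as follows (the helper name is mine):

```python
import math

def lognormal_params(m, s):
    """Parameters (mu, sigma) of the underlying normal for a lognormal
    with mean m and standard deviation s."""
    sigma2 = math.log(1 + (s / m)**2)
    return math.log(m) - sigma2 / 2, math.sqrt(sigma2)

mu, sigma = lognormal_params(0.001, 0.094)
print(mu, sigma)  # approximately -11.4511 and 3.0144
```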

So I simulate a large normally distributed sample (one million points), which comes out with a sample mean and standard deviation of approximately $-11.448$ and $3.01435$, which is very close.

Then, I exponentiate every entry in the sample, expecting to get a log-normal sample with my original parameters. The mean and standard deviation of this exponentiated sample are $0.0010199$ and $0.055741$. So the mean comes in very close, but the standard deviation is way off the mark.

I tried multiple experiments, and it seems the mean is always matched but the standard deviation is not. I implemented this both in Excel and in Python (happy to post the code if needed), but I don't understand why I cannot match the standard deviation of the log-normal sample.

It could be that it simply requires more points to converge, but one million points seems like a very large sample; shouldn't that be enough to get at least 2-3 digits after the decimal point correctly?

I would appreciate any ideas. Thank you very much.


Best answer:

I think the particular values of $m$ and $s$ you are using give rise to unexpected computational difficulties. Perhaps it is just too much to wish for two-place accuracy retrieving $s$ with a million simulated values. Also, taking the ordinary SD of lognormal data may not be the optimal way to estimate parameter $s$ (especially when it is small). I'm not absolutely sure of any of this because I have not taken the time to track exactly where it goes wrong.
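That hunch can be quantified. The sampling error of the sample variance is driven by the distribution's fourth central moment, which is astronomically large here because $\sigma \approx 3$. A quick check in Python, using the standard lognormal moment formula $\mathbb{E}[Y^k] = e^{k\mu + k^2\sigma^2/2}$:

```python
import math

m, s = 0.001, 0.094
sigma2 = math.log(1 + (s / m)**2)      # about 9.087, so sigma is about 3.01
mu = math.log(m) - sigma2 / 2

# k-th raw moment of a lognormal: E[Y^k] = exp(k*mu + k^2*sigma^2/2)
def raw(k):
    return math.exp(k * mu + k**2 * sigma2 / 2)

var_y = raw(2) - raw(1)**2             # the target variance, approximately s^2
mu4 = raw(4) - 4*raw(3)*raw(1) + 6*raw(2)*raw(1)**2 - 3*raw(1)**4

# large-sample standard error of the sample variance: sqrt((mu4 - var^2)/n)
n = 10**6
se_var = math.sqrt((mu4 - var_y**2) / n)
print(var_y, se_var)  # se_var is several orders of magnitude larger than var_y
```

With these parameters the standard error of the sample variance dwarfs the target variance itself, so a million points is nowhere near enough for a stable sample SD.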

However, if $m = .1$ and $s = .4$, your method seems to work just fine. Here is a brief session in R:

 m = .1;  s = .4
 sg = sqrt(log(1 + s^2/m^2));  sg
 ## 1.683215
 mu = log(m) - .5*sg^2;  mu
 ## -3.719192
 x = rnorm(10^6, mu, sg);  y = exp(x)
 mean(x);  sd(x)
 ## -3.720081
 ## 1.682534
 mean(y);  sd(y)
 ## 0.09942132
 ## 0.4097221

Searching the internet for 'lognormal distribution', I found a MathWorks page that says (in your notation) that a lognormal distribution with mean $m$ and variance $s^2$ has $\mu = \ln(m^2/\sqrt{s^2 + m^2})$ and $\sigma = \sqrt{\ln(1+ s^2/m^2)}.$ Maybe it is my imagination, but for various values of $m$ and $s$ these computations seemed a little more numerically stable for getting $\mu$ and $\sigma$. (But that still does not cure your problem.)

 m = .1;  s = .2
 mu = log(m^2/sqrt(s^2 + m^2)); mu
 ## -3.107304
 sg = sqrt(log(1+s^2/m^2)); sg
 ## 1.268636
 x = rnorm(10^6, mu, sg)
 mean(x);  sd(x)
 ## -3.107111
 ## 1.269737
 y = exp(x)
 mean(y); sd(y)
 ## 0.1000590
 ## 0.1992975
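The MathWorks form is in fact algebraically identical to the formula in the question: both reduce to $\mu = 2\ln m - \tfrac12\ln(s^2 + m^2)$, so any numerical difference is pure round-off. A quick Python check:

```python
import math

def mu_direct(m, s):
    # mu = ln(m) - sigma^2/2 with sigma^2 = ln(1 + s^2/m^2)
    return math.log(m) - math.log(1 + (s / m)**2) / 2

def mu_mathworks(m, s):
    # the MathWorks parameterization
    return math.log(m**2 / math.sqrt(s**2 + m**2))

# both forms agree to floating-point precision for all the values used above
for m, s in [(0.001, 0.094), (0.1, 0.4), (0.1, 0.2)]:
    print(mu_direct(m, s) - mu_mathworks(m, s))
```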

R generates lognormal samples directly with the function rlnorm. Here are some related results.

 y = rlnorm(10^6, -11.4511, 3.0144);  mean(y);  sd(y)  # Your values
 ## 0.0009537684
 ## 0.03262853    # not accurate to two places
 y = rlnorm(10^6, -3.107304, 1.268636); mean(y); sd(y) # m=.1, s=.2
 ## 0.1003433
 ## 0.2027905
Another answer:

The problem is the small mean ($m = 0.001$). The following R code generates a sample of size n from a lognormal with mean m and standard deviation s:

rlnorm2 <- function(n, m, s) {

  # samples from lognormal with m1=1, s1=1:

  sig1 = sqrt(log(2))
  mu1  = - 1/2*log(2)

  y1 <- rlnorm(n, mu1, sig1)

  # then transforms the sample into a lognormal with m2=m, s2=s:

  sig2 = sqrt(log(1+s^2/m^2))
  mu2  = log(m) - 1/2*sig2^2

  y2 <- exp(mu2)*(exp(-mu1)*y1)^(sig2/sig1)

  y2
}
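For reference, a rough Python port of the same transform (names mirror the R code above; the seed choice is arbitrary):

```python
import math
import random
import statistics

def rlnorm2(n, m, s, seed=1):
    """Sample n points from a lognormal with mean m and sd s by rescaling
    a lognormal sample that has mean 1 and sd 1 (Python port of the
    R function above)."""
    rng = random.Random(seed)

    # sample from the lognormal with m1 = 1, s1 = 1
    sig1 = math.sqrt(math.log(2))
    mu1 = -0.5 * math.log(2)
    y1 = (math.exp(rng.gauss(mu1, sig1)) for _ in range(n))

    # transform into a lognormal with mean m and sd s
    sig2 = math.sqrt(math.log(1 + (s / m)**2))
    mu2 = math.log(m) - 0.5 * sig2**2
    return [math.exp(mu2) * (math.exp(-mu1) * y)**(sig2 / sig1) for y in y1]

y = rlnorm2(100_000, 0.1, 0.2)
print(statistics.mean(y), statistics.stdev(y))  # should land near 0.1 and 0.2
```

Note that this rescaling trick does not by itself cure the slow convergence for very small m with large $\sigma$; the power transform $(\cdot)^{\sigma_2/\sigma_1}$ produces exactly the same distribution as sampling the target lognormal directly.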