Log-Normal RVs: How much more probable is it to find values near the mode than near the average?
Intro_______________
I was trying to get some insight into this question, and I came up with the following: it is not possible to speak about the probability of a single point in a continuous probability distribution, because the integral of the density then has the same number at both of its limits of integration and so equals zero. But maybe it is possible to compare both probabilities on an infinitesimal interval if the length is the same for both scenarios, $$\text{ratio} = \lim\limits_{q \to 0^+} \dfrac{P(\nu-q<X<\nu+q)}{P(\bar{x}-q<X<\bar{x}+q)}$$
where $\nu = e^{\mu-\sigma^2}$ is the mode and $\bar{x} = e^{\mu+\frac{\sigma^2}{2}}$ is the average of the Log-Normal distribution. I am using the same parametrization as Wikipedia in order to reduce explanations, and I am avoiding the term "expected value" on purpose, since for a skewed distribution the "average" doesn't coincide with the value you expect to see most frequently, which is given by the "mode".
Now, for the Log-Normal distribution I have $$P(X<x) = \Phi\left(\frac{\ln(x)-\mu}{\sigma}\right)=\frac12\left[1+\text{erf}\left(\frac{\ln(x)-\mu}{\sigma\sqrt{2}}\right)\right]$$ and since $P(a<X<b) = P(X<b)-P(X<a)$, after some simplification the ratio I am looking for becomes: $$\text{ratio} = \lim\limits_{q \to 0^+} \dfrac{\text{erf}\left(\dfrac{\ln\left(e^{\mu-\sigma^2}+q\right)-\mu}{\sigma\sqrt{2}}\right)-\text{erf}\left(\dfrac{\ln\left(e^{\mu-\sigma^2}-q\right)-\mu}{\sigma\sqrt{2}}\right)}{\text{erf}\left(\dfrac{\ln\left(e^{\mu+\frac{\sigma^2}{2}}+q\right)-\mu}{\sigma\sqrt{2}}\right)-\text{erf}\left(\dfrac{\ln\left(e^{\mu+\frac{\sigma^2}{2}}-q\right)-\mu}{\sigma\sqrt{2}}\right)}$$
Unfortunately this is far more complicated than what I can unravel by myself, and Wolfram-Alpha could not find a solution either, but I started plugging in numbers for $\mu$ and $\sigma$ and found some interesting results:
- If I choose $\mu = 1$ and $\sigma = 1$, instead of an undefined ratio or a zero value (as I was afraid of, given what I mentioned at the beginning), Wolfram-Alpha does find a finite limit: $\text{ratio} = e^{9/8}$
- So I started changing values to see what happens: I noticed that if I change $\sigma$ the ratio changes, but when I change $\mu$ for a fixed $\sigma$ the ratio doesn't change at all.
- If the variable $\mu$ turns out to be a dummy, I thought, why not just set it to zero? With this, Wolfram-Alpha shows an interesting result: it cannot evaluate the limit, but it gives a series expansion: $$\text{ratio}\approx \exp\left(\frac{3\sigma^4-\ln\left(e^{-\sigma^2}\right)^2+\ln\left(e^{\frac{\sigma^2}{2}}\right)^2}{2\sigma^2}\right) + O(q^2)$$ which reduces to $$\text{ratio}\approx e^{\frac{9\sigma^2}{8}} + O(q^2)$$ Here I also noticed that for different values of $\sigma$ the limits found by Wolfram-Alpha match the values of the approximation perfectly, so I believe that Wolfram-Alpha effectively solved the ratio formula when it did the series expansion.
- Following this intuition, I expanded the original ratio (with general $\mu$) as a series, and Wolfram-Alpha found the following: $$\text{ratio}\approx \exp\left(\frac{3\sigma^4-\left(\mu-\ln\left(e^{\mu-\sigma^2}\right)\right)^2 +\left(\mu- \ln\left(e^{\mu+\frac{\sigma^2}{2}}\right)\right)^2}{2\sigma^2}\right) + O(q^2)$$ which again reduces to $$\text{ratio}\approx \exp\left(\dfrac{9\sigma^2}{8}\right) + O(q^2)$$ explaining why $\mu$ was behaving as a dummy variable.
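The experiments in the list above can also be repeated without Wolfram-Alpha. The following is a minimal sketch, assuming only the erf form of the CDF given earlier; the helper names `lognormal_cdf` and `interval_ratio` are made up for this illustration, and the particular values of $\mu$, $\sigma$ and $q$ are arbitrary choices.

```python
import math


def lognormal_cdf(x, mu, sigma):
    """P(X < x) for a log-normal variable, written with erf as in the question."""
    if x <= 0.0:
        return 0.0  # a log-normal variable is supported on x > 0 only
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def interval_ratio(mu, sigma, q):
    """P(mode-q < X < mode+q) / P(mean-q < X < mean+q) for a finite half-width q."""
    mode = math.exp(mu - sigma**2)
    mean = math.exp(mu + sigma**2 / 2.0)
    near_mode = lognormal_cdf(mode + q, mu, sigma) - lognormal_cdf(mode - q, mu, sigma)
    near_mean = lognormal_cdf(mean + q, mu, sigma) - lognormal_cdf(mean - q, mu, sigma)
    return near_mode / near_mean


# Shrink q for several mu at fixed sigma and compare with exp(9 sigma^2 / 8).
sigma = 1.0
for mu in (0.0, 1.0, 3.0):
    for q in (1e-1, 1e-2, 1e-3):
        print(f"mu={mu}, q={q}: ratio={interval_ratio(mu, sigma, q):.6f}, "
              f"conjecture={math.exp(9.0 * sigma**2 / 8.0):.6f}")
```

How small $q$ must be before the two columns agree depends on $\mu$ and $\sigma$, so a numerical check like this supports the conjecture but is not a proof.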
Main question________________
Given the results I found, I want to ask the following:
- Is it possible to prove that for a Log-Normal distribution the identity $$\text{ratio} = \lim\limits_{q \to 0^+} \dfrac{P(\nu-q<X<\nu+q)}{P(\bar{x}-q<X<\bar{x}+q)} \equiv \exp\left(\dfrac{9\sigma^2}{8}\right)$$ holds?
- Is it correct to say that for a Log-Normal distributed random variable, the probability of finding a value near the "Mode" is $\exp\left(\dfrac{9\sigma^2}{8}\right)$ times higher than the probability of finding a value near its "Average"?
Your ratio is simply the ratio of probability densities, because for a continuous random variable $X$, $$f_X(x) = \lim_{\Delta x \to 0^+} \frac{\Pr[x < X \le x + \Delta x]}{\Delta x}.$$ That your limit uses the variant $\Pr[x-q < X < x+q]$ is immaterial.*
As such, your ratio has the value $$R = \frac{f_X(e^{\mu - \sigma^2})}{f_X(e^{\mu+\sigma^2/2})} = \frac{e^{-\mu + \sigma^2/2}}{\sqrt{2\pi} \sigma} \cdot \frac{\sqrt{2\pi} \sigma}{e^{-\mu - 5\sigma^2/8}} = e^{9\sigma^2/8}.$$
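To make the two evaluations explicit, here is the substitution written out, assuming the log-normal density in the same Wikipedia parametrization the question uses: $$f_X(x) = \frac{1}{x\,\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0.$$ At the mode, $\ln x - \mu = -\sigma^2$, so $$f_X(e^{\mu-\sigma^2}) = \frac{e^{-\mu+\sigma^2}}{\sigma\sqrt{2\pi}} \, e^{-\sigma^2/2} = \frac{e^{-\mu+\sigma^2/2}}{\sigma\sqrt{2\pi}},$$ and at the mean, $\ln x - \mu = \sigma^2/2$, so $$f_X(e^{\mu+\sigma^2/2}) = \frac{e^{-\mu-\sigma^2/2}}{\sigma\sqrt{2\pi}} \, e^{-\sigma^2/8} = \frac{e^{-\mu-5\sigma^2/8}}{\sigma\sqrt{2\pi}}.$$ The factors $e^{-\mu}$ and $\sigma\sqrt{2\pi}$ cancel in the quotient, which is why $\mu$ drops out, and the remaining exponent is $\sigma^2/2 + 5\sigma^2/8 = 9\sigma^2/8$.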
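*The reason is that, where the derivative is defined, $$f'(x) = \lim_{\Delta x \to 0} \frac{f(x+\Delta x) - f(x)}{\Delta x} = \lim_{\Delta x \to 0} \frac{f(x+\Delta x)-f(x-\Delta x)}{2\Delta x}.$$ Applied to the CDF, this says the symmetric interval of width $2q$ gives the same density in the limit, and the factor $2q$ is common to the numerator and denominator of your ratio, so it cancels.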
As for the meaning of this ratio, it is a likelihood ratio. Specifically, it is the likelihood ratio of observing the mode versus the mean. It does not mean that the probability of observing (approximately) the mode is $e^{9\sigma^2/8}$ times that of observing (approximately) the mean, because the density is not a probability. For instance, if I said event $A$ had a probability of $\Pr[A] = 0.01$, and event $B$ is $e^{9\sigma^2/8}$ times more likely than $A$, that is, $\Pr[B] = e^{9\sigma^2/8}\Pr[A]$, then that would seem to suggest that for sufficiently large $\sigma$, $\Pr[B] > 1$, which is absurd.
To illustrate the issues arising from confusing a likelihood for a probability, consider the following example. Suppose $\mu = -1$ and $\sigma = 1$. Then it is not difficult to see that the lognormal mode is $\nu = e^{-2}$ and the lognormal mean is $\bar x = e^{-1/2}$. The values of the density at these points are $$f_X(\nu) = \frac{e^{3/2}}{\sqrt{2\pi}} \approx 1.78794, \\ f_X(\bar x) = \frac{e^{3/8}}{\sqrt{2\pi}} \approx 0.580458. $$ And while it is true that their ratio is exactly $e^{9/8}$, the values of these densities are not probabilities. To rectify this, we might be inspired to let $$p(x,\delta) = \Pr[|X - x| \le \delta],$$ for some suitably small $\delta > 0$, and consider the ratio $p(\nu, \delta)/p(\bar x, \delta)$. Let's try this for $\delta = 0.01$:
$$\frac{p(e^{-2}, 0.01)}{p(e^{-1/2}, 0.01)} \approx \frac{0.0357261}{0.0116106} = 3.07702.$$ This is pretty close, but not exactly equal, to $e^{9/8} \approx 3.08022$. But what if we had chosen $\mu = -1$, $\sigma = 2$? Then $$\nu = e^{-5}, \quad \bar x = e$$ and for the same $\delta$, we get $$\frac{p(e^{-5}, 0.01)}{p(e, 0.01)} \approx \frac{0.0611685}{0.000890168} = 68.7157$$ and this is not at all close to the theoretical result of $e^{36/8} \approx 90.0171$. What happened? The problem is that the range of $\delta$ for which this ratio becomes "good" depends on $\mu$ and $\sigma$. Here, for instance, $\delta = 0.01$ is larger than the mode $e^{-5} \approx 0.0067$ itself, so the interval around $\nu$ is nowhere near small on the scale over which the density varies. Yes, if $\delta$ is "small enough," then it must work. But as we make $\delta$ smaller, the probabilities of being "close to" $\nu$ or $\bar x$ also get vanishingly small. This is what you were doing when you wrote $O(q^2)$ in your work: quantifying the magnitude of the error in your approximation of the likelihood ratio as a function of the half-width of the interval.
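The dependence on $\delta$ is easy to explore numerically. Below is a rough sketch, assuming the standard erf form of the log-normal CDF and using only the Python standard library; `lognormal_cdf`, `p` and `window_ratio` are hypothetical helper names chosen for this illustration, and the list of $\delta$ values is arbitrary.

```python
import math


def lognormal_cdf(x, mu, sigma):
    """Pr[X <= x] for a log-normal variable; zero for x <= 0."""
    if x <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def p(x, delta, mu, sigma):
    """Pr[|X - x| <= delta], the window probability defined in the text."""
    return lognormal_cdf(x + delta, mu, sigma) - lognormal_cdf(x - delta, mu, sigma)


def window_ratio(mu, sigma, delta):
    """p(mode, delta) / p(mean, delta) for the given parameters."""
    mode = math.exp(mu - sigma**2)
    mean = math.exp(mu + sigma**2 / 2.0)
    return p(mode, delta, mu, sigma) / p(mean, delta, mu, sigma)


# The two parameter choices from the text, with progressively smaller windows.
for mu, sigma in ((-1.0, 1.0), (-1.0, 2.0)):
    limit = math.exp(9.0 * sigma**2 / 8.0)
    for delta in (1e-2, 1e-3, 1e-4, 1e-5):
        print(f"mu={mu}, sigma={sigma}, delta={delta}: "
              f"ratio={window_ratio(mu, sigma, delta):.5f}, "
              f"p(mean)={p(math.exp(mu + sigma**2 / 2.0), delta, mu, sigma):.3e}, "
              f"limit={limit:.5f}")
```

For $\delta = 0.01$ this should reproduce, up to rounding, the ratios quoted above. The `p(mean)` column shows the trade-off: the ratio approaches its limit only as the window probabilities themselves shrink toward zero.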
So it's not as if the ratio doesn't have some kind of interpretation in terms of relative probabilities, but care must be taken when we talk about such ideas.