Infer the variance of a distribution


Assume that the distribution of firms' incomes is log-normal, and that we know the sum of all the revenues and the number of firms in the market (so the mean of the revenue distribution is known).

Unfortunately, we are unable to observe any individual firm's revenue, but we do know how many firms fall within each revenue range. [Figure: Firms' Revenue - Bar Plot]

The following table shows how many firms fall into each revenue bin:

| Revenue            | Number of Firms |
|--------------------|-----------------|
| $0 - $999          |             480 |
| $1,000 - $1,999    |            2000 |
| $2,000 - $3,999    |            1600 |
| $4,000 - $5,999    |            1200 |
| $6,000 - $7,999    |             800 |
| $8,000 - $9,999    |             680 |
| $10,000 - $14,999  |             300 |
| $15,000 +          |             120 |

Is there any way that we can infer the variance of the firm revenue distribution based on this information? Thank you very much!


1 Answer


You know it is log-normal and you know the mean, so you just need to find the parameter $\sigma$. Once you know it, you can calculate the variance; see the formula on the Wikipedia page: https://en.wikipedia.org/wiki/Log-normal_distribution.
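As a quick sketch of that last step: since the mean of a log-normal is $e^{\mu + \sigma^2/2}$, the Wikipedia variance formula can be written purely in terms of the known mean and $\sigma$. The numbers below are assumed values for illustration, not from the question.

```python
import math

def lognormal_variance(mean, sigma):
    # For a log-normal, mean = exp(mu + sigma^2 / 2), so
    # Var = (exp(sigma^2) - 1) * exp(2*mu + sigma^2) = (exp(sigma^2) - 1) * mean^2.
    return (math.exp(sigma**2) - 1) * mean**2

# Hypothetical mean and sigma, just to show the call:
print(lognormal_variance(mean=4000.0, sigma=0.8))
```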

The below method is a quick and easy way of estimating $\sigma$ using a spreadsheet.

A histogram is just an approximation to the probability density function. If you knew $\sigma$, you could reproduce the histogram, since each bar is given by the integral of the PDF over the width of that bar. This integral can be calculated using the cumulative distribution function (see the Wiki page again): essentially $\text{Bar}(a, b) = \mathrm{CDF}(b) - \mathrm{CDF}(a)$, scaled by the number of firms. Excel can compute $\mathrm{erf}(x)$. See https://support.office.com/en-us/article/ERF-function-C53C7E7B-5482-4B6C-883E-56DF3C9AF349.
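The same bar calculation can be sketched outside Excel; here `mu`, `sigma`, and the firm count passed in are assumed placeholder values, and the open lower edge at $0 is handled separately since $\mathrm{CDF}(0)=0$ while $\log 0$ is undefined.

```python
import math

def lognormal_cdf(x, mu, sigma):
    # Log-normal CDF written in terms of the error function
    # (the same formula Excel's ERF lets you evaluate).
    return 0.5 * (1 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2))))

def expected_bin_count(a, b, mu, sigma, n_firms):
    # Each histogram bar equals N * (CDF(b) - CDF(a)); CDF(0) is 0.
    lower = lognormal_cdf(a, mu, sigma) if a > 0 else 0.0
    return n_firms * (lognormal_cdf(b, mu, sigma) - lower)

# Hypothetical parameters, just to show the call:
print(expected_bin_count(1000, 2000, mu=8.0, sigma=0.8, n_firms=7180))
```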

[Edit: I had forgotten that the mean of a log-normal depends on $\sigma$. That is OK; you just need to set the parameter $\mu = \log(\text{ObservedMean}) - \frac{\sigma^2}{2}$.]

A way of estimating $\sigma$ is to do a sort of regression fit. First take a guess for $\sigma$ and calculate the histogram it implies, as above. Now take the difference between each calculated bar and the actual bar, square it, and add these squared differences up over all the bars.

This sum is a measure of the error between your estimated histogram and the actual one. You want to minimise it by varying $\sigma$, which you can do with either Excel's Goal Seek or Excel's Solver.
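The whole fit can also be sketched in a few lines of Python using the question's bin counts. The observed mean here is a hypothetical stand-in for (total revenue) / (number of firms), which the question says is known, and a crude grid search plays the role of Excel's Solver.

```python
import math

# Observed histogram from the question; None marks the open-ended top bin.
bins = [(0, 1000, 480), (1000, 2000, 2000), (2000, 4000, 1600),
        (4000, 6000, 1200), (6000, 8000, 800), (8000, 10000, 680),
        (10000, 15000, 300), (15000, None, 120)]
n_firms = sum(count for _, _, count in bins)
observed_mean = 4000.0  # hypothetical; use total revenue / number of firms

def cdf(x, mu, sigma):
    # Log-normal CDF via the error function.
    return 0.5 * (1 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2))))

def sse(sigma):
    # Pin mu to the observed mean, then sum squared bar errors.
    mu = math.log(observed_mean) - sigma**2 / 2
    err = 0.0
    for a, b, count in bins:
        lo = cdf(a, mu, sigma) if a > 0 else 0.0
        hi = cdf(b, mu, sigma) if b is not None else 1.0
        err += (n_firms * (hi - lo) - count) ** 2
    return err

# Crude grid search over sigma, standing in for Excel's Solver.
best_sigma = min((s / 1000 for s in range(1, 3000)), key=sse)
variance = (math.exp(best_sigma**2) - 1) * observed_mean**2
print(best_sigma, variance)
```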

You will now have a value of $\sigma$ for which the two histograms should look more or less the same. If they don't, then either your mean is wrong or the distribution is not log-normal.