What are the mean and stdev for a dataset that is approx. lognormally or Weibull-distributed?


Given are particle sizes and corresponding percentages of the total volume. Data:

    size    vol%
    0.594   0.03
    0.675   0.11
    0.872   0.25
    0.991   0.34
    1.28    0.55
    1.45    0.65
    1.88    0.89
    2.13    1.04
    2.75    1.44
    3.12    1.66
    4.03    2.16
    4.58    2.43
    5.92    3.03
    6.72    3.35
    8.68    4.05
    9.86    4.41
    12.7    5.09
    14.5    5.38
    18.7    5.7
    21.2    5.67
    27.4    5.15
    31.1    4.66
    40.1    3.38
    45.6    2.68
    58.9    1.42
    66.9    0.92
    86.4    0.18
    98.1    0.03

Plot using Excel:

[image: Excel plot of the data above]

Reading up on particle size distributions on Wikipedia, I found that this kind of data usually follows a lognormal or Weibull distribution. So I followed some YouTube tutorials on checking whether that is the case and arrived at this:

[two images: regression plots for the two candidate distributions]

So it's not perfect (the 2nd plot is the Weibull one), but I want to follow through with it if possible. However, when I extract $\mu$ and $\sigma$ from the equation of the regression line, I don't get the original distribution back (plots made using WA):

[two images: WA plots of the fitted distributions]

  1. Is my approach/are $\mu$ and $\sigma$ correct?

  2. Could I just take $\frac{1}{n}\sum_1^n x\cdot f(x)$ for $\mu$, and from that then calculate $\sigma$ as I would for a discrete distribution using $\sqrt{(\frac{1}{n}\sum_1^n (\mu-x)^2)}$?


There is 1 best solution below


It's difficult to say from what you have provided whether you are doing the estimation correctly. Lognormal model fitting is very sensitive to outliers. The usual approach is to transform the data to the log scale and fit a normal distribution to the transformed data. The resulting values of $\mu$ and $\sigma$ are then used to calculate the lognormal mean and variance. If $Y = e^X$ is lognormal, where $X$ is normal with mean $\mu$ and variance $\sigma^2$, then

$$\operatorname{E}[Y] = e^{\mu + \sigma^2/2}, \quad \operatorname{Var}[Y] = (e^{\sigma^2} - 1)e^{2 \mu + \sigma^2}.$$
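As a rough illustration of that procedure on the data in the question, here is a minimal Python sketch. It rests on assumptions that are not in the original post: each listed size is treated as a representative value for its bin, the vol% column is normalized and used as a weight, and $\mu$ and $\sigma$ are taken as the weighted mean and standard deviation of $\ln(\text{size})$. The function name `lognormal_moments` is just an illustrative helper.

```python
import numpy as np

# Data from the question: particle size and percentage of total volume.
size = np.array([0.594, 0.675, 0.872, 0.991, 1.28, 1.45, 1.88, 2.13,
                 2.75, 3.12, 4.03, 4.58, 5.92, 6.72, 8.68, 9.86,
                 12.7, 14.5, 18.7, 21.2, 27.4, 31.1, 40.1, 45.6,
                 58.9, 66.9, 86.4, 98.1])
vol = np.array([0.03, 0.11, 0.25, 0.34, 0.55, 0.65, 0.89, 1.04,
                1.44, 1.66, 2.16, 2.43, 3.03, 3.35, 4.05, 4.41,
                5.09, 5.38, 5.7, 5.67, 5.15, 4.66, 3.38, 2.68,
                1.42, 0.92, 0.18, 0.03])

# Assumption: use the volume percentages as weights, normalized to sum to 1.
w = vol / vol.sum()

# Fit a normal distribution on the log scale via weighted moments.
x = np.log(size)
mu = np.sum(w * x)
sigma = np.sqrt(np.sum(w * (x - mu) ** 2))


def lognormal_moments(mu, sigma):
    """Mean and variance of Y = exp(X) for X ~ Normal(mu, sigma^2)."""
    mean = np.exp(mu + sigma**2 / 2)
    var = (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)
    return mean, var


mean_y, var_y = lognormal_moments(mu, sigma)
print(f"mu = {mu:.4f}, sigma = {sigma:.4f}")
print(f"lognormal mean = {mean_y:.4f}, stdev = {np.sqrt(var_y):.4f}")
```

Note that $\mu$ and $\sigma$ here are parameters of the log-scale normal, not the mean and standard deviation of the sizes themselves. That distinction may be one reason a plot built directly from them does not reproduce the original curve.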

A fit based on a small amount of data is likely to be very poor. Based on the regression plot you have included, I would say that the data are not lognormally distributed. If you include a Weibull plot, I would be able to see whether the Weibull fit is actually better.
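One way to compare the two candidates numerically is to linearize each cumulative distribution and look at how well a straight line fits. The sketch below is only an illustration under assumptions of my own: the cumulative volume fraction at each listed size is used as the empirical CDF, the last point (where the cumulative fraction reaches 1) is dropped because both transforms diverge there, and an ordinary least-squares line is fitted with `np.polyfit`. For the lognormal, $\Phi^{-1}(F)$ is regressed on $\ln x$ (slope $1/\sigma$, intercept $-\mu/\sigma$). For the Weibull, $\ln(-\ln(1-F))$ is regressed on $\ln x$ (slope $k$, intercept $-k\ln\lambda$).

```python
import numpy as np
from statistics import NormalDist

# Data from the question: particle size and percentage of total volume.
size = np.array([0.594, 0.675, 0.872, 0.991, 1.28, 1.45, 1.88, 2.13,
                 2.75, 3.12, 4.03, 4.58, 5.92, 6.72, 8.68, 9.86,
                 12.7, 14.5, 18.7, 21.2, 27.4, 31.1, 40.1, 45.6,
                 58.9, 66.9, 86.4, 98.1])
vol = np.array([0.03, 0.11, 0.25, 0.34, 0.55, 0.65, 0.89, 1.04,
                1.44, 1.66, 2.16, 2.43, 3.03, 3.35, 4.05, 4.41,
                5.09, 5.38, 5.7, 5.67, 5.15, 4.66, 3.38, 2.68,
                1.42, 0.92, 0.18, 0.03])

# Assumption: cumulative volume fraction serves as the empirical CDF.
cum = np.cumsum(vol)
F = (cum / cum[-1])[:-1]      # drop the last point, where F == 1
ln_x = np.log(size[:-1])


def fit_line(x, y):
    """Least-squares line y = slope * x + intercept, plus R^2."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    r2 = 1 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2)
    return slope, intercept, r2


# Lognormal probability plot: z = (ln x - mu) / sigma.
z = np.array([NormalDist().inv_cdf(p) for p in F])
slope_ln, icpt_ln, r2_lognormal = fit_line(ln_x, z)
sigma_fit = 1 / slope_ln
mu_fit = -icpt_ln / slope_ln

# Weibull probability plot: ln(-ln(1 - F)) = k ln x - k ln(lambda).
y_w = np.log(-np.log(1 - F))
k_fit, icpt_w, r2_weibull = fit_line(ln_x, y_w)
lambda_fit = np.exp(-icpt_w / k_fit)

print(f"lognormal: mu = {mu_fit:.4f}, sigma = {sigma_fit:.4f}, R^2 = {r2_lognormal:.4f}")
print(f"Weibull:   k = {k_fit:.4f}, lambda = {lambda_fit:.4f}, R^2 = {r2_weibull:.4f}")
```

A higher $R^2$ on one of these plots only says that the linearized points are closer to a line. The points in the tails carry very little volume but the same weight in the regression, so the result should be read with caution.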