Interpreting the difference between two log-normal distributions

I have read a couple of posts and did not see this exact interpretation. I apologize in advance if this is not the right location.

Purpose: I am preparing a paper on the distribution of litter densities along the shoreline in freshwater environments. The data is collected by hand, and the individual pieces are classified and counted by volunteers. These programs exist in many countries, and there are fairly large data sets.

The units are expressed as 'pieces of trash per meter (or foot) of shoreline'.

Assumptions:

  • The data is collected in the same manner
  • The volunteers have the same motivations
  • There is no undercounting or overcounting
  • Accuracy is basically the same across the spectrum
  • The math is correct

The graph below shows two sets of data:

  • MCBP: regional results for Lake Geneva (Switzerland), n=100 samples
  • SLR: results from 'The Swiss Litter Report', n=365 samples

The following code was used to calculate the distributions and present the graphs from a DataFrame in pandas/Python 3.6:

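    import numpy as np
    from scipy import stats

    # df is a pandas DataFrame with 'Total' (pieces counted) and 'Length' (shoreline surveyed)
    df['Density'] = df['Total'] / df['Length']
    df['Logs'] = np.log(df['Density'])  # log-transform the skewed data to get it close to normal
    mu, sigma = stats.norm.fit(df['Logs'])  # mean and standard deviation of the log densities
    # repeat for df2 to get the second curve
    # build histograms for the two data sets
    # plot the two distributions where x = df['Logs']
    # and y = stats.norm.pdf(x, loc=mu, scale=sigma)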
    

The resulting two distributions

mu for the SLR distribution is 0.1564617, which is equal to the 5th percentile of the MCBP distribution.

I am interpreting this as meaning:

  • There is a 5% probability that a sample from MCBP will be less than the average from SLR.
  • There is a 95% probability that a sample taken from the MCBP region will be greater than the national average.
    • In general, I can expect litter densities to be greater in the MCBP region than in the SLR region.
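
The two probabilities above can be checked directly from the fitted parameters. The following is a minimal sketch, not the code used for the paper: `mu_slr` is the value quoted in the question, while `sigma_mcbp` and `mu_mcbp` are hypothetical values chosen only so that `mu_slr` falls at the 5th percentile of the MCBP fit.

```python
from scipy import stats

mu_slr = 0.1564617   # mean of the SLR log densities, as quoted in the question
sigma_mcbp = 1.0     # ASSUMED standard deviation of the MCBP log densities
# ASSUMED MCBP mean, chosen so that mu_slr sits at the 5th percentile of the MCBP fit
mu_mcbp = mu_slr - stats.norm.ppf(0.05) * sigma_mcbp

# Probability that the log density of an MCBP sample is below mu_slr
p_below = stats.norm.cdf(mu_slr, loc=mu_mcbp, scale=sigma_mcbp)
p_above = 1.0 - p_below
print(f"P(below) = {p_below:.3f}, P(above) = {p_above:.3f}")
```

Because the fit is done on the log scale, `mu_slr` corresponds to the median of the fitted log-normal on the original scale (`exp(mu_slr)`), not to the arithmetic mean of the densities.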

Is this interpretation correct? (It does agree with observations.)

Thanks for your help

Best answer:

Your interpretation is correct as long as your assumption that the data is log-normally distributed holds and your estimated parameters for those distributions are accurate.
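
One way to probe the log-normal assumption is to test the log densities for normality. This is a sketch under stated assumptions: the `density` array is synthetic stand-in data, and the Shapiro-Wilk test is just one possible choice of normality test, not something prescribed here.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the litter densities (hypothetical, not the survey data)
rng = np.random.default_rng(0)
density = rng.lognormal(mean=1.8, sigma=1.0, size=100)

logs = np.log(density)
# Shapiro-Wilk tests the null hypothesis that the log densities are normally distributed
w_stat, p_value = stats.shapiro(logs)
print(f"W = {w_stat:.3f}, p = {p_value:.3f}")
# A small p-value (for example below 0.05) would be evidence against log-normality
```

With real data, a histogram or Q-Q plot of `logs` is a useful visual complement to the test.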

Now, the accuracy of your estimated parameters depends on the number of data points you have. You can do a t-test to answer your third point with a given level of confidence.
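
As a rough illustration of both points, the sketch below computes the standard error of each estimated mean and runs a two-sample t-test on the log densities. The arrays are synthetic (the means, spreads, and seed are assumptions, only the sample sizes match the question), and Welch's unequal-variance variant is an assumed choice rather than the specific test meant above.

```python
import numpy as np
from scipy import stats

# Synthetic log densities standing in for the two surveys (hypothetical values)
rng = np.random.default_rng(1)
logs_mcbp = rng.normal(loc=1.8, scale=1.0, size=100)
logs_slr = rng.normal(loc=0.16, scale=1.2, size=365)

# The standard error of each estimated mean shrinks as 1/sqrt(n)
se_mcbp = logs_mcbp.std(ddof=1) / np.sqrt(logs_mcbp.size)
se_slr = logs_slr.std(ddof=1) / np.sqrt(logs_slr.size)

# Welch's t-test compares the two means without assuming equal variances
t_stat, p_value = stats.ttest_ind(logs_mcbp, logs_slr, equal_var=False)
print(f"SE(MCBP) = {se_mcbp:.3f}, SE(SLR) = {se_slr:.3f}")
print(f"t = {t_stat:.2f}, p = {p_value:.2e}")
```

A small p-value here would support the claim that the mean log density is higher in the MCBP region than in SLR, at the chosen confidence level.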