Probability of sampling a cluster of a certain size in a log normal distribution

I have a set of clusters of 'objects' whose sizes are log-normally distributed. If the total population size is known, along with the mean and standard deviation of the log of the cluster sizes, is it possible to estimate the minimum cluster size in the population that would be represented by a cluster of a certain size in a sample? The clusters are collections of objects that share a certain degree of similarity (not sure that is important).

For example, what would be the probability of a cluster of size 1000 being represented by at least 100 objects in a sample of a given size?

Thanks for any help. My knowledge of probability is not good enough for this, but I am willing to learn with hints.

Thanks,

S.

OK, I worked it out, so I'm posting this for anyone else who might need it. The distribution of the cluster sizes is actually irrelevant here: it's all just binomial sampling probability. Excel even has a function for it, BINOM.DIST.RANGE. I won't repeat the formulas here; it's better to look up that function.
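For reference, the quantity that function computes here is the upper tail of a binomial distribution. Writing $n$ for the sample size, $P$ for the cluster size divided by the population size, and $S$ for the minimum number of cluster members required in the sample, the count $X$ of cluster members in the sample satisfies:

$$
\Pr(X \ge S) \;=\; \sum_{k=S}^{n} \binom{n}{k} P^{k} (1-P)^{\,n-k}, \qquad P = \frac{\text{cluster size}}{\text{population size}}
$$

This treats each sampled object as an independent draw (sampling with replacement). Sampling without replacement is strictly hypergeometric, but the binomial is a close approximation when the sample is a small fraction of the population.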

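In the function's arguments, BINOM.DIST.RANGE(trials, probability_s, number_s, number_s2):

probability_s = P = cluster size / population size = 1000 / 5000000 (5 million as an example)

number_s = S = 100

trials = number_s2 = sample size (setting the upper limit equal to the sample size gives the probability of at least 100)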
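A minimal Python sketch of the same calculation, using only the standard library and assuming the binomial model described in the answer. The names `binom_tail` and `min_cluster_size` are hypothetical helpers for illustration, not part of any library; `binom_tail(n, P, S)` is intended to match `BINOM.DIST.RANGE(n, P, S, n)`. The second helper turns the question around: it binary-searches for the smallest population cluster size that reaches a target probability of showing up as at least `S` objects in the sample. This works because the tail probability only grows as the cluster gets bigger. The 500,000 sample size below is an assumed value for illustration.

```python
import math
from typing import Optional


def binom_tail(n: int, p: float, s: int) -> float:
    """P(X >= s) for X ~ Binomial(n, p): the chance that a sample of n objects
    contains at least s members of a cluster making up a fraction p of the
    population. Intended to match Excel's BINOM.DIST.RANGE(n, p, s, n)."""
    if s <= 0:
        return 1.0
    if s > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0

    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)

    def pmf(k: int) -> float:
        # Probability of exactly k cluster members, computed via log-gamma
        # so that large samples do not overflow the binomial coefficient.
        return math.exp(log_n_fact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                        + k * log_p + (n - k) * log_q)

    if s > n * p:
        # s is at or past the mode, so terms only shrink as k grows: sum the
        # upper tail directly and stop once further terms are negligible.
        total = 0.0
        for k in range(s, n + 1):
            term = pmf(k)
            total += term
            if term <= total * 1e-16:
                break
        return min(1.0, total)

    # Otherwise sum the complement P(X <= s - 1), which needs only s terms.
    return max(0.0, 1.0 - sum(pmf(k) for k in range(s)))


def min_cluster_size(population: int, n: int, s: int,
                     target: float) -> Optional[int]:
    """Smallest population cluster size c with P(at least s of its members in
    a sample of n) >= target, or None if even c = population falls short.
    Binary search is valid because the tail probability grows with c."""
    if binom_tail(n, 1.0, s) < target:
        return None
    lo, hi = 1, population
    while lo < hi:
        mid = (lo + hi) // 2
        if binom_tail(n, mid / population, s) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    N = 5_000_000  # population size from the answer's example
    n = 500_000    # assumed sample size, for illustration only
    print(binom_tail(n, 1000 / N, 100))
    print(min_cluster_size(N, n, 100, 0.95))
```

With these numbers the expected count is 500,000 × 1000 / 5,000,000 = 100, so the first printed probability should come out close to one half, as you would expect when the threshold equals the mean. The second line gives the cluster size needed for a 95% chance of seeing at least 100 of its members.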