I'm having some trouble understanding how the mean and variance are used with sampling rates and temporal rates (rates per unit of time).
Example
If I, e.g., wanted to evaluate car crashes per hour driven, I would like to know how to estimate the expected number of crashes per hour driven and how dispersed the data are. For simplicity, I'm assuming the events are independent.
Most observations will have 0 crashes, so it makes sense to work with the crash rate. For a population of drivers, the rate can be calculated as follows.
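$ X_i = \text{crashes of driver } i,\quad t_i = \text{hours driven by driver } i $
$ \lambda_{pop} = \frac{\sum_i X_i}{\sum_i t_i}, \qquad E(X) = \operatorname{Var}(X) = \lambda_{pop} \text{ for the crash count } X \text{ in one hour of driving} $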
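A minimal sketch of this estimate, assuming every driver shares one rate so that the total count $\sum X_i$ is Poisson with mean $\lambda_{pop} \sum t_i$. Under that assumption the standard error of the rate is $\sqrt{\sum X_i} / \sum t_i$. The function name `pooled_rate` and the five-driver data are hypothetical, chosen only for illustration:

```python
import math

def pooled_rate(crashes, hours):
    """Estimate lambda_pop = sum(X_i) / sum(t_i) and its Poisson standard error.

    Assumes (hypothetically) that all drivers share a single rate, so that
    sum(X_i) ~ Poisson(lambda_pop * sum(t_i)).
    """
    total_crashes = sum(crashes)
    total_hours = sum(hours)
    rate = total_crashes / total_hours
    # Var(sum X_i) = lambda_pop * T under the Poisson model, so
    # SE(rate) = sqrt(lambda_pop * T) / T, estimated by plugging in sum(X_i).
    se = math.sqrt(total_crashes) / total_hours
    return rate, se

# Hypothetical toy data: crashes X_i and hours driven t_i for five drivers.
crashes = [0, 0, 1, 0, 2]
hours = [120.0, 300.0, 250.0, 80.0, 250.0]
rate, se = pooled_rate(crashes, hours)
print(f"rate = {rate:.4f} crashes/hour, SE = {se:.4f}")
```

With 3 crashes over 1000 hours this gives a rate of 0.003 crashes per hour.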
The number of crashes in a given amount of driving time would then reasonably follow a Poisson process. However, a single Poisson rate gives no insight into how crash rates are distributed across the population of drivers.
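One standard way to check whether a single rate is enough is the Pearson dispersion statistic. Under a common-rate Poisson model, $E(X_i) = \operatorname{Var}(X_i) = \lambda t_i$, so $\sum_i (X_i - \hat\lambda t_i)^2 / (\hat\lambda t_i)$ divided by $n - 1$ should be close to 1; values well above 1 (overdispersion) suggest that crash rates differ between drivers. This is a sketch with hypothetical counts, and with mostly-zero counts the chi-square reference distribution is only approximate:

```python
def pearson_dispersion(crashes, hours):
    """Pearson chi-square statistic divided by its degrees of freedom.

    Near 1: consistent with one shared Poisson rate.
    Well above 1: overdispersion, i.e. rates likely vary between drivers.
    """
    rate = sum(crashes) / sum(hours)
    chi2 = 0.0
    for x, t in zip(crashes, hours):
        expected = rate * t  # E(X_i) = Var(X_i) = lambda * t_i under the model
        chi2 += (x - expected) ** 2 / expected
    # One parameter (the rate) was estimated from the data, so df = n - 1.
    return chi2 / (len(crashes) - 1)

# Hypothetical: four drivers with equal hours; all crashes come from one driver.
print(pearson_dispersion([0, 0, 0, 4], [100.0] * 4))
# Hypothetical: the same total crashes spread evenly across drivers.
print(pearson_dispersion([1, 1, 1, 1], [100.0] * 4))
```

The first case gives a statistic of about 4 (strong overdispersion) and the second about 0, even though both have the same pooled rate.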
Sampling
Therefore, my idea is to randomly split the population into groups, compute the crash rate of each group, and use the central limit theorem to approximate the distribution of these group rates with a normal distribution.
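A minimal simulation of the grouping idea, assuming a hypothetical population in which every driver has the same true rate (0.002 crashes per hour) and crashes arrive as a Poisson process, generated here from exponential waiting times. The helper names `simulate_driver` and `group_rates` are hypothetical. Because the simulated drivers share one rate, the spread of the group rates here reflects Poisson noise only; in real data it would also include between-driver variation. For the normal approximation, each group presumably needs enough expected crashes, not just enough drivers:

```python
import random
import statistics

def simulate_driver(rng, rate, hours):
    """Count events of a Poisson process with `rate` events/hour over `hours` hours."""
    count = 0
    elapsed = rng.expovariate(rate)  # exponential waiting time to the first event
    while elapsed < hours:
        count += 1
        elapsed += rng.expovariate(rate)
    return count

def group_rates(crashes, hours, n_groups, seed=0):
    """Randomly split drivers into n_groups and return each group's pooled rate."""
    idx = list(range(len(crashes)))
    random.Random(seed).shuffle(idx)
    rates = []
    for g in range(n_groups):
        members = idx[g::n_groups]  # round-robin assignment of the shuffled drivers
        x = sum(crashes[i] for i in members)
        t = sum(hours[i] for i in members)
        rates.append(x / t)  # group rate = sum(X_i) / sum(t_i) within the group
    return rates

# Hypothetical population: 2000 drivers, 50-500 hours each, true rate 0.002/hour.
rng = random.Random(42)
hours = [rng.uniform(50, 500) for _ in range(2000)]
crashes = [simulate_driver(rng, 0.002, h) for h in hours]

rates = group_rates(crashes, hours, n_groups=20)
print(f"mean of group rates: {statistics.mean(rates):.5f}")
print(f"sd of group rates:   {statistics.stdev(rates):.5f}")
```

With these settings a driver averages roughly 0.55 crashes, so most drivers have 0 crashes, while each group of 100 drivers still accumulates dozens of crashes.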
I think I understand sampling and ratios separately, but I'm struggling with the details of how (or whether) I should use them together.
General Questions
Since I'm evaluating a ratio, should I use the harmonic mean to get a more representative average across the population/samples?
Should I use total hours driven or sample size as sampling weights?
Does the central limit theorem hold under my assumptions, or does it fail?
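A small numerical check of how the candidate averages relate. The pooled rate is exactly the hours-weighted arithmetic mean of the per-driver rates, since $\sum_i \frac{t_i}{T} \cdot \frac{X_i}{t_i} = \frac{\sum_i X_i}{T}$ with $T = \sum_i t_i$. An unweighted harmonic mean of per-driver rates is undefined as soon as one driver has 0 crashes, which is the common case here. The function names and the five-driver data are hypothetical:

```python
def unweighted_mean_rate(crashes, hours):
    """Plain average of per-driver rates X_i / t_i (each driver counts equally)."""
    return sum(x / t for x, t in zip(crashes, hours)) / len(crashes)

def hours_weighted_mean_rate(crashes, hours):
    """Average of per-driver rates X_i / t_i, weighted by hours driven t_i."""
    total = sum(hours)
    return sum((t / total) * (x / t) for x, t in zip(crashes, hours))

def harmonic_mean_rate(crashes, hours):
    """Unweighted harmonic mean of per-driver rates; None if any rate is 0."""
    rates = [x / t for x, t in zip(crashes, hours)]
    if any(r == 0 for r in rates):
        return None  # 1 / 0 is undefined, so the harmonic mean cannot be formed
    return len(rates) / sum(1 / r for r in rates)

# Hypothetical toy data: crashes X_i and hours driven t_i for five drivers.
crashes = [0, 0, 1, 0, 2]
hours = [120.0, 300.0, 250.0, 80.0, 250.0]

pooled = sum(crashes) / sum(hours)
print("pooled:        ", pooled)
print("hours-weighted:", hours_weighted_mean_rate(crashes, hours))
print("unweighted:    ", unweighted_mean_rate(crashes, hours))
print("harmonic:      ", harmonic_mean_rate(crashes, hours))
```

Here the pooled and hours-weighted values agree (0.003), the unweighted mean of per-driver rates differs (0.0024), and the harmonic mean cannot be computed.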
In conclusion
I'm looking for similar problems for inspiration, and for a sanity check of my assumptions.