What single estimator should be used for 5000 series of 20 elements each, drawn from a non-normal distribution?


The data are one-minute traffic counts, taken on 20 different days, for each of 5000 streets. The problem is that the mean of the 20x5000 observations does not explain much, since the data are not normally distributed. There is also high variability within each street: traffic jams, traffic lights, accidents, and so on cause big variations, including a lot of zeros. Perhaps a smooth maximum would work? I lack deeper knowledge here, but I am sure there is a mathematical approach that provides a single metric that better represents the “intensity” or “flow” of the city. Thank you!


Best answer

Just because the data are non-normal doesn't mean that the mean isn't a potentially useful value. If your data are a simple random sample of the full data (or of the underlying model), then the sample mean is still an unbiased estimator of the population (or model) mean.
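To make the unbiasedness point concrete, here is a minimal simulation sketch. The synthetic population (30% zeros, otherwise a rounded-down exponential draw) is purely an assumption chosen to look skewed and zero-heavy; it is not fitted to any real traffic data, and the function name and parameters are hypothetical.

```python
import random
import statistics


def simulate_mean_bias(n_population=100_000, n_samples=5_000, sample_size=20, seed=0):
    """Compare a skewed population's mean with the average of many sample means."""
    rng = random.Random(seed)
    # Hypothetical skewed, zero-inflated "counts": 30% zeros, otherwise a
    # rounded-down exponential draw with mean about 10 (an assumption).
    population = [
        0 if rng.random() < 0.3 else int(rng.expovariate(1 / 10))
        for _ in range(n_population)
    ]
    population_mean = statistics.fmean(population)
    # Draw many simple random samples (with replacement) of 20 observations
    # each and record the mean of every sample.
    sample_means = [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]
    return population_mean, statistics.fmean(sample_means)


population_mean, average_sample_mean = simulate_mean_bias()
print(f"population mean:         {population_mean:.3f}")
print(f"average of sample means: {average_sample_mean:.3f}")
```

The two printed values should land close together even though the population is far from normal. Any single sample mean of 20 observations can still be noisy; unbiasedness only says the estimator is right on average.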

However, it sounds like your problem is more about working out which property you even want to estimate, which is a conceptual question. Traffic is a flow, so static counts might not even be the best thing to measure - this comes up in real-life statistics all the time, where you have to determine whether the data you have actually provide any information about the thing you're looking to describe. Sometimes the answer is no, and you can only answer another, related question.

My first suggestion is to get in contact with someone whose expertise is related to measuring traffic - e.g. local road authorities - and see if they have any information on what they tend to measure, or on what they base the design of roads and signals. From the looks of it, traffic flow is a pretty dense theoretical field and there are a lot of factors other than just "number of cars at an intersection", so you may have to heavily cut down your expectations.

My second suggestion is to consider something relating to how long traffic stays at its peak capacity, or how long it takes to recover from being at peak. There might even be a nice maximum-likelihood estimator somewhat in the vein of the German tank problem (although you can't apply that one directly since you don't have unique sequential identifiers for the vehicles).
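For reference, this is roughly what the classic German tank estimators look like. It is a sketch of the textbook setting only: it assumes serial numbers 1..N sampled without replacement, which, as noted above, does not hold for traffic counts. The function name is hypothetical.

```python
def german_tank_estimates(serials):
    """Estimate the population size N from observed serial numbers.

    Assumes the serials are drawn without replacement from 1..N.
    Returns (mle, mvue): the maximum-likelihood estimate is the sample
    maximum m, and the minimum-variance unbiased estimate is m + m/k - 1,
    where k is the number of observations.
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed serial number")
    m = max(serials)
    mle = m
    mvue = m + m / k - 1
    return mle, mvue


mle, mvue = german_tank_estimates([19, 40, 42, 60])
print(mle, mvue)  # 60 74.0
```

The maximum-likelihood estimate is just the largest value seen, which always under- or exactly-estimates N; the unbiased version corrects for that by adding the average gap between observations. An analogous estimator for peak traffic capacity would need its own model of how counts relate to capacity.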