Combining confidence intervals for sums of generic random variables

So my fiancee is a civil servant and asked me for help with the following problem.

She has been given a collection of upper and lower bounds on expenditure for a collection of projects like:

Project A: 2 million to 70 million

Project B: 55 million to 60 million

and so on.

She wants to provide a suitable interval for probable total expenditure.

Now the obvious technique is to just add up the upper bounds and add up the lower bounds, but a lot of the ranges are huge and there are over 300 projects, so this gives something like:

65 million to 700 million

This is obviously rubbish in the sense that both of the extremes are incredibly unlikely, and the upper bound differs from the lower by a factor of 10. Her colleagues have variously suggested narrowing the bounds in cases where the extremes are less likely (which strikes me as very bad for birthday-paradox-type reasons), and doing something my fiancee remembers as being called 'Optimisation Bias', which Google has never heard of.

After some thought, it struck me that if you model these as Gaussians, you can translate pretty easily from confidence interval to standard deviation and back again after some playing, that is:

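If $r_i := \max(X_i) - \min(X_i)$ is the width of the range given for project $i$,

and we pretend $r_i = Z_i \sigma_i$ (so $Z_i$ is the width of the range measured in standard deviations), then, assuming the $X_i$ are independent so that $\sigma^2 = \sum_i \sigma_i^2$,

$R := \sqrt{\sum_i r_i^2} = \sqrt{\sum_i Z_i^2 \sigma_i^2} \geq \left(\min_i Z_i\right) \sigma$ (where $\sigma$ is the s.d. of the sum)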

So you get an interval of width $R$ whose confidence level is at least that of the 'least confident variable', which you can centre on your expected value to get a much reduced interval.
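To make the comparison concrete, here is a minimal sketch of both intervals in Python. It is only an illustration under assumptions that are mine rather than part of the problem as given: the projects are independent, the midpoint of each range stands in for its expected value, and $R$ is treated as a full width centred on the sum of the midpoints. The function names are made up, and the figures are just Projects A and B from above, in millions.

```python
import math

def naive_interval(bounds):
    """Add up all lower bounds and all upper bounds."""
    lower = sum(lo for lo, _ in bounds)
    upper = sum(hi for _, hi in bounds)
    return lower, upper

def rss_interval(bounds):
    """Root-sum-square interval.

    Assumption: each midpoint is used as that project's expected value,
    and the combined width R = sqrt(sum of squared range widths) is
    centred on the sum of the midpoints.
    """
    centre = sum((lo + hi) / 2 for lo, hi in bounds)
    width = math.sqrt(sum((hi - lo) ** 2 for lo, hi in bounds))
    return centre - width / 2, centre + width / 2

# Projects A and B from the question, in millions.
projects = [(2.0, 70.0), (55.0, 60.0)]
print("naive:", naive_interval(projects))
print("root-sum-square:", rss_interval(projects))
```

With only two projects the two intervals are nearly the same width, because Project A dominates. The narrowing only becomes substantial when many ranges of comparable size are combined, as with 300 projects.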

But when you don't assume normality, you need to start playing with Chebyshev-type inequalities to translate between confidence and s.d., and the sheer variety of these makes my head hurt. After some mucking around with the tamer ones, trying to get a theoretical bound on the confidence level of the resulting interval, I've decided it's time to ask the internet.
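For a sense of how much the translation between confidence and s.d. changes once normality is dropped, here is a small sketch comparing the two-sided multiplier $k$ (the half-width of the interval in standard deviations) under a Gaussian assumption with the one from the plain Chebyshev inequality $P(|X-\mu| \geq k\sigma) \leq 1/k^2$. The function names are mine, and this covers only the basic Chebyshev bound, not the sharper variants.

```python
import math
from statistics import NormalDist

def gaussian_k(confidence):
    """Half-width in s.d. of a central interval, assuming normality."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def chebyshev_k(confidence):
    """Half-width in s.d. guaranteed by Chebyshev for any distribution
    with finite variance: 1 - 1/k^2 >= confidence."""
    return 1 / math.sqrt(1 - confidence)

for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%}: Gaussian k = {gaussian_k(c):.2f}, "
          f"Chebyshev k = {chebyshev_k(c):.2f}")
```

At 95% the Gaussian multiplier is about 1.96 while the Chebyshev one is about 4.47, so the same stated range corresponds to a much smaller s.d. if only Chebyshev is assumed.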

So my questions:

First, this seems like a common problem. Is there a current consensus on best practice here, possibly using a totally different approach? ('Optimisation Bias' perhaps?!)

Second, and this is just out of interest for me really, if one prats about with inequalities for long enough, does a sensible bound for confidence emerge for the 'square-sum-square-root interval' I've concocted if the distribution is generic? What about if we say bounded-positive? Unimodal?