Figuring out distribution from adding smaller distributions?

51 Views Asked by At

Suppose sam wants to know how long it usually takes him to get to work. He wants to know the 50th percentile, 90th percentile, and 99th percentile of how long, in minutes, it takes him to get to work.

Sam's route to work is split up into N segments. For each segment, the time it takes him to traverse that segment is drawn from some distribution over minutes.

Suppose I know the distribution of each segment. That is, for each segment, I know how long it takes to traverse that segment at the 50th percentile, the 90th percentile, the 99th percentile, etc.

How can I figure out the distribution of it takes Sam to get to work from knowing the distributions of the segments?

(Sorry if something doesn't make sense -- edits are welcome. For software engineers: I'm actually trying to figure out how to estimate the latency of a service call composed of several other service calls)

1

There are 1 best solutions below

0
On

Comment continued: Here are three specific examples to illustrate see that finding the exact quantiles of sums of distributions is not trivial.

Ten normal segments. Suppose each of ten independent segments is distributed $\mathsf{Norm}(\mu = 10, \sigma=1).$ Then quantiles .5, .9 and .99 for each segment are about 10, 11.3, and 12.3, respectively. (Computations in R.)

qnorm(c(.5,.9,.99), 10, 1)
10.00000 11.28155 12.32635

The sum of ten such segments is distributed $\mathsf{Norm}(\mu=100, \sigma=\sqrt{10}).$ The corresponding quantiles of this distribution are about 100, 104, and 107.$

qnorm(c(.5,.9,.99), 100, sqrt(10))
100.0000 104.0526 107.3566

Ten exponential segments. Suppose each of ten independent segments is distributed $\mathsf{Exp}(\text{rate} = .1).$ Then quantiles .5, .9 and .99 for each segment are about 6.9, 23, and 46, respectively. (The mean and standard deviation are both $10.)$

qexp(c(.5,.9,.99), .1)
 6.931472 23.025851 46.051702

The sum of ten such segments is distributed $\mathsf{Gamma}(\text{shape}=10, \text{rate}=.1.)$ (The mean is 100 and the variance is 1000.) The corresponding quantiles of the sum are about 97, 142, and 188.

qgamma(c(.5,.9,.99), 10, .1)
96.68715 142.05990 187.83117

Similar approximate results from a simulation:

set.seed(118)
x = replicate(10^6, sum(rexp(10,.1)))  # vector or a million sums of ten
mean(x); var(x); quantile(x, c(.5,.9,.99))
99.97961
1000.581
      50%       90%       99% 
 96.67126 142.06101 188.03925 

Sum of a dozen uniform segments. Suppose each of ten independent segments is distributed $\mathsf{Unif}(0,1).$ Then quantiles .5, .9 and .99 for each segment are .5, .9, and .99, respectively. (The mean is 1/2 and the variance is 1/12.)

qunif(c(.5,.9,.99))
0.50 0.90 0.99

According to the Central Limit Theorem, the sum of 12 such segments is distributed nearly as $\mathsf{Norm}(\mu=6, \sigma=1)$ From simulation, the approximate corresponding quantiles of the sum are about 6, 7.3, and 8.3.

set.seed(2019)
x = replicate(10^6, sum(runif(12)))
mean(x); var(x); quantile(x, c(.5,.9,.99))
6.001354
1.000741
     50%      90%      99% 
6.002158 7.289854 8.310085 

Note: In general, if the segments are independent and identically distribututed with known mean and variance, and there are enough of them that the Central Limit Theorem applies, then you might find the mean and variance of the nearly normal sum, and from them the desired quantiles.