average proportions over time

Let us say I have a number of events per week over some time period. During each week a proportion of these events is important (important_events_per_week/number_of_events_per_week). The number_of_events_per_week also varies over time. How does one calculate the average number of important events per week over time, please? I would think it is too naive to simply calculate the proportion per week and then take the arithmetic mean, so I thought I would ask proper mathematicians. Should one use a weighted average to account for the varying number of events? Thanks!

There are 2 best solutions below

1
On BEST ANSWER

If you want the average proportion of important events among all events over some period of multiple weeks, one simple method is to pool the counts. First add up the total number of events during that period, which you can do because you have the number of events in each week. Then add up the total number of important events during that period, which you can do for the same reason. Finally, divide the total number of important events by the total number of events.
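The pooling described above can be sketched in R. This is only an illustration: the vectors `total` and `important` hold made-up weekly counts, not data from the question. The sketch also shows that the pooled proportion equals a weighted mean of the weekly proportions in which each week's weight is its number of events, which is one way to answer the question about weighting.

```r
# Hypothetical weekly counts (illustrative only, not from the question)
total     <- c(10, 40, 5, 25)   # number of events in each week
important <- c(2, 4, 3, 5)      # important events in each week

# Pooled proportion: total important events divided by total events
pooled <- sum(important) / sum(total)

# The same value, written as a weighted mean of the weekly proportions,
# with each week's number of events as its weight
weekly   <- important / total
weighted <- weighted.mean(weekly, w = total)

# Unweighted mean of the weekly proportions, for comparison
naive <- mean(weekly)

c(pooled = pooled, weighted = weighted, naive = naive)
```

With these made-up counts the pooled proportion is $14/80 = 0.175$, while the unweighted mean of the weekly proportions is $0.275$, because the week with only five events counts as much as the week with forty.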

This is how many such averages are computed in real life. For example, to find the average speed for a trip of $20$ miles, you don't take the speed on each mile and do some kind of fancy weighted average of those $20$ speeds. You just take the total distance (which is $20$ miles) and divide by the total trip time.

0
On

Maybe the number $X$ of events per week has $X \sim \mathsf{Pois}(\lambda = 10),$ so that on average there are ten events a week. If $1/5$ of these events are "important," then the number $Y$ of important events per week has $Y \sim \mathsf{Pois}(\lambda_I = \lambda/5 = 2),$ so that there are two important events in a week on average.

Then $P(Y = k) = e^{-2}2^k/k!$, for $k = 0, 1, 2, \dots.$

In particular, $P(Y = 5) = 0.0361$ and $P(Y \le 3) = 0.8571$ (rounded to four places). These values can be found from the probability mass function (PMF) formula above or in R, as follows:

dpois(5, 2)
[1] 0.03608941
exp(-2)*2^5/factorial(5)
[1] 0.03608941

sum(dpois(0:3, 2))
[1] 0.8571235
ppois(3, 2)
[1] 0.8571235

In R, the Poisson PMF is denoted dpois and the Poisson CDF is denoted ppois, each with appropriate parameters.

If you don't know the average number of important events per week, you might look at numbers of important events over the last 52 weeks.

y
 [1] 2 3 3 2 3 2 0 2 0 4
[11] 0 1 2 1 3 1 2 3 2 1
[21] 2 2 3 1 4 0 1 4 2 3
[31] 3 0 2 1 1 1 1 3 1 1
[41] 1 3 2 1 3 1 0 2 2 1
[51] 2 2

summary(y)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   1.000   2.000   1.788   3.000   4.000 

So a reasonable estimate of $\lambda_I$ is $\hat \lambda_I = 1.788.$

Here is one way to find a 95% confidence interval (CI) $(1.46, 2.19)$ for $\lambda_I,$ based on the total number of important events in a year:

sum(y)
[1] 93
CI.52 = 93 + 2 + qnorm(c(.025,.975))*sqrt(93+1)
CI.52/52
[1] 1.461489 2.192357

That's $(93+2 \pm 1.96\sqrt{93+1})/52.$
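As a cross-check, here is a minimal sketch that swaps in a different technique: the exact Poisson interval from `poisson.test` in base R's stats package, instead of the normal approximation used above. The variable names `total_events`, `weeks`, `fit`, and `exact_ci` are my own. The count $93$ and the $52$ weeks come from the data above.

```r
# Total important events observed over 52 weeks (from sum(y) above)
total_events <- 93
weeks <- 52

# Exact Poisson interval for the weekly rate; T is the observation time in weeks
fit <- poisson.test(total_events, T = weeks, conf.level = 0.95)

# Extract the two interval endpoints, already expressed per week
exact_ci <- as.numeric(fit$conf.int)
exact_ci
```

The exact interval should come out close to the approximate one, since a total of $93$ events is large enough for the normal approximation to work reasonably well.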

Note: My 'data' above were sampled from $\mathsf{Pois}(2)$ as shown below. In a real application you wouldn't know the exact true value of $\lambda_I.$

set.seed(315)
y = rpois(52, 2)