Count distributions for non-exponential inter-arrivals


The Poisson process is nice and clean. The inter-arrival times are exponentially distributed with rate $\lambda$, and the number of events in an interval of length $t$ is Poisson with mean $\lambda t$. For a Poisson process, the mean and variance of the count are equal, which is rather restrictive. In practice, we often encounter count distributions whose variance is higher or lower than their mean. To experiment with such distributions, I relaxed the requirement that inter-arrival times be exponential (i.e. have a constant hazard rate). Instead, I let the inter-arrival time follow a Weibull distribution, since it can model monotonically increasing and decreasing hazard rates (when its shape parameter $\kappa$ is $>1$ or $<1$, respectively).

Next, I used a simulation to count the number of events in a given interval of time when the inter-arrival times are Weibull. For the Poisson process, the average number of events is simply the interval length divided by the mean of the (exponential) inter-arrival distribution: $\frac{t}{\lambda^{-1}} = \lambda t$.
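
A minimal sketch of such a simulation in Python with NumPy follows; it is not the original poster's code. Assumptions: the Weibull scale is 1, the process starts with an uncounted event at time 0 (an ordinary renewal process), and `weibull_counts` is a hypothetical helper. The counting convention matters: also counting the event at time 0 would shift every count up by one and change how the mean compares with $t/\mu$, where $\mu$ is the mean inter-arrival time.

```python
import math

import numpy as np


def weibull_counts(kappa, t, scale=1.0, n_runs=20_000, seed=0):
    """Simulate N(t), the number of events in (0, t], for a renewal process
    whose inter-arrival times are Weibull(shape=kappa, scale=scale).

    Assumption: an uncounted event occurs at time 0 (ordinary renewal process).
    Returns the simulated counts and the naive estimate t / mu.
    """
    rng = np.random.default_rng(seed)
    mean_gap = scale * math.gamma(1.0 + 1.0 / kappa)  # mu, mean of the Weibull gap
    # Draw comfortably more gaps than needed so every run passes t.
    n_gaps = int(5 * t / mean_gap) + 50
    # Generator.weibull draws unit-scale Weibull variates with shape kappa.
    gaps = scale * rng.weibull(kappa, size=(n_runs, n_gaps))
    arrival_times = np.cumsum(gaps, axis=1)
    if not np.all(arrival_times[:, -1] > t):
        raise RuntimeError("not enough gaps drawn; increase n_gaps")
    counts = np.sum(arrival_times <= t, axis=1)  # events falling in (0, t]
    return counts, t / mean_gap


for kappa in (0.5, 1.0, 2.0):
    counts, naive_mean = weibull_counts(kappa, t=10.0)
    print(f"kappa={kappa}: t/mu={naive_mean:.3f}  "
          f"mean={counts.mean():.3f}  var={counts.var():.3f}")
```

Comparing `counts.mean()` and `counts.var()` with `naive_mean` for $\kappa$ below, at and above 1 gives the comparison discussed in this question; at $\kappa = 1$ the counts are Poisson, so all three numbers should be close to $t$.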

Increasing hazard rate: when I pick $\kappa>1$ (an increasing rather than constant hazard rate):

  • The variance is lower than the mean.
  • The average number of events in any interval of time is less than the interval length divided by the mean of the inter-arrival distribution.

Decreasing hazard rate: when I pick $\kappa<1$ (a decreasing rather than constant hazard rate):

  • The variance is higher than the mean.
  • The average number of events in any interval of time is greater than the interval length divided by the mean of the inter-arrival distribution.

The figure below plots, against the shape parameter $\kappa$, the expected number of events in a given time interval (obtained by dividing the interval length by the mean of the Weibull inter-arrival distribution) and the actual average number of events from the simulation.
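
For large $t$, standard renewal theory (for inter-arrival times with finite variance and a non-lattice distribution) gives approximations for the mean and variance of the count: $E[N(t)] \approx t/\mu + (c^2-1)/2$ and $\operatorname{Var}[N(t)] \approx c^2\, t/\mu$, where $\mu$ and $c$ are the mean and coefficient of variation of the inter-arrival time. For a Weibull, $c^2 = \Gamma(1+2/\kappa)/\Gamma(1+1/\kappa)^2 - 1$, which is below 1 exactly when $\kappa > 1$. The sketch below evaluates these asymptotics on a grid of $\kappa$; the helper names and the unit scale are assumptions, and the output is a large-$t$ approximation, not the exact count distribution.

```python
import math


def weibull_cv2(kappa):
    """Squared coefficient of variation of a Weibull(kappa) gap; the scale cancels."""
    g1 = math.gamma(1.0 + 1.0 / kappa)
    g2 = math.gamma(1.0 + 2.0 / kappa)
    return g2 / g1 ** 2 - 1.0


def renewal_asymptotics(kappa, t, scale=1.0):
    """Large-t approximations (t/mu, E[N(t)], Var[N(t)]) for an ordinary renewal process."""
    mu = scale * math.gamma(1.0 + 1.0 / kappa)
    c2 = weibull_cv2(kappa)
    naive = t / mu                          # interval length / mean gap
    mean_approx = naive + (c2 - 1.0) / 2.0  # second-order term of the renewal function
    var_approx = c2 * naive                 # Var N(t) ~ sigma^2 t / mu^3
    return naive, mean_approx, var_approx


for kappa in (0.5, 0.8, 1.0, 1.5, 2.0, 3.0):
    naive, m, v = renewal_asymptotics(kappa, t=100.0)
    print(f"kappa={kappa:3.1f}  t/mu={naive:7.2f}  E[N]~{m:7.2f}  Var[N]~{v:7.2f}")
```

The index of dispersion $\operatorname{Var}[N(t)]/E[N(t)]$ tends to $c^2$, so in this approximation the sign of $c^2 - 1$ decides both over- versus under-dispersion and whether the mean sits above or below $t/\mu$.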

Is there an intuitive explanation for why we observe this? As a bonus, is it possible to derive the CDF or PMF of this count distribution?


Answer

I want to address the observation that an increasing hazard rate corresponds to an under-dispersed point process (variance less than mean), while a decreasing hazard rate corresponds to an over-dispersed point process (variance greater than mean). First, consider the constant hazard rate, i.e. the Poisson point process. Take any interval of time and divide it into $n$ sub-intervals, each of length $\delta t$. As $\delta t \to 0$, $n \to \infty$. For such small sub-intervals, it is highly unlikely that there will be two events, so each sub-interval contains either one event or none, making the presence or absence of an event a Bernoulli random variable. Let the probability of an event in any sub-interval be $p$ (for the Poisson process, $p \approx \lambda\,\delta t$). Because the Poisson process has independent increments, these Bernoulli variables are independent, so the average number of events in the entire interval is $E(X) = np$ and the variance is $V(X) = np - np^2 = np(1-p)$. Since $p \to 0$, $p^2$ is negligible compared to $p$, so in the limit both the variance and the mean become $np$.
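
The limit can be checked with a few lines of arithmetic. This sketch assumes a rate $\lambda$ (`rate`), an interval of length `t`, and the first-order slot probability $p = \lambda t / n$, which requires $n \ge \lambda t$ so that $p \le 1$; `bernoulli_mean_var` is a hypothetical name.

```python
def bernoulli_mean_var(rate, t, n):
    """Mean and variance of a sum of n independent Bernoulli(p) slots with p = rate * t / n."""
    if n < rate * t:
        raise ValueError("need n >= rate * t so that p <= 1")
    p = rate * t / n            # event probability in one slot of width t / n
    mean = n * p                # n p = rate * t for every n
    var = n * p - n * p ** 2    # n p (1 - p), approaches rate * t as n grows
    return mean, var


for n in (10, 100, 1_000, 100_000):
    mean, var = bernoulli_mean_var(rate=2.0, t=3.0, n=n)
    print(f"n={n:>6}: mean={mean:.5f}  var={var:.5f}")
```

The mean stays at $\lambda t$ while the variance $np(1-p) = \lambda t - (\lambda t)^2/n$ climbs toward it as $n$ grows.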

Now let's define the hazard rate. Let $T$ be the inter-arrival time (a random variable), and measure $t$ from the most recent event. The Bernoulli probability of an event falling in the small interval right after time $t$ is:

$$\delta p_t = P(t < T \le t+\delta t \mid T>t)$$ for a small $\delta t$. The hazard rate is then:

$$h(t) = \lim_{\delta t \to 0}\frac{\delta p_t}{\delta t}$$
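
As a numerical sanity check of this definition, the sketch below computes $\delta p_t/\delta t$ from the Weibull survival function $S(t) = e^{-(t/s)^\kappa}$ with a small fixed $\delta t$ and compares it with the closed-form Weibull hazard $h(t) = (\kappa/s)(t/s)^{\kappa-1}$, which increases for $\kappa>1$, is constant for $\kappa=1$ and decreases for $\kappa<1$. The scale $s$ (`scale`), the step `dt` and the function names are illustrative assumptions.

```python
import math


def weibull_survival(t, kappa, scale=1.0):
    """S(t) = P(T > t) for a Weibull(shape=kappa, scale=scale) inter-arrival time."""
    return math.exp(-((t / scale) ** kappa))


def hazard_from_definition(t, kappa, scale=1.0, dt=1e-6):
    """delta p_t / delta t, with delta p_t = P(t < T <= t + dt | T > t) for a small fixed dt."""
    s_t = weibull_survival(t, kappa, scale)
    delta_p = (s_t - weibull_survival(t + dt, kappa, scale)) / s_t
    return delta_p / dt


def weibull_hazard(t, kappa, scale=1.0):
    """Closed-form Weibull hazard h(t) = (kappa / scale) * (t / scale)**(kappa - 1)."""
    return (kappa / scale) * (t / scale) ** (kappa - 1.0)


for kappa in (0.5, 1.0, 2.0):
    for t in (0.5, 1.0, 2.0):
        print(f"kappa={kappa}, t={t}: "
              f"finite difference={hazard_from_definition(t, kappa):.6f}  "
              f"closed form={weibull_hazard(t, kappa):.6f}")
```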

Now, consider a hazard rate that is an increasing function of time. If $\delta p_t$ is high, the previous event must have happened a long time ago. So the Bernoulli variables in the immediate vicinity before $t$ (at $t-t_1$ for small $t_1$) probably did not register an event. Also, because $\delta p_t$ is high, an event is likely to happen in the small interval following $t$. This resets the process so that it starts again from a low hazard rate, so $\delta p_{t+t_1}$ will be low as well. In other words, the Bernoulli variable at the current time is negatively correlated with those in its vicinity. The variance of the sum equals the sum of the variances plus the covariance terms; here those terms are negative, so the variance of the sum is lower than the sum of the variances, which (as in the Poisson case) is approximately the mean. This makes the count distribution under-dispersed.

Now, consider a decreasing hazard rate. If $\delta p_t$ is low, then $t$ must be large, having given the hazard rate a long time to decay. This means that $\delta p_{t-t_1}$ must be low as well. Also, since $\delta p_t$ is low, an event is unlikely to happen in the small interval following $t$, so the hazard rate will likely keep decaying and $\delta p_{t+t_1}$ will remain low too. Conversely, right after an event the hazard rate is high, so events tend to arrive in clusters. This indicates that the Bernoulli variable at time $t$ is positively correlated with the Bernoulli variables in its vicinity. The positive covariance terms make the variance of the sum greater than the sum of the variances (approximately the mean), making it an over-dispersed point process.
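
A rough empirical check of the correlation argument (not part of the original answer): simulate one long Weibull renewal path, count events in consecutive small bins, and measure the correlation between neighbouring bins. The horizon, bin width, seed, unit scale and the helper name `adjacent_bin_correlation` are arbitrary assumptions. With bins much shorter than the mean gap, the argument above predicts a negative correlation for $\kappa>1$, roughly zero for the Poisson case $\kappa=1$, and a positive one for $\kappa<1$.

```python
import math

import numpy as np


def adjacent_bin_correlation(kappa, horizon=20_000.0, bin_width=0.05, scale=1.0, seed=1):
    """Correlation of event counts in neighbouring bins for one long renewal path
    with Weibull(shape=kappa, scale=scale) inter-arrival times."""
    rng = np.random.default_rng(seed)
    mean_gap = scale * math.gamma(1.0 + 1.0 / kappa)
    # Draw about 20% more gaps than needed so the path almost surely passes the horizon.
    n_gaps = int(1.2 * horizon / mean_gap) + 1_000
    arrivals = np.cumsum(scale * rng.weibull(kappa, size=n_gaps))
    if arrivals[-1] <= horizon:
        raise RuntimeError("not enough gaps drawn; increase n_gaps")
    arrivals = arrivals[arrivals <= horizon]
    n_bins = round(horizon / bin_width)
    counts, _ = np.histogram(arrivals, bins=n_bins, range=(0.0, horizon))
    # Lag-1 correlation: counts in bin i versus counts in bin i + 1.
    return np.corrcoef(counts[:-1], counts[1:])[0, 1]


for kappa in (0.5, 1.0, 2.0):
    print(f"kappa={kappa}: adjacent-bin correlation = {adjacent_bin_correlation(kappa):+.4f}")
```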