prediction of independent events


I am interested to know whether a model can be built to predict the occurrence of an event; in this case, I want to predict 911 calls. While each event alone is independent, the rate and general geographical area of these events are only somewhat variable when you look at neighborhoods, certain roadways, time of day, etc. So if I had historical data for a long period (say 5 years), would it be possible to form a general prediction of where (at neighborhood-level specificity) and when (roughly) a 911 call would be expected to occur? The data set includes approximately 200,000 events over 5 years, and there are approximately 40 event types. For example, calls for an unresponsive person make up 8% of responses. I know this is a complicated problem, but any light that can be shed on it would be greatly appreciated.


One may break the problem into two parts: First choose a model. Then find the parameters of the model, given the data.

Let's first choose a model. To do so, one may want to list the variables. Based on your question, they seem to be:

  • event type: (X)
  • location: (Y)
  • time of day: (T) [discretized into buckets]

Together, these describe the occurrence of an event. If we model X and Y as dependent on the time of day, one needs to find $\mathbb{P}[X=x_i,Y=y_j\ |\ T=t_k]$. The simplest model (which is likely wrong, but is easier to estimate) assumes that X is independent of Y given T. In this simplified case, $$\mathbb{P}[X=x_i,Y=y_j\ |\ T=t_k]= \mathbb{P}[X=x_i\ |\ T=t_k]\cdot \mathbb{P}[Y=y_j\ |\ T=t_k]$$
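As a minimal sketch of the conditional-independence model above, the two factors can be estimated by relative frequency within each time bucket (which is the maximum likelihood estimate for a categorical distribution) and then multiplied. The record layout `(event_type, location, time_bucket)`, the function names, and the toy records are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def _normalize(counter):
    """Turn a Counter of occurrences into relative frequencies."""
    total = sum(counter.values())
    return {key: n / total for key, n in counter.items()}


def conditional_tables(records):
    """Estimate P[X=x | T=t] and P[Y=y | T=t] by relative frequency.

    `records` is an iterable of (event_type, location, time_bucket) tuples;
    this record layout is a hypothetical choice for illustration.
    """
    x_counts = defaultdict(Counter)  # time bucket -> counts of event types
    y_counts = defaultdict(Counter)  # time bucket -> counts of locations
    for x, y, t in records:
        x_counts[t][x] += 1
        y_counts[t][y] += 1
    p_x = {t: _normalize(c) for t, c in x_counts.items()}
    p_y = {t: _normalize(c) for t, c in y_counts.items()}
    return p_x, p_y


def joint_given_t(p_x, p_y, x, y, t):
    """P[X=x, Y=y | T=t], assuming X and Y are independent given T."""
    return p_x[t].get(x, 0.0) * p_y[t].get(y, 0.0)


# Toy records invented for illustration only (not real 911 data).
records = [
    ("unresponsive person", "Downtown", "night"),
    ("traffic collision", "Highway 1", "night"),
    ("unresponsive person", "Highway 1", "night"),
    ("traffic collision", "Downtown", "day"),
]
p_x, p_y = conditional_tables(records)
print(joint_given_t(p_x, p_y, "unresponsive person", "Highway 1", "night"))  # (2/3) * (2/3)
```

With 40 event types and some number of neighborhoods, this needs only one table of size 40 and one table of size (number of neighborhoods) per bucket, instead of a full 40 × (number of neighborhoods) table per bucket; that saving is the main appeal of the independence assumption.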

One can construct a more nuanced joint distribution by building a causal graph.

Next, you may want to choose a model for the 911 calls given the time of day. One possible model is a Poisson process: the number of calls in a time period follows a Poisson distribution, and the interarrival times between consecutive calls follow an exponential distribution with the same rate. You may assume a separate call rate for each time-of-day bucket, i.e., you could estimate a rate $\lambda_{T=t_k}$ for each bucket $t_k$.
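Given a rate $\lambda$ (calls per hour) for a bucket, the Poisson/exponential pair yields the quantities one would actually report. The sketch below, with hypothetical function names, computes the Poisson probability of $k$ calls in a window, the probability that the wait for the next call exceeds $s$ hours, and the mean wait $1/\lambda$. For illustration it uses the overall average rate implied by the question's figures (about 200,000 calls over 5 years), not a per-bucket estimate.

```python
import math


def poisson_pmf(k, rate, duration=1.0):
    """P[N = k] calls in a window of length `duration`, with N ~ Poisson(rate * duration)."""
    mu = rate * duration
    return math.exp(-mu) * mu**k / math.factorial(k)


def prob_wait_exceeds(s, rate):
    """P[time until the next call > s] when interarrival times are Exponential(rate)."""
    return math.exp(-rate * s)


def expected_wait(rate):
    """Mean interarrival time of an Exponential(rate) distribution."""
    return 1.0 / rate


# Overall average rate implied by ~200,000 calls over 5 years (ignoring leap days),
# in calls per hour. A real model would use a separate rate per time-of-day bucket.
rate = 200_000 / (5 * 365 * 24)
print(round(rate, 2))                 # about 4.57 calls per hour
print(expected_wait(rate) * 60)       # mean minutes between calls
print(prob_wait_exceeds(0.5, rate))   # chance of no call in the next 30 minutes
```

Note that "no call in the next $s$ hours" can be computed either way: as `poisson_pmf(0, rate, s)` or as `prob_wait_exceeds(s, rate)`; both equal $e^{-\lambda s}$, which is exactly the link between the two distributions.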

Each call could then be of event type $X=x_i$ and come from location $Y=y_j$, with probabilities given by $\mathbb{P}[X=x_i,Y=y_j\ |\ T=t_k]$.
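Combining the bucket rate with these probabilities is straightforward under one extra assumption: each call's type and location are drawn independently of the other calls. Then, by the thinning (marking) property of Poisson processes, the calls of one type (or one type at one location) in bucket $t_k$ form their own Poisson process with rate $\lambda_{T=t_k}\cdot p$, where $p$ is that type's (or type-location pair's) conditional probability. The minimal sketch below uses hypothetical function names; the rate of 4 calls per hour is made up, and the 0.08 borrows the question's 8% share for unresponsive-person calls as if it held within this bucket.

```python
import math


def mark_rate(rate_t, p_mark_given_t):
    """Rate of calls carrying one mark (a type, or a type-location pair) in bucket t."""
    return rate_t * p_mark_given_t


def prob_at_least_one(rate_t, p_mark_given_t, duration):
    """P[at least one call with this mark in `duration` hours].

    By Poisson thinning, calls carrying a given mark form a Poisson process with
    rate rate_t * p_mark_given_t, assuming each call's mark is drawn independently.
    """
    return 1.0 - math.exp(-mark_rate(rate_t, p_mark_given_t) * duration)


# Hypothetical bucket rate of 4 calls/hour; 0.08 is the question's overall share
# of unresponsive-person calls, treated here as if it held in this bucket.
print(mark_rate(4.0, 0.08))               # expected such calls per hour
print(prob_at_least_one(4.0, 0.08, 1.0))  # chance of at least one in the next hour
```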

This is one possible model for your problem. All that remains is to estimate the parameters of the model. You may want to use a maximum likelihood estimator to find its various parameters. For instance, if the set of all parameters is $\theta\in\Theta$, then $$\theta^* = \arg\max_{\theta \in\Theta} \mathbb{P}[\text{Data} \mid \theta]$$
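For the Poisson rates this maximization has a closed form. If bucket $t_k$ was observed over $m$ windows of length $h$ with call counts $n_1,\dots,n_m$, the log-likelihood is $\sum_i \left[n_i\log(\lambda h) - \lambda h - \log n_i!\right]$; setting its derivative $\sum_i n_i/\lambda - mh$ to zero gives $\lambda^* = \sum_i n_i / (mh)$, i.e. total calls divided by total observed time. The sketch below (hypothetical names, toy counts invented for illustration) computes that estimate and checks numerically that it beats nearby rates on the log-likelihood.

```python
import math


def poisson_log_likelihood(rate, counts, duration):
    """log P[counts | rate] for i.i.d. Poisson(rate * duration) counts."""
    mu = rate * duration
    return sum(n * math.log(mu) - mu - math.lgamma(n + 1) for n in counts)


def mle_rate(counts, duration):
    """Closed-form maximizer of the log-likelihood: total calls / total observed time.

    Assumes at least one call was observed (otherwise the estimate is 0).
    """
    return sum(counts) / (len(counts) * duration)


# Toy hourly counts for one time-of-day bucket, invented for illustration.
counts = [3, 5, 4, 6, 2]
lam = mle_rate(counts, duration=1.0)
# The closed form should beat nearby rates on the log-likelihood.
for other in (0.9 * lam, 1.1 * lam):
    assert poisson_log_likelihood(lam, counts, 1.0) > poisson_log_likelihood(other, counts, 1.0)
print(lam)
```

The categorical probabilities $\mathbb{P}[X=x_i\ |\ T=t_k]$ and $\mathbb{P}[Y=y_j\ |\ T=t_k]$ have a similarly simple maximum likelihood estimate: the relative frequency of each category within the bucket.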


Once you have the parameters of the model, you can predict (depending on the time of day) when the next call may arrive and, further, the probability distribution of where the call is from and what it is about.
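One simple way to turn the fitted model into such predictions is Monte Carlo simulation: draw the wait from an exponential distribution with the bucket's rate, then draw the event type and location from their conditional distributions. This minimal sketch assumes the simplified model (X independent of Y given T) and a rate that stays constant during the wait, which is only an approximation when the wait runs into the next time bucket. The function name, labels, and probabilities are hypothetical.

```python
import random


def sample_next_call(rate_t, p_x_t, p_y_t, rng):
    """Draw (wait in hours, event type, location) for the next call in bucket t.

    Assumes the rate stays constant during the wait (a simplification if the wait
    crosses into another time bucket) and that X and Y are independent given T.
    """
    wait = rng.expovariate(rate_t)  # exponential interarrival time
    x = rng.choices(list(p_x_t), weights=list(p_x_t.values()), k=1)[0]
    y = rng.choices(list(p_y_t), weights=list(p_y_t.values()), k=1)[0]
    return wait, x, y


# Hypothetical inputs for one bucket; the labels and probabilities are made up.
p_x_night = {"unresponsive person": 0.08, "other": 0.92}
p_y_night = {"Downtown": 0.6, "Highway 1": 0.4}
rng = random.Random(0)
draws = [sample_next_call(4.0, p_x_night, p_y_night, rng) for _ in range(20_000)]
mean_wait = sum(w for w, _, _ in draws) / len(draws)  # should be close to 1/4 hour
```

Averaging many draws recovers the analytic answers (mean wait $1/\lambda$, type shares equal to the conditional probabilities); the simulation becomes more useful once the model grows beyond what has a closed form, for example a rate that changes across bucket boundaries.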