What am I reinventing? RE: Linear regression modeling for frequency of discrete events

I'm looking to model the frequency of events in order to quantify how much that frequency is increasing or decreasing. For concreteness, think of the events as web page hits for several low-traffic websites; I would like to compare how much they are "trending" up or down relative to one another. My main question is: what am I reinventing?

In broad strokes, this is what I'm thinking. I have a set of events at a set of times $E = \{t_1, \ldots, t_N\}$ with $t_i < 0$. From these I have an event distribution function which is a sum of Dirac delta functions,

$$ \Phi(t) = K\sum_{i=1}^N\delta(t-t_i) $$

where $K$ is some normalizing factor. I would like to model this like a linear regression with $L(t) = at + b$ by minimizing $\Vert L-\Phi \Vert$. Older events should be weighted less, so my inner product measure would be something like

$$ d\omega = e^{kt}dt $$

with $k > 0$. Before I started digging out the details of this (the normalization, the inner product measure, etc.) I got the nagging suspicion that this has been done before :)

So my question is: where can I read about the established theory, practice, and terminology for this type of problem? Any suggestions about how to either rephrase the title of this question or reformulate the problem are appreciated as well.

I think you are just looking for the weighted least squares method.

In the wiki article, a discrete version is tackled but I think it's rather easy to adapt this to the continuous case. What you want is to minimise

$$\int_{-\infty}^{0}\left(L(t)-\Phi(t)\right)^2 e^{kt}dt = \int_{-\infty}^{0}\left(at+b-\Phi(t)\right)^2 e^{kt}dt$$

As a first step, you can look for critical points by differentiating w.r.t. $a$ and $b$ and equating to zero. This gives the following set of equations

$$\begin{eqnarray} a \int_{-\infty}^{0}t^2 e^{kt}dt + b \int_{-\infty}^{0}t e^{kt}dt & = & \int_{-\infty}^{0}\Phi(t) t e^{kt}dt \\ a \int_{-\infty}^{0}t e^{kt}dt + b \int_{-\infty}^{0}e^{kt}dt & = & \int_{-\infty}^{0}\Phi(t) e^{kt}dt \end{eqnarray}$$

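The integrals involving $\Phi(t)=K\sum_{i=1}^N \delta(t-t_i)$ reduce to sums over the events, while the other integrals can be worked out easily (they are just values of the gamma function, i.e. factorials) and converge for $k>0$:

$$\int_{-\infty}^{0}e^{kt}dt = \frac{1}{k}, \qquad \int_{-\infty}^{0}t e^{kt}dt = -\frac{1}{k^2}, \qquad \int_{-\infty}^{0}t^2 e^{kt}dt = \frac{2}{k^3}$$

Substituting these gives

$$\begin{eqnarray} \frac{2}{k^3}a - \frac{1}{k^2}b & = & K\sum_{i=1}^N t_i e^{kt_i} \\ -\frac{1}{k^2}a + \frac{1}{k} b & = & K\sum_{i=1}^N e^{kt_i} \end{eqnarray}$$

This linear system has determinant $1/k^4$, so it can be solved to give explicit formulas for $a$ and $b$:

$$a = K\sum_{i=1}^N \left(k^3 t_i + k^2\right) e^{kt_i}, \qquad b = K\sum_{i=1}^N \left(k^2 t_i + 2k\right) e^{kt_i}$$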
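
As a rough sketch of how this could be computed in practice, the Python function below evaluates the three weight integrals in closed form (assuming $k>0$, so that they converge to $1/k$, $-1/k^2$ and $2/k^3$), replaces the $\Phi$ integrals by sums over the event times, and solves the resulting 2×2 system by Cramer's rule. The name `weighted_linear_fit` and its arguments are my own choices for illustration, not part of any library.

```python
import math


def weighted_linear_fit(event_times, k, K=1.0):
    """Fit L(t) = a*t + b to K * sum(delta(t - t_i)) on (-inf, 0]
    with weight e^{k t}, by solving the 2x2 normal equations.

    event_times: iterable of event times t_i <= 0 (hypothetical input format)
    k: decay rate of the weight, must be positive
    K: normalizing factor in front of the delta sum
    """
    if k <= 0:
        raise ValueError("k must be positive for the weight integrals to converge")

    # Closed-form moments of the weight e^{k t} over (-inf, 0]
    m0 = 1.0 / k        # integral of e^{k t}
    m1 = -1.0 / k**2    # integral of t * e^{k t}
    m2 = 2.0 / k**3     # integral of t^2 * e^{k t}

    # Right-hand sides: the delta functions turn the integrals into sums
    s0 = K * sum(math.exp(k * t) for t in event_times)
    s1 = K * sum(t * math.exp(k * t) for t in event_times)

    # Normal equations:  m2*a + m1*b = s1,  m1*a + m0*b = s0
    det = m2 * m0 - m1 * m1  # equals 1 / k^4
    a = (m0 * s1 - m1 * s0) / det
    b = (m2 * s0 - m1 * s1) / det
    return a, b
```

To compare several sites, one would presumably call this once per site with the same `k` and a comparable `K`, and compare the slopes `a`; how to choose `K` (for example, per-site or global normalization) is left open here, as it is in the question.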

EDIT: On reflection, I don't think it is logical or natural to make a linear fit of $\Phi(t)$ itself. One should rather fit the cumulative function, i.e. $C(t)=\int_{-\infty}^t \Phi(u)du$, as this gives the count of events up to time $t$ (scaled by $K$). But since I am not 100% sure what you want to achieve, I leave this as an additional comment. It's easy to adapt the above formulas for $C(t)$.
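
As a sketch of that adaptation, assuming the same weight $e^{kt}$ with $k>0$: the cumulative function is the step function $C(t)=K\sum_{i=1}^N H(t-t_i)$, where $H$ is the Heaviside step function (my notation, not used in the question). The left-hand sides of the normal equations do not change, since they depend only on the weight. Only the two right-hand sides change, with $\Phi$ replaced by $C$, and each integral now runs over $[t_i, 0]$ for every event:

$$\int_{-\infty}^{0}C(t)\, t\, e^{kt}dt = \frac{K}{k^2}\sum_{i=1}^N \left[(1-kt_i)e^{kt_i} - 1\right], \qquad \int_{-\infty}^{0}C(t)\, e^{kt}dt = \frac{K}{k}\sum_{i=1}^N \left(1-e^{kt_i}\right)$$

As a quick check, an event at $t_i = 0$ contributes nothing to either sum, while an event in the distant past ($t_i \to -\infty$) contributes $-K/k^2$ and $K/k$, which are the integrals of $Kte^{kt}$ and $Ke^{kt}$ over the whole half-line.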