Estimating conditional probability as a function of time

My question concerns estimating a time-dependent conditional probability from a time series, without assuming any prior parametric model.

Suppose I have two variables, $r$ and $I$, each of which can take one of $m$ discrete values. I take a measurement every $\tau_1$ seconds. For the $n$-th measurement, the time is $t_n$ and the measured values are $(r_n, I_n)$. $r_n$ is drawn from some unknown probability distribution $P(r \mid \{\text{some parameters}\}; t_n)$ and $I_n$ is drawn from some other unknown probability distribution $P(I \mid \{\text{some parameters}\}; t_n)$. What I mean is that the probability distributions change as a function of time on some timescale $\tau_2$, and one can assume that $\tau_2 \gg \tau_1$ (but I do not know $\tau_2$ explicitly). I want to construct an estimate of the conditional probability $P(r \mid I_{n-1}; t_n)$.

I think this is possible (in my case) because

1) Many measurements occur during a period in which the probability distribution has barely changed (a consequence of $\tau_2 \gg \tau_1$).

2) There is a good physical reason why $r_n$ depends on $I_{n-1}$ (i.e., I think the dependence will be quite strong). I also expect this dependence to change with time more slowly than the probability distributions of $r$ or $I$. If the dependence does change, it may be on a timescale $\tau_3$ such that $\tau_3 \gg \tau_2 \gg \tau_1$.

I tried to approach this problem by estimating $P(r \wedge I; t_n)$ and $P(I; t_n)$ and then using the identity:

$$P(r \mid I_{n-1}; t_n) = \frac{P(r \wedge I_{n-1}; t_n)}{P(I_{n-1}; t_n)}$$
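To make the attempt concrete, here is a minimal sketch of the "local" count-based estimate I have in mind, assuming the values of $r$ and $I$ are coded as integers $0, \dots, m-1$. The function name `windowed_conditional` and the `window` parameter are just illustrative names, not part of any library.

```python
from collections import Counter

def windowed_conditional(r, I, n, i_prev, window, m):
    """Local maximum-likelihood estimate of P(r | I_{n-1} = i_prev; t_n).

    r, I   : sequences of integer labels in range(m)
    n      : index of the current measurement (n >= 1)
    i_prev : value of I_{n-1} to condition on
    window : number of most recent (I_{k-1}, r_k) pairs to use (illustrative knob)
    Returns a list of m probabilities, or None if i_prev never
    occurred as a predecessor inside the window.
    """
    start = max(1, n - window + 1)
    joint = Counter()     # counts of pairs (I_{k-1}, r_k)
    marginal = Counter()  # counts of I_{k-1}
    for k in range(start, n + 1):
        joint[(I[k - 1], r[k])] += 1
        marginal[I[k - 1]] += 1
    if marginal[i_prev] == 0:
        return None  # the identity divides by zero here
    # Ratio of joint count to marginal count, i.e. the identity above
    return [joint[(i_prev, v)] / marginal[i_prev] for v in range(m)]

# Toy data, purely illustrative
r = [0, 1, 1, 0, 1]
I = [0, 0, 1, 0, 1]
estimate = windowed_conditional(r, I, n=4, i_prev=0, window=4, m=2)
```

With so few pairs in the window, any combination that was not observed gets an estimate of exactly 0, and a conditioning value that was not observed gives no estimate at all.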

However, I encountered the following problems:

1) There is relatively little data, so a "local" maximum-likelihood estimate gives bad results. This possibly means I need some kind of smoothing/filtering.

2) It is not clear how best to "forget" old measurements (perhaps the rate of forgetting should depend on how much the "new" data differ from the "old" data).

3) Certain values of $I$ and $r$ are very rare, and it would be nice to be able to handle cases where an event goes from a probability of 1E-3% to 1% (a change of 3 orders of magnitude, although the event remains very rare). In particular, no combination of $r$ and $I$ has a probability of exactly 0 (although some combinations might never be measured).
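To illustrate what I mean by "forgetting" and smoothing, here is a rough sketch of one naive variant: exponentially discounted counts plus a uniform pseudocount, so that no combination ever gets probability exactly 0. The class name `ForgettingConditional` and the parameters `decay` and `alpha` are my own placeholders, and their values are arbitrary. The forgetting rate here is fixed rather than data-dependent, and I suspect a uniform pseudocount is far too crude for events with probabilities around 1E-3%, so this is only meant to pin down the question, not to be a solution.

```python
class ForgettingConditional:
    """Discounted-count estimate of P(r | I_{n-1}) with additive smoothing."""

    def __init__(self, m, decay=0.99, alpha=0.5):
        self.m = m
        self.decay = decay  # per-measurement forgetting factor (assumed fixed)
        self.alpha = alpha  # pseudocount added to every (I, r) cell
        # counts[i][v] = discounted count of (I_{n-1} = i, r_n = v)
        self.counts = [[0.0] * m for _ in range(m)]

    def update(self, i_prev, r_now):
        # Down-weight all old evidence, then add the new pair
        for row in self.counts:
            for v in range(self.m):
                row[v] *= self.decay
        self.counts[i_prev][r_now] += 1.0

    def prob(self, i_prev):
        # Smoothed ratio: (count + alpha) / (row total + m * alpha)
        row = self.counts[i_prev]
        total = sum(row) + self.alpha * self.m
        return [(c + self.alpha) / total for c in row]

# Toy usage, purely illustrative
est = ForgettingConditional(m=2, decay=0.5, alpha=1.0)
est.update(i_prev=0, r_now=1)
est.update(i_prev=0, r_now=0)
p = est.prob(0)
```

The effective memory of this scheme is roughly `1 / (1 - decay)` measurements, which would have to sit between $\tau_1$ and $\tau_2$; choosing it without knowing $\tau_2$ is exactly what I do not know how to do.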

Please let me know what possible solutions I can try or papers/methods I should read.

Best,

Ilya