Having already studied Discrete Time Markov Chains, I have recently begun to self-study the natural extension to Continuous Time Markov Chains. One of the very first concepts to appear is the notion of a Q-matrix, whose $(i,j)$ entry is denoted $q_{i,j}$. The textbook that I am using provides the following definition for these entries:
Transition Derivatives: We define $q_{i,j} := \lim_{\delta \downarrow 0} \frac{p_\delta (i,j) - p_0 (i,j)}{\delta}$ for $i,j \in I$.
There are a few things that are unclear to me here.
I believe that time-homogeneity is assumed for the Markov chain, so I'm not sure what the notation $p_{\delta} (i,j)$ and $p_0(i,j)$ refers to. I know that $p_{i,j}(t) = \mathbb{P} (X_t = j \mid X_0 = i)$, but the subscripts here seem to be written the other way around. Is this just a mistake/inconsistency by the author?
Assuming my reading in point $(1)$ is correct, I am unsure what this quantity actually represents. It seems to measure the rate of change, with respect to time, of the probability of having transitioned between states after a small time interval. But my understanding is that changes of state occur instantaneously, so I don't see how to interpret a rate of change in a context where there is no gradual transition from one state to another.
I would be grateful for any clarification here.
The notation can vary considerably between different authors, but I think it's reasonable in this case to assume that $p_\delta(i,j)$ is the same as $p_{i,j}(\delta)$, that is, the probability that you end up in state $j$ at time $\delta$ given that you started in state $i$ at time $0$. I have seen both notations used (in the same 1-hour lecture, no less!).
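To see that this reading fits the derivative definition, here is a small numerical check (a sketch in Python, with made-up rates, not from any particular textbook) using the two-state chain, whose transition function has a well-known closed form:

```python
import math

# Hypothetical two-state chain: jump rate a from state 0 to 1,
# and rate b from state 1 to 0. Its transition function is
#   p_t(0, 1) = (a / (a + b)) * (1 - exp(-(a + b) t)),
# with the other entries determined by rows summing to 1.
a, b = 2.0, 3.0

def p(t, i, j):
    """Transition probability p_t(i, j) for the two-state chain."""
    decay = math.exp(-(a + b) * t)
    if i == 0:
        return (b + a * decay) / (a + b) if j == 0 else (a - a * decay) / (a + b)
    return (b - b * decay) / (a + b) if j == 0 else (a + b * decay) / (a + b)

# Difference quotients from the definition: as delta shrinks,
# they approach q_{0,1} = a = 2 (note p_0(0, 1) = 0).
for delta in (1e-2, 1e-4, 1e-6):
    print(delta, (p(delta, 0, 1) - p(0.0, 0, 1)) / delta)
```

Note also that the diagonal quotient $(p_\delta(0,0) - 1)/\delta$ tends to $-a$, which is why the diagonal entries of the Q-matrix are negative.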
One nice interpretation is to think of the $q_{i,j}$s as rate parameters for exponential random variables. If you're in state $i$ at time $t=0$, and you can move from state $i$ to states $j_1,\dots,j_n$, then you can imagine that at each of the states $j_1,\dots,j_n$ there is an alarm clock that will ring at a random, exponentially distributed time, where the parameters of the exponential random variables are $q_{i,j_1},\dots,q_{i,j_n}$, respectively. When the first "alarm clock" rings, you move to the state associated with that clock. (This construction actually works for countably infinite state spaces as well.)
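This competing-clocks description is easy to turn into a simulation. Below is a minimal sketch in Python; the state names and the rate table `Q_RATES` are made up purely for illustration:

```python
import random

# Hypothetical off-diagonal rates q_{i,j} of a small chain.
Q_RATES = {
    "A": {"B": 1.0, "C": 2.0},
    "B": {"A": 0.5, "C": 0.5},
    "C": {"A": 3.0},
}

def next_jump(state, rng=random):
    """Sample (holding_time, next_state) via competing exponential clocks."""
    clocks = {j: rng.expovariate(rate) for j, rate in Q_RATES[state].items()}
    winner = min(clocks, key=clocks.get)  # the first alarm to ring
    return clocks[winner], winner

def simulate(start, t_end, rng=random):
    """Run the chain until time t_end; return the path as (time, state) pairs."""
    t, state, path = 0.0, start, [(0.0, start)]
    while True:
        hold, nxt = next_jump(state, rng)
        t += hold
        if t >= t_end:
            return path
        state = nxt
        path.append((t, state))
```

As a sanity check on the construction: the minimum of independent Exponential($q_{i,j}$) clocks is itself exponential with rate $q_i = \sum_{j \ne i} q_{i,j}$, which recovers the usual description of the holding time in state $i$.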
I am sure many stochastic processes texts discuss this interpretation/construction of continuous-time Markov chains; one fairly accessible source (heavier on intuition and lighter on formalism, and also where I picked up this "alarm clock" analogy) is Lawler's book.