I observed 400 episodes of nursing care in a hospital and tracked the movement of the nurses between 5 rooms, $A$ to $E$. The maximum likelihood estimate of the probability that a nurse moves from room $i$ to room $j$ is:
\begin{equation}\hat P_{ij}=\dfrac{\text{number of transitions from room } i \text{ to room } j}{\text{total number of transitions from room } i \text{ to any room}}\end{equation}
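As a minimal sketch of this estimator, assuming each episode is recorded as a list of room labels in the order visited (the episodes below are made up for illustration, and the function names are hypothetical), the counts and row-normalized estimates can be computed as follows. Transitions are counted only within an episode, so the last room of one episode is not linked to the first room of the next.

```python
ROOMS = ["A", "B", "C", "D", "E"]


def transition_counts(episodes, rooms=ROOMS):
    """Count n_ij, the number of moves from rooms[i] to rooms[j], within each episode."""
    index = {room: k for k, room in enumerate(rooms)}
    counts = [[0] * len(rooms) for _ in rooms]
    for visits in episodes:
        # Consecutive pairs within one episode are the observed transitions.
        for here, there in zip(visits, visits[1:]):
            counts[index[here]][index[there]] += 1
    return counts


def mle_transition_matrix(counts):
    """Row-normalize: P_hat[i][j] = n_ij / sum_k n_ik; rows with no data stay all zero."""
    p_hat = []
    for row in counts:
        total = sum(row)
        p_hat.append([c / total if total else 0.0 for c in row])
    return p_hat


# Hypothetical episodes, for illustration only.
episodes = [["A", "B", "A", "C"], ["B", "A", "B"]]
counts = transition_counts(episodes)
p_hat = mle_transition_matrix(counts)
print(counts[0])  # moves out of room A: [0, 2, 1, 0, 0]
print(p_hat[0])
```

Each row of `p_hat` with at least one observed move sums to 1; a room that was never left has no estimate, which is shown here as a row of zeros.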
- Is there a way to construct a confidence interval for the maximum likelihood estimate $\hat P_{ij}$?
- And, jointly, for the estimates for all possible pairs of rooms?
Reference:
I have come across a reference (http://arxiv.org/pdf/0905.4131v1.pdf) which suggests that, for $n$ observations $X_i$, the difference between the empirical maximum likelihood estimate $\hat P_{ij}$ and the true transition probability $P_{ij}$, scaled by $\sqrt{n}$, tends to a multivariate normal distribution with mean $0$ and variance-covariance matrix $\Sigma$:
$$\sqrt{n}\,\bigl(\hat P_{ij}-P_{ij}\bigr)_{i,j}\xrightarrow{d} N(0,\Sigma)\quad \text{as}\quad n\rightarrow \infty$$
How do I calculate $\Sigma$ from my observed data, and how does it relate to confidence intervals?
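One common way to estimate this covariance from the data is sketched below as an assumption, not as the referenced paper's exact construction: treat the moves out of each room $i$ as $n_i=\sum_k n_{ik}$ draws from a multinomial distribution with probabilities $P_{i1},\dots,P_{iK}$, asymptotically independent across rooms (the standard large-sample result for Markov-chain transition estimates). The plug-in covariance of $\hat P$ then has entries $(\delta_{jm}\hat P_{ij}-\hat P_{ij}\hat P_{im})/n_i$ within row $i$ and $0$ between rows, and it plays roughly the role of $\Sigma/n$. The function names and the two-room counts are hypothetical.

```python
import math


def plugin_covariance(counts):
    """Plug-in covariance of the flattened (row-major) K*K vector of P_hat.

    Assumes the moves out of room i are multinomial with n_i = sum(counts[i])
    trials and that different rows are independent (large-sample approximation).
    """
    k = len(counts)
    cov = [[0.0] * (k * k) for _ in range(k * k)]
    for i, row in enumerate(counts):
        n_i = sum(row)
        if n_i == 0:
            continue  # no moves out of room i, so nothing to estimate
        p = [c / n_i for c in row]
        for j in range(k):
            for m in range(k):
                same = p[j] if j == m else 0.0
                cov[i * k + j][i * k + m] = (same - p[j] * p[m]) / n_i
    return cov


def standard_errors(counts):
    """Square roots of the diagonal: the SE of each P_hat[i][j], as a K*K matrix."""
    k = len(counts)
    cov = plugin_covariance(counts)
    return [[math.sqrt(cov[i * k + j][i * k + j]) for j in range(k)] for i in range(k)]


# Hypothetical two-room counts: row i holds n_i1, n_i2.
example_counts = [[3, 1], [2, 2]]
cov = plugin_covariance(example_counts)
se = standard_errors(example_counts)
```

The diagonal entries are the binomial variances $\hat P_{ij}(1-\hat P_{ij})/n_i$, so $\hat P_{ij}\pm 1.96\,\mathrm{SE}_{ij}$ gives an approximate $95\%$ interval for a single entry; the negative off-diagonal terms within a row reflect the constraint that each row sums to 1.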
One could probably think about the nature of the dependence among the various random variables that would have been observed, but for now I'll do something simpler:
You have $n$ independent Bernoulli trials, where $n$ is the denominator in the fraction (for example, $n=400$ if all $400$ episodes are counted there). You have $x$ successes, where $x$ is the numerator. So the number of successes is binomially distributed with an unobservable parameter $p$. Theory tells us that the expected number of successes is $np$ and its variance is $np(1-p)$. That means the expected proportion of successes is $p$ and the variance of the proportion is $p(1-p)/n$.

By the central limit theorem, the proportion is approximately normally distributed, so the standardized proportion $(x/n-p)/\sqrt{p(1-p)/n}$ has about a $0.95$ chance of lying between $-1.96$ and $+1.96$. We use $x/n$ as an estimate of $p$, also inside the square root. Our $95\%$ confidence interval therefore has endpoints $$ \frac{x}{n} \pm 1.96\sqrt{\frac{(x/n)(1-x/n)}{n}} $$
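This is the normal-approximation (Wald) interval for a proportion. A minimal Python sketch, where the function name and the counts in the example are hypothetical:

```python
import math


def wald_interval(x, n, z=1.96):
    """Endpoints x/n -/+ z * sqrt((x/n)(1 - x/n) / n) for a binomial proportion."""
    if n <= 0:
        raise ValueError("need at least one trial")
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Example: 100 of 400 counted transitions went from room i to room j (made-up numbers).
low, high = wald_interval(100, 400)
print(f"{low:.4f} to {high:.4f}")  # prints 0.2076 to 0.2924
```

For small counts, or proportions near 0 or 1, this approximation is known to behave poorly (with $x=0$ it gives a zero-width interval); the Wilson score interval is a common alternative.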
In many textbooks, you'll see a section called something like "Confidence interval for a proportion" that covers this.
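To cover every room pair, one option, sketched here as an extension rather than as part of the answer above, is to apply the same interval row by row with $n=n_i$, the number of moves out of room $i$. If the intervals should hold simultaneously, a Bonferroni correction (replacing $1.96$ with the normal quantile at level $1-\alpha/(2m)$, where $m$ is the number of intervals) is a simple, conservative choice. The function and argument names are hypothetical.

```python
import math
from statistics import NormalDist


def interval_matrix(counts, confidence=0.95, bonferroni=False):
    """Wald interval for each P_hat[i][j] = counts[i][j] / sum(counts[i]).

    Rows with no moves out of room i get None entries. With bonferroni=True the
    error rate is split across all estimated cells, so the intervals hold jointly
    with approximate probability at least `confidence`.
    """
    alpha = 1 - confidence
    if bonferroni:
        cells = sum(len(row) for row in counts if sum(row) > 0)
        alpha /= max(cells, 1)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    result = []
    for row in counts:
        n_i = sum(row)
        if n_i == 0:
            result.append([None] * len(row))
            continue
        row_intervals = []
        for x in row:
            p = x / n_i
            half = z * math.sqrt(p * (1 - p) / n_i)
            row_intervals.append((p - half, p + half))
        result.append(row_intervals)
    return result


# Hypothetical counts for three rooms; the third room was never left.
counts = [[3, 1, 0], [2, 2, 0], [0, 0, 0]]
pointwise = interval_matrix(counts)
joint = interval_matrix(counts, bonferroni=True)
```

Cells with a zero count still get a zero-width interval, a weakness of the normal approximation that the Bonferroni step does not fix.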