Conditional Probability/Expectation in the EM algorithm


I'm doing a study in which I measure data under a random censoring process. The observed data, which may be interpreted as the lifetime of a subject, is denoted by $t$, with the censoring variable $c$ (where $c=1$ if the subject has died and $c=0$ otherwise).

For certain reasons, the lifetime of a subject is infinite if $z=0$; however, because of the censoring we will still observe a finite time $t<\infty$. Thus $z$ is a latent variable that is only observed (as $z=1$) if $c=1$, i.e. if a subject does NOT have an infinite lifetime.

We let $S(t|z=1)=S(t)=1-F(t)$ (for ease of notation) be the survival function of the subjects who are at risk of dying.
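To make the setup concrete, here is a minimal simulation sketch of the data-generating process described above. The Bernoulli($\pi$) distribution for $z$, the exponential lifetimes for $z=1$, the independent exponential censoring times, and the function name `simulate_cure_data` are all illustrative assumptions, not part of the original study.

```python
import math
import random


def simulate_cure_data(n, pi, rate, cens_rate, seed=0):
    """Simulate (t, c, z) under a cure (mixture) model.

    Illustrative assumptions only: z ~ Bernoulli(pi); subjects with z=1 have an
    Exponential(rate) lifetime, subjects with z=0 never die (infinite lifetime);
    censoring times are Exponential(cens_rate) and independent of everything else.
    """
    rng = random.Random(seed)
    t, c, z = [], [], []
    for _ in range(n):
        zi = 1 if rng.random() < pi else 0
        lifetime = rng.expovariate(rate) if zi == 1 else math.inf
        censor = rng.expovariate(cens_rate)
        # Only the smaller of the two times is observed, plus whether it was a death.
        t.append(min(lifetime, censor))
        c.append(1 if lifetime <= censor else 0)
        z.append(zi)  # latent: not available to the analyst in practice
    return t, c, z
```

In such a sample every subject with $c=1$ has $z=1$, while a censored subject ($c=0$) may have either value of $z$, which is exactly why $z$ has to be imputed in the E-step.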

Now, for statistical inference on the data (the E-step of the EM algorithm), I want to compute the expected value of $z$ conditional on the observed data.

$E[z|data]=c+(1-c)P(z=1|data)$, since $z=1$ whenever $c=1$. First, let the data consist only of the observed lifetime $t$ and the censoring variable $c$; then we have

$E[z|data]=c+(1-c)P(z=1|T>t)$, since for a censored subject ($c=0$) the pair $\{t,c\}$ implies $T>t$

$=c + (1-c)\frac{P(T>t|z=1)P(z=1)}{P(T>t)}$
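Writing $\pi=P(z=1)$ (a shorthand introduced here for convenience) and using $P(T>t|z=0)=1$, since a subject with $z=0$ never dies, the denominator expands by the law of total probability, which gives the weight in closed form:

$$
P(T>t) = P(T>t\mid z=1)\,\pi + P(T>t\mid z=0)\,(1-\pi) = \pi S(t) + 1-\pi,
$$

$$
E[z\mid data] = c + (1-c)\,\frac{\pi S(t)}{1-\pi+\pi S(t)}.
$$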

This works completely fine with the algorithm/inference.
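For reference, a minimal EM sketch for this untruncated case. It assumes, purely for illustration, an exponential latency $S(t)=e^{-\lambda t}$ so that the M-step has a closed form; the E-step is the conditional expectation of $z$ given above, and the function names are hypothetical.

```python
import math


def cure_weights(t, c, pi, rate):
    """E-step: w_i = c_i + (1 - c_i) * pi*S(t_i) / (1 - pi + pi*S(t_i))."""
    w = []
    for ti, ci in zip(t, c):
        if ci == 1:
            w.append(1.0)  # an observed death implies z_i = 1
        else:
            s = math.exp(-rate * ti)  # S(t) under the assumed exponential latency
            w.append(pi * s / (1.0 - pi + pi * s))
    return w


def em_cure_exponential(t, c, pi=0.5, rate=1.0, max_iter=1000, tol=1e-8):
    """EM for a cure model with an assumed Exponential(rate) latency, no truncation."""
    n_events = sum(c)
    for _ in range(max_iter):
        w = cure_weights(t, c, pi, rate)
        # M-step, closed form:
        #   pi   maximises sum_i [w_i log(pi) + (1 - w_i) log(1 - pi)]
        #   rate maximises sum_i [c_i log(rate) - rate * w_i * t_i]
        new_pi = sum(w) / len(w)
        new_rate = n_events / sum(wi * ti for wi, ti in zip(w, t))
        converged = abs(new_pi - pi) < tol and abs(new_rate - rate) < tol
        pi, rate = new_pi, new_rate
        if converged:
            break
    return pi, rate
```

The rate update uses $c_i w_i = c_i$ (an observed death always has weight 1), so only censored subjects carry fractional weights into the exposure term $\sum_i w_i t_i$.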

If, however, I introduce a truncation time $L$, then we have

$E[z|data]=c+(1-c)P(z=1|T>t,T>L)$, but since $t>L$, the event $\{T>t\}$ already implies $\{T>L\}$, so the truncation gives no additional information. Yet, if there is a large proportion of truncated observations, we have a heavily biased sample of longer lifetimes $t$.
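The claim that truncation leaves the E-step weight unchanged can be checked directly: for a censored subject, $\{T>t\}\cap\{T>L\}=\{T>\max(t,L)\}$, which is just $\{T>t\}$ when $t>L$. A small sketch, where `cure_weight` and passing $S$ in as a Python function are illustrative assumptions:

```python
import math


def cure_weight(t, c, pi, surv, L=None):
    """P(z=1 | data) for one subject, computed from the joint event.

    surv is the susceptible survival function S(.) (an assumed input); L is an
    optional truncation time. For a censored subject the conditioning event is
    {T > t}, or {T > t, T > L} = {T > max(t, L)} when L is given.
    """
    if c == 1:
        return 1.0  # an observed death means z = 1
    u = t if L is None else max(t, L)
    joint = pi * surv(u)           # P(z=1, T>u)
    marginal = joint + (1.0 - pi)  # P(T>u); cured subjects always satisfy T>u
    return joint / marginal
```

For example, with `S = lambda s: math.exp(-s)`, the calls `cure_weight(3.0, 0, 0.6, S)` and `cure_weight(3.0, 0, 0.6, S, L=1.5)` evaluate the same expression, because `max(3.0, 1.5) == 3.0`.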

Consequently, the conditional expectation of $z$ will also be biased, and as a result the algorithm yields strange results.
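The sampling bias itself is easy to reproduce in a small simulation. This sketch assumes, for illustration only, exponential lifetimes for $z=1$, exponential censoring times $C$, an exponential truncation time $L$, and that a subject enters the sample only if $\min(T,C)>L$; `truncated_sample` is a hypothetical helper.

```python
import math
import random


def truncated_sample(n, pi, rate, cens_rate, entry_rate, seed=0):
    """Draw subjects until n are kept, keeping only those still observable at entry.

    Illustrative assumptions: z ~ Bernoulli(pi), Exponential(rate) lifetime when z=1,
    infinite lifetime when z=0, Exponential(cens_rate) censoring, and an
    Exponential(entry_rate) truncation time L. A subject is kept only if min(T, C) > L.
    Returns the kept (t, c, z, L) tuples and the number of discarded draws.
    """
    rng = random.Random(seed)
    kept, discarded = [], 0
    while len(kept) < n:
        zi = 1 if rng.random() < pi else 0
        lifetime = rng.expovariate(rate) if zi == 1 else math.inf
        censor = rng.expovariate(cens_rate)
        entry = rng.expovariate(entry_rate)
        ti = min(lifetime, censor)
        if ti <= entry:
            discarded += 1  # truncated: this subject never enters the sample
            continue
        kept.append((ti, 1 if lifetime <= censor else 0, zi, entry))
    return kept, discarded
```

With `pi=0.6, rate=1.0, cens_rate=0.3, entry_rate=0.5`, a short calculation under these assumptions gives $P(\text{kept}\mid z=1)=0.5/1.8$ and $P(\text{kept}\mid z=0)=0.5/0.8$, so only about 40% of the retained subjects have $z=1$ instead of 60%: the subjects with $z=0$, who can only be censored, are over-represented among the long observed times.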

Are there any suggestions on how to adjust this conditional expectation for the biased sample?