Usually, what I read involves taking a conditional expectation with respect to a single random variable such as X or Z, but in the book "The Elements of Statistical Learning", page 291, given $T=(Z,Z^m)$ where $Z^m$ is the missing data:
we have
$l(Q';Z)=l_0(Q';T)-l_1(Q';Z^m|Z)$
where $l_1$ is based on the conditional density $Pr(Z^m|Z,Q')$.
Taking conditional expectations with respect to the distribution of $T|Z$ governed by parameter $Q$ gives:
$l(Q';Z)=E[l_0(Q';T)|Z,Q]-E[l_1(Q';Z^m|Z)|Z,Q]$
I don't quite get the idea of the distribution of $T|Z$, or how the likelihood function turned into an expectation. Thanks!
The distribution of $T|Z$ is exactly what it sounds like: it is the distribution of the complete data $T=(Z,Z^m)$ conditioned on the observed data $Z$. Since $Z$ is fixed under this conditioning, averaging over $T|Z$ amounts to averaging over the missing data $Z^m$ given $Z$ (under parameter $Q$).

All they are doing in that equation is taking the conditional expectation of both sides with respect to this distribution. They take the expectation because we don't know the values of $Z^m$, so we need to average over them. The left-hand side $l(Q';Z)$ depends on the data only through $Z$, so it is constant given $Z$ and the conditional expectation leaves it unchanged: $E[l(Q';Z)\mid Z,Q]=l(Q';Z)$. That is why we appear to get more than our money's worth from the decision to take this expectation, since the goal is to compute $l(Q';Z)$ and it survives on the left-hand side untouched.
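If it helps to see the identity hold numerically, here is a minimal sketch with a made-up toy model (not from the book): both $Z$ and $Z^m$ are binary, and a "parameter" is just a $2\times 2$ joint table for $T=(Z,Z^m)$. The check confirms that $l(Q';Z)$ equals $E[l_0(Q';T)\mid Z,Q]-E[l_1(Q';Z^m|Z)\mid Z,Q]$ even when the averaging parameter $Q$ differs from $Q'$, because each term inside the expectation reduces to $\log \Pr(Z\mid Q')$ regardless of $Z^m$.

```python
import math

# Hypothetical toy model: Z and Z^m are each binary, so T = (Z, Z^m)
# takes four values. A "parameter" here is just a joint table
# P(Z=z, Z^m=m). Both tables below are invented for illustration.
joint_Q  = {(0, 0): 0.10, (0, 1): 0.30, (1, 0): 0.40, (1, 1): 0.20}  # governs the expectation
joint_Qp = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.15, (1, 1): 0.35}  # the Q' inside the likelihoods

def marginal(joint, z):
    # P(Z = z): sum the joint over the missing coordinate
    return sum(p for (zz, m), p in joint.items() if zz == z)

def cond(joint, m, z):
    # P(Z^m = m | Z = z)
    return joint[(z, m)] / marginal(joint, z)

for z in (0, 1):
    # Left-hand side: the observed-data log-likelihood l(Q'; Z)
    lhs = math.log(marginal(joint_Qp, z))

    # Right-hand side: E[l0(Q'; T) | Z, Q] - E[l1(Q'; Z^m | Z) | Z, Q],
    # averaging Z^m over Pr(Z^m | Z, Q)
    rhs = sum(cond(joint_Q, m, z)
              * (math.log(joint_Qp[(z, m)]) - math.log(cond(joint_Qp, m, z)))
              for m in (0, 1))
    print(z, abs(lhs - rhs) < 1e-12)
```

Running it prints `True` for both values of $z$: the expectation costs us nothing on the left-hand side, which is the whole point of the E-step construction.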