Iterated conditional expectations on Markov chains


Equation (4) of De Cooman, Hermans, and Quaeghebeur's "Imprecise Markov Chains and their Limit Behaviour" defines the operator

$$\mathbb{T}_n f(x_{1:n}) := E(f(x_{1:n},\cdot)|x_n) = \sum_{x_{n+1} \in \cal X}f(x_{1:n},x_{n+1})p(x_{n+1}|x_n)$$

where $x_{1:n}$ is an abbreviation for $x_1, x_2, \ldots, x_n$, values of a Markov chain with state space $\cal X$ at subsequent times, and $f(x_{1:n},\cdot)$ is $f(x_{1:n+1})$ as a function of $x_{n+1}$ with $x_{1:n}$ treated as fixed. (I modified the notation very slightly to avoid unnecessary explanation here.)
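To fix intuition, equation (4) is easy to mimic numerically. Here is a minimal Python sketch (my own, not from the article), using a hypothetical two-state chain in which the dictionary `P` plays the role of $p(x_{n+1}\mid x_n)$:

```python
# Hypothetical two-state chain: states 0 and 1, transition matrix P.
X = [0, 1]
P = {0: {0: 0.7, 1: 0.3},
     1: {0: 0.4, 1: 0.6}}

def T(f):
    """T_n from equation (4): map f(x_1,...,x_{n+1}) to
    g(x_1,...,x_n) = sum_{x_{n+1}} f(x_1,...,x_{n+1}) p(x_{n+1}|x_n).

    f takes a tuple (x_1, ..., x_{n+1}); the result takes (x_1, ..., x_n).
    """
    def g(xs):  # xs = (x_1, ..., x_n)
        return sum(f(xs + (x_next,)) * P[xs[-1]][x_next] for x_next in X)
    return g

# Example: f depends only on its last coordinate.
f = lambda xs: float(xs[-1])   # f(x_{1:n+1}) = x_{n+1}
g = T(f)
print(g((0,)))   # E(x_2 | x_1 = 0) = 0*0.7 + 1*0.3 = 0.3
print(g((1,)))   # E(x_2 | x_1 = 1) = 0*0.4 + 1*0.6 = 0.6
```

Note that `T(f)` only ever reads `xs[-1]`, the last state of the history it is handed; that detail turns out to matter below.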

Equation (5), which I understand and probably don't need to reproduce here, shows that $E(f|x_{1:N-1}) = \mathbb{T}_{N-1}f(x_{1:N-1})$. I'm confused by equation (6):

$$E(f|x_{1:N-2}) = E(E(f(x_{1:N-2},\cdot,\cdot)|x_{1:N-2}, \cdot)\,|\,x_{1:N-2}) = \mathbb{T}_{N-2}\mathbb{T}_{N-1}f(x_{1:N-2})$$

The rhs should be read as $(\mathbb{T}_{N-2}(\mathbb{T}_{N-1}f))(x_{1:N-2})$. I understand the first equality in (6) but not the second.


EDIT: On the advice of folks on Meta, I've deleted the rest of the original post, along with the early revisions I made in response to equaeghe's helpful comments. I wasn't able to clarify the question sufficiently the first time around, but I now have a new understanding, so I'll try a different approach.


I want to make all arguments and scope relations explicit, so I'll use the $\lambda$ operator from lambda calculus to indicate function arguments. [For example, to treat $f(x,y)$ as a function of $y$, one could write $(\lambda y)f(x,y)$.] I'll also use subscripts on the expectation operator $E$ to clarify what variable the expectation is over. (In the article, subscripts on $E$ have a different meaning.)

Let $n$ above be $N-1$ as in the article, and define

$$g := \mathbb{T}_{N-1}\,f = (\lambda x_{1:N-1})\,E_{x_N}(f(x_{1:N-1},x_N)|x_{N-1}).$$

This rhs is simply the definition of $\mathbb{T}_n$ with $N-1$ substituted for $n$, and abstractions made explicit. This will make the notation dense, but I have trouble seeing what I'm doing without it.
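In code, the $\lambda$-abstraction is literally a closure over the free history variables. A sketch of $g$ (my own example, reusing the hypothetical two-state chain from above, not the article's):

```python
# Hypothetical two-state chain; P plays the role of p(x_{n+1}|x_n).
X = [0, 1]
P = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}

# Some f(x_1, ..., x_N); here N = 3 and f just sums its arguments.
f = lambda xs: float(sum(xs))

# g := T_{N-1} f = (lambda x_{1:N-1}) E_{x_N}(f(x_{1:N-1}, x_N) | x_{N-1}):
# the lambda-abstraction over x_{1:N-1} is a Python lambda over xs.
g = lambda xs: sum(f(xs + (xN,)) * P[xs[-1]][xN] for xN in X)

print(g((0, 1)))   # E_{x_3}(f(0, 1, x_3) | x_2 = 1) = 1*0.4 + 2*0.6 = 1.6
```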

I'll start from the far right hand side of equation (6) without the argument $x_{1:N-2}$ and then expand $\mathbb{T}_{N-2}(\mathbb{T}_{N-1}f)$ using the notation I described a moment ago:

$$\mathbb{T}_{N-2}\,g =$$ $$(\lambda x_{1:N-2})\,E_{x_{N-1}}[g(x_{1:N-2},x_{N-1})|x_{N-2}] =$$ $$(\lambda x_{1:N-2})\,E_{x_{N-1}}[(\mathbb{T}_{N-1}\,f)(x_{1:N-2},x_{N-1})|x_{N-2}] =$$ $$(\lambda x_{1:N-2})\,E_{x_{N-1}}\big[\big[(\lambda x_{1:N-1})\,E_{x_N}(f(x_{1:N-1},x_N)|x_{N-1})\big](x_{1:N-2},x_{N-1})\,\big|\,x_{N-2}\big]$$

Now substitute the argument $(x_{1:N-2},x_{N-1})$ for the parameters specified by $\lambda$ inside the inner expectation:

$$= (\lambda x_{1:N-2})\,E_{x_{N-1}}[E_{x_N}(f(x_{1:N-2},x_{N-1},x_N)|x_{N-1})|x_{N-2}]$$

If we then pass $x_{1:N-2}$ to this function (since that's what the rhs of (6) said) and remove the subscripts on the expectation operators, we get:

$$E[E(f(x_{1:N-2},x_{N-1},x_N)|x_{N-1})|x_{N-2}]$$

This is similar to the middle of equation (6). The conditioned function $f$ with its arguments is the same. (See equaeghe's comment on the meaning of the dots in (6).) The conditioning variables differ, however: where I have $x_{N-2}$, equation (6) has $x_{1:N-2}$, and where I have $x_{N-1}$, (6) has $x_{1:N-2},x_{N-1}$.

What have I done wrong? (Answer or not, thanks to anyone who wades through this notation.)

I now see what is missing, and per @equaeghe's suggestion in a comment, I'll answer my own question. (Apologies to those who don't like self-answered questions. I do think this may be useful to others reading the article that was the subject of the question.)

The middle part of equation (6),

$$E(E(f(x_{1:N-2},\cdot,\cdot)|x_{1:N-2}, \cdot)\,|\,x_{1:N-2})$$

and the final expression derived in the question,

$$E[E(f(x_{1:N-2},x_{N-1},x_N)|x_{N-1})|x_{N-2}]$$

are equal because we're assuming the sequence has the Markov property. That is, conditioning on $x_{1:N-2}$ is the same as conditioning on $x_{N-2}$, and conditioning on $(x_{1:N-2}, \cdot) = x_{1:N-1}$ is the same as conditioning on $x_{N-1}$. This kind of substitution was illustrated in equation (5) in the original article.
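For what it's worth, this resolution can be checked numerically. In the sketch below (my own, with a hypothetical two-state chain and initial law), $E(f|x_{1:N-2})$ is computed by brute force from the full joint distribution of the chain, and it agrees with $\mathbb{T}_{N-2}\mathbb{T}_{N-1}f(x_{1:N-2})$, even though the iterated operator only ever inspects the last coordinate of each conditioning history:

```python
from itertools import product

# Hypothetical two-state chain with an initial law and transition matrix.
X = [0, 1]
init = {0: 0.5, 1: 0.5}
P = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}
N = 4
f = lambda xs: float(sum(xs))          # arbitrary f(x_1, ..., x_N)

def joint(xs):
    """Joint probability p(x_1, ..., x_k) of the Markov chain."""
    p = init[xs[0]]
    for a, b in zip(xs, xs[1:]):
        p *= P[a][b]
    return p

def cond_exp(hist):
    """E(f | x_{1:N-2} = hist), by brute force from the full joint law."""
    num = sum(f(hist + tail) * joint(hist + tail)
              for tail in product(X, repeat=2))
    return num / joint(hist)

def T(g):
    """The operator from equation (4); reads only the last state."""
    return lambda xs: sum(g(xs + (x,)) * P[xs[-1]][x] for x in X)

for hist in product(X, repeat=N - 2):
    assert abs(cond_exp(hist) - T(T(f))(hist)) < 1e-12
print("ok: conditioning on the whole history matches the iterated operator")
```

The agreement holds for every history because, in the ratio defining `cond_exp`, the factors of the joint law up to $x_{N-2}$ cancel, which is exactly the Markov-property substitution above.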