Difference between functions and Markov chains for estimation


Given two random variables $X$ and $Y$ on two alphabets $\mathcal X$ and $\mathcal Y$, I'm interested in minimizing the expected distortion $\mathbb E[d(X,\hat X)]$ for $\hat X$ taking values in an alphabet $\hat{\mathcal X}$, in two different scenarios. In the first, $X-Y-\hat X$ forms a Markov chain; in the second, $\hat X=\hat x(Y)$ for some function $\hat x:\mathcal Y\rightarrow \hat{\mathcal X}$.

I am very tempted to say that \begin{align*} \inf_{X-Y-\hat X} \mathbb E[d(X,\hat X)] = \inf_{\hat x:\mathcal Y\rightarrow \hat{\mathcal X} } \mathbb E[d(X,\hat x(Y))] \end{align*}

But I cannot prove it in general (so it may of course be false). One direction is trivial: $X-Y-\hat x(Y)$ always forms a Markov chain, so the left-hand infimum is at most the right-hand one. Does anyone have insights about this, perhaps a counterexample or a proof?
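As a quick numerical sanity check (a sketch, not a proof; the joint pmf, the distortion matrix, and all names below are arbitrary illustrative choices), one can brute-force the best deterministic function on a tiny finite example and compare it against randomly drawn Markov kernels $P[\hat X = \hat x \mid Y = y]$:

```python
# Sanity check on a tiny finite example (illustrative, not a proof):
# compare E[d(X, Xhat)] under random Markov kernels P(Xhat | Y)
# with the best deterministic function xhat : Y -> Xhat.
import itertools
import numpy as np

rng = np.random.default_rng(0)

nX, nY, nXhat = 3, 2, 3                      # alphabet sizes (arbitrary)
P_XY = rng.random((nX, nY))                  # arbitrary joint pmf of (X, Y)
P_XY /= P_XY.sum()
d = rng.random((nX, nXhat))                  # arbitrary distortion d(x, xhat)

# Best deterministic function, by brute force over all |Xhat|^|Y| functions.
best_fn = min(
    sum(P_XY[x, y] * d[x, f[y]] for x in range(nX) for y in range(nY))
    for f in itertools.product(range(nXhat), repeat=nY)
)

# Randomized estimators: X - Y - Xhat Markov chains with kernel K(xhat | y).
smallest_gap = np.inf
for _ in range(1000):
    K = rng.random((nY, nXhat))
    K /= K.sum(axis=1, keepdims=True)        # normalize rows into pmfs
    val = sum(P_XY[x, y] * K[y, xh] * d[x, xh]
              for x in range(nX) for y in range(nY) for xh in range(nXhat))
    smallest_gap = min(smallest_gap, val - best_fn)

print(f"best deterministic value : {best_fn:.4f}")
print(f"smallest kernel gap      : {smallest_gap:.4f}  (never negative)")
```

In such experiments no random kernel beats the best deterministic function, which is consistent with the conjectured equality.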

Accepted answer:

Here is a proof that your fact is true in the case where $\mathcal{X}$, $\mathcal{Y}$, and $\hat{\mathcal{X}}$ are finite sets.

Let $(X,Y,\hat{X})$ be random variables that form a Markov chain $X\rightarrow Y \rightarrow\hat{X}$. Then
\begin{align}
E[d(X,\hat{X})] &= \sum_{y \in \mathcal{Y}}\sum_{\hat{x} \in \hat{\mathcal{X}}}\sum_{x \in \mathcal{X}} P[Y=y,X=x,\hat{X}=\hat{x}]\,d(x,\hat{x})\\
&=\sum_{y \in \mathcal{Y}}P[Y=y]\sum_{\hat{x} \in \hat{\mathcal{X}}}P[\hat{X}=\hat{x}|Y=y]\sum_{x \in \mathcal{X}} P[X=x|Y=y]\,d(x,\hat{x})\\
&\geq \sum_{y \in \mathcal{Y}}P[Y=y]\min_{a\in\hat{\mathcal{X}}}\left[\sum_{x \in \mathcal{X}} P[X=x|Y=y]\,d(x,a)\right]
\end{align}
where the second equality uses the Markov property, i.e. $P[X=x|Y=y,\hat X=\hat x] = P[X=x|Y=y]$.

So we can define a function $c:\mathcal{Y}\rightarrow \hat{\mathcal{X}}$ by
$$ c(y) = \arg\min_{a \in \hat{\mathcal{X}}} \left[\sum_{x \in \mathcal{X}} P[X=x|Y=y]\,d(x,a)\right],$$
breaking ties in some arbitrary (deterministic) way. Then
\begin{align}
E[d(X,\hat{X})] &\geq \sum_{y \in \mathcal{Y}} P[Y=y]\sum_{x \in \mathcal{X}} P[X=x|Y=y]\,d(x,c(y))\\
&= E[d(X, c(Y))]\\
&\geq \inf_{\hat{x}:\mathcal{Y}\rightarrow\hat{\mathcal{X}}} E[d(X,\hat{x}(Y))].
\end{align}
Since the reverse inequality is immediate (as you note, $X\rightarrow Y\rightarrow \hat x(Y)$ is always a Markov chain), the two infima coincide.
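The $\arg\min$ step above translates directly into code. A minimal sketch (assuming every $P[Y=y]>0$, with arrays shaped as in the snippet in the question):

```python
# Minimal sketch of the construction above: c(y) = argmin_a sum_x P[X=x|Y=y] d(x, a).
# Assumes P_XY has shape (|X|, |Y|) with all column sums positive,
# and d has shape (|X|, |Xhat|).
import numpy as np

def optimal_estimator(P_XY: np.ndarray, d: np.ndarray) -> np.ndarray:
    P_Y = P_XY.sum(axis=0)                   # marginal pmf of Y
    P_X_given_Y = P_XY / P_Y                 # column y holds P[X = . | Y = y]
    cost = P_X_given_Y.T @ d                 # cost[y, a] = sum_x P[X=x|Y=y] d(x, a)
    return cost.argmin(axis=1)               # ties broken at the lowest index

def expected_distortion(P_XY: np.ndarray, d: np.ndarray, c: np.ndarray) -> float:
    # E[d(X, c(Y))] = sum_{x,y} P[X=x, Y=y] d(x, c(y))
    return float(sum(P_XY[x, y] * d[x, c[y]]
                     for x in range(P_XY.shape[0]) for y in range(P_XY.shape[1])))
```

Applied to the (P_XY, d) pair from the earlier snippet, `expected_distortion(P_XY, d, optimal_estimator(P_XY, d))` reproduces `best_fn` without enumerating all $|\hat{\mathcal{X}}|^{|\mathcal{Y}|}$ functions, since the minimization decomposes over $y$.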


A "proof sketch" of the more general case (without assuming finite alphabets) is this: For each $y \in \mathcal{Y}$ and $\hat{x} \in \mathcal{\hat{X}}$ we have \begin{align} E[d(X,\hat{X})|Y=y,\hat{X}=\hat{x}] &= E[d(X,\hat{x})|Y=y,\hat{X}=\hat{x}]\\ &\overset{(a)}{=} E[d(X,\hat{x})|Y=y]\\ &\geq \inf_{c \in \mathcal{\hat{X}}} E[d(X,c)|Y=y] \end{align} where (a) uses the Markov property $X\rightarrow Y\rightarrow \hat{X}$. Now for each $y \in \mathcal{Y}$, define $c(y) \in \mathcal{\hat{X}}$ as a particular minimizer of $E[d(X,c)|Y=y]$ over $c \in \mathcal{\hat{X}}$ (assuming the minimizer exists for simplicity, and breaking ties deterministically). So the right-hand-side of the above inequality chain is $E[d(X, c(Y))|Y=y]$ and thus $$ E[d(X,\hat{X})|Y=y, \hat{X}=\hat{x}] \geq E[d(X,c(Y))|Y=y]$$ This holds for all $y \in \mathcal{Y}$ and $\hat{x} \in \mathcal{\hat{X}}$ and so $$ E[d(X,\hat{X})|Y, \hat{X}] \geq E[d(X,c(Y))|Y]$$ Taking expectations of both sides and using iterated expectations gives \begin{align} E[d(X,\hat{X})] &\geq E[d(X,c(Y))]\\ &\overset{(a)}{\geq} \inf_{\hat{x}:\mathcal{Y}\rightarrow\mathcal{\hat{X}}}E[d(X,\hat{x}(Y))] \end{align} where (a) holds because $c(y)$ is just a particular deterministic function from $\mathcal{Y}$ to $\mathcal{\hat{X}}$.

[In the non-finite case there are some pesky measurability details, e.g. ensuring that the minimizer $y \mapsto c(y)$ can be chosen as a measurable function.]
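For a concrete illustration of the construction (a standard special case, not part of the argument above): take $\hat{\mathcal{X}} = \mathbb{R}$ and squared-error distortion $d(x,\hat x) = (x-\hat x)^2$ with $E[X^2] < \infty$. Then
\begin{align}
E[d(X,a)\mid Y=y] = E[X^2\mid Y=y] - 2a\,E[X\mid Y=y] + a^2,
\end{align}
a quadratic in $a$ minimized at $a = E[X\mid Y=y]$. So $c(y) = E[X\mid Y=y]$, and the optimal function estimator is the conditional mean, i.e. the classical MMSE estimator.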