"Random" Function of a Random Variable (Cover & Thomas Elements of Information Theory)


I'm looking to clarify a technical point about 'random processing' in Cover & Thomas. They occasionally refer to a function of a random variable as being 'allowed to be random.' Here are some instances:

In the context of the data-processing inequality:

If $Z = f(Y)$, then $X \rightarrow Y \rightarrow Z$ [their notation for saying that $X, Y,$ and $Z$ form a Markov chain]... Now we can prove an important and useful theorem demonstrating that no processing of $Y$, deterministic or random, can increase the information that $Y$ contains about $X$.

And later, in the context of Fano's inequality:

We observe a random variable $Y$ that is related to $X$... From $Y$, we calculate a function $f(Y) = \hat{X}$, where $\hat{X}$ is an estimate of $X$... We will allow the function $f(Y)$ to be random.

These are a little surprising to me, since I would've thought that the notation $f(Y)$ referred to a deterministic function of $Y$. They don't qualify what 'random processing' means in either context, so I just want to make sure I understand what's meant here.

Surely they must mean more than "$f$ is allowed to change forms (e.g. from $f(y) = y^2$ to $f(y)=y^3$) depending on $Y$'s result," since $H(f(Y)|Y)$ would still be $0$ and $f$ would still be a deterministic function of $Y$ in that case.

I assume they mean that $f$ can incorporate a source of randomness external to $Y$, so that $H(f(Y)|Y)$ is possibly positive. But we can't allow $f$ to use arbitrary sources of randomness - in particular, we clearly can't allow $f$ to depend on $X$.

For example, if $X$ and $Y$ are iid random bits, and $f_X(Y) = X + Y$ (mod 2), then $f$ is a "random processing" of $Y$, but it contains complete information about $X$, given $Y$. So the data-processing inequality and Fano's inequality wouldn't apply here, because $X, Y, f_X(Y)$ is no longer Markov.
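To make this counterexample concrete, here is a quick numerical check (my own sketch, not from the book): enumerating the four equally likely $(X, Y)$ outcomes shows that $Y$ alone leaves a full bit of uncertainty about $X$, while $Y$ together with $f_X(Y) = X \oplus Y$ determines $X$ exactly.

```python
from collections import defaultdict
from itertools import product
from math import log2

# X and Y are iid fair bits, and the "random processing" f_X(Y) = X + Y (mod 2)
# illegally uses X itself as its source of randomness. Enumerate the joint
# distribution of (X, Y, f_X(Y)) and compare H(X | Y) with H(X | Y, f_X(Y)).
joint = defaultdict(float)
for x, y in product((0, 1), repeat=2):
    joint[(x, y, x ^ y)] += 0.25  # each (x, y) pair has probability 1/4

def cond_entropy(joint, target_idx, given_idx):
    """H(target | given), in bits, for a joint distribution over outcome tuples."""
    p_tg = defaultdict(float)  # p(target, given)
    p_g = defaultdict(float)   # p(given)
    for outcome, p in joint.items():
        g = tuple(outcome[i] for i in given_idx)
        p_tg[(outcome[target_idx], g)] += p
        p_g[g] += p
    return -sum(p * log2(p / p_g[g]) for (t, g), p in p_tg.items() if p > 0)

h_x_given_y = cond_entropy(joint, 0, (1,))     # 1 bit: Y alone says nothing about X
h_x_given_yz = cond_entropy(joint, 0, (1, 2))  # 0 bits: Y and f_X(Y) pin X down
```

So $I(X; f_X(Y) \mid Y) = H(X \mid Y) - H(X \mid Y, f_X(Y)) = 1$ bit, confirming that this "processing" is not covered by the data-processing inequality.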

Do they mean: "$f$ is allowed to incorporate a source of randomness outside of $Y$, as long as $X \rightarrow Y \rightarrow f(Y)$ remains Markov (i.e. $I(X; f(Y) | Y) = 0$)"? It seems like that avoids breaking anything, but I feel like I might be overcomplicating or missing something about their intended meaning...

Best answer:

The text is a little too informal, but the meaning is clear. We are building an estimator of some unknown $X$ given some data $Y$, and it's not obvious a priori whether we should take a deterministic function of $Y$, a random variable, or a mix of both.

Suppose, for example, that $X$ is a fair die and $Y$ is empty (we observe nothing). A deterministic estimator here would be a constant, say $\hat X = 1$. It's not obvious whether this estimator is preferable (under some cost criterion) to a random one, say, throwing another die.
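For this die example, a tiny calculation (my own sketch, under the usual 0-1 loss) shows the two estimators are in fact equally good: both the constant guess and an independent die throw err with probability $5/6$.

```python
from fractions import Fraction

# X is a fair die, Y carries no information. Compare two estimators
# under 0-1 loss (probability of a wrong guess).

# Deterministic estimator: always guess 1. Correct iff X == 1.
p_err_det = 1 - Fraction(1, 6)

# Randomized estimator: guess an independent fair die throw.
# P(correct) = sum over x of P(X = x) * P(guess = x) = 6 * (1/6) * (1/6) = 1/6.
p_err_rand = 1 - sum(Fraction(1, 6) * Fraction(1, 6) for _ in range(6))

# Both estimators have error probability 5/6.
```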

In this spirit, the Fano paragraph could be restated by writing $\hat X = f(Y, Z)$, where $f$ is deterministic and $Z$ is a random variable independent of $X$.
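This restatement can be sanity-checked numerically. In the sketch below (my own construction, not the book's), $Y$ is $X$ observed through a binary symmetric channel with crossover probability $0.1$, $Z$ is a fair bit independent of $(X, Y)$, and $W = f(Y, Z) = Y \oplus Z$. Because $Z$ is independent of $(X, Y)$, we find $I(X; W \mid Y) = 0$, so $X \to Y \to W$ remains a Markov chain even though the processing is genuinely random ($H(W \mid Y) = 1$ bit).

```python
from collections import defaultdict
from math import log2

# Hypothetical setup: X is a fair bit, Y is X seen through a BSC(0.1),
# Z is a fair bit independent of (X, Y), and W = f(Y, Z) = Y XOR Z.
joint = defaultdict(float)
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            p = 0.5 * (0.9 if y == x else 0.1) * 0.5
            joint[(x, y, z, y ^ z)] += p

def cond_entropy(joint, target_idx, given_idx):
    """H(target | given), in bits, for a joint distribution over outcome tuples."""
    p_tg, p_g = defaultdict(float), defaultdict(float)
    for outcome, p in joint.items():
        g = tuple(outcome[i] for i in given_idx)
        p_tg[(outcome[target_idx], g)] += p
        p_g[g] += p
    return -sum(p * log2(p / p_g[g]) for (t, g), p in p_tg.items() if p > 0)

# I(X; W | Y) = H(X | Y) - H(X | Y, W): zero, so X -> Y -> W is Markov,
# while H(W | Y) = 1 bit shows the processing really is random given Y.
i_x_w_given_y = cond_entropy(joint, 0, (1,)) - cond_entropy(joint, 0, (1, 3))
h_w_given_y = cond_entropy(joint, 3, (1,))
```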