Why should conditioning on a random variable subtract information from it?


The definition of a sufficient statistic is as follows:

A statistic $t = T(X)$ is sufficient for the underlying parameter $\theta$ precisely if the conditional probability distribution of the data $X$, given the statistic $t = T(X)$, does not depend on the parameter $\theta$.
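For concreteness, here is a small worked instance of the definition as I understand it (my own example, assuming $X = (X_1, \dots, X_n)$ are i.i.d. Bernoulli($\theta$) and $T(X) = \sum_i X_i$):

```latex
% X_1,\dots,X_n i.i.d. Bernoulli(\theta); fix a binary vector x with \sum_i x_i = t.
P_\theta(X = x) = \theta^{t}(1-\theta)^{n-t}
% Distribution of the statistic T(X) = \sum_i X_i:
P_\theta(T = t) = \binom{n}{t}\,\theta^{t}(1-\theta)^{n-t}
% Conditioning: the \theta-dependent factor cancels,
P_\theta(X = x \mid T = t)
  = \frac{\theta^{t}(1-\theta)^{n-t}}{\binom{n}{t}\,\theta^{t}(1-\theta)^{n-t}}
  = \frac{1}{\binom{n}{t}},
% which is free of \theta, so T is sufficient.
```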

That is, the data $X$ carries some information about $\theta$, say $I(\theta;X)$, and this information gets subtracted from $X$ if I condition on the random variable $T(X)$.
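If I write this in mutual-information terms (treating $\theta$ as a random variable so that the mutual information is defined — this is my own rephrasing, using the standard chain rule, not part of the definition above), the "subtraction" seems to be exactly this:

```latex
% Chain rule for mutual information, with T = T(X) a function of X,
% so that (T, X) carries the same information as X alone:
I(\theta; X) = I(\theta; T, X) = I(\theta; T) + I(\theta; X \mid T).
% If T is sufficient, the conditional law of X given T is free of \theta, hence
I(\theta; X \mid T) = 0
\quad\Longrightarrow\quad
I(\theta; X) - I(\theta; T) = 0.
```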

I do not understand why conditioning on a random variable should 'subtract information' from $X$.

What I mean is that, while I am comfortable thinking in the following manner:

if $X = Y + Z$, where $Z$ contains no information about $\theta$, then subtracting $Y$ from $X$ subtracts information about $\theta$ from $X$,

the idea that conditioning can subtract information is unsettling at best. So, is there any mathematical justification for why conditioning on a random variable can subtract information from that random variable?
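For what it's worth, here is a quick numerical sketch I tried (my own construction, assuming $X = (X_1, X_2)$ i.i.d. Bernoulli($\theta$) and $T(X) = X_1 + X_2$): the conditional law of $X$ given $T$ comes out identical for different values of $\theta$, which is the sense in which conditioning on $T$ has removed all of the $\theta$-information from $X$.

```python
from itertools import product
from math import comb

def cond_dist(theta, n=2):
    """Return P(X = x | T(X) = sum(x)) for every binary vector x of length n,
    where X_1, ..., X_n are i.i.d. Bernoulli(theta) and T(X) = sum."""
    out = {}
    for x in product([0, 1], repeat=n):
        t = sum(x)
        # P(X = x) = theta^t (1-theta)^(n-t);
        # P(T = t) = C(n,t) theta^t (1-theta)^(n-t);
        # the theta-dependent factor cancels in the ratio.
        out[x] = (theta ** t * (1 - theta) ** (n - t)) / (
            comb(n, t) * theta ** t * (1 - theta) ** (n - t)
        )
    return out

d1 = cond_dist(0.3)
d2 = cond_dist(0.8)
print(d1 == d2)  # prints True: the conditional law does not depend on theta
```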