The definition of a sufficient statistic is as follows:
A statistic $t = T(X)$ is sufficient for underlying parameter $\theta$ precisely if the conditional probability distribution of the data $X$, given the statistic $t = T(X)$, does not depend on the parameter $\theta$.
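To make this concrete, here is the standard Bernoulli example (my own illustration, not part of the quoted definition). Let

$$
X = (X_1, \dots, X_n), \qquad X_i \overset{\text{iid}}{\sim} \mathrm{Bernoulli}(\theta), \qquad T(X) = \sum_{i=1}^n X_i.
$$

Then for any binary sequence $x$ with $\sum_i x_i = t$,

$$
P_\theta(X = x \mid T = t) = \frac{\theta^{t}(1-\theta)^{n-t}}{\binom{n}{t}\theta^{t}(1-\theta)^{n-t}} = \binom{n}{t}^{-1},
$$

which is free of $\theta$, so $T$ is sufficient.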
That is, if the data $X$ carries some information about $\theta$, say $I(\theta; X)$, then this information gets subtracted from $X$ when I condition on the random variable $T(X)$.
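In information-theoretic terms, the picture I have in mind is the chain rule for mutual information (my own reading, treating $\theta$ as random with some prior so that $I$ is well defined):

$$
I(\theta; X) = I(\theta; X, T(X)) = I(\theta; T(X)) + I(\theta; X \mid T(X)),
$$

where the first equality holds because $T(X)$ is a function of $X$. Sufficiency then amounts to $I(\theta; X \mid T(X)) = 0$: once I condition on $T(X)$, no information about $\theta$ is left in $X$.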
I do not understand why conditioning on a random variable should 'subtract information' from $X$.
What I mean is that I am comfortable thinking in the following manner:
if $X = Y + Z$, where $Z$ contains no information about $\theta$, then subtracting $Y$ from $X$ subtracts information about $\theta$ from $X$.
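A concrete instance of that additive picture (my own toy example):

$$
Y = \theta, \qquad Z \sim \mathcal{N}(0, 1) \text{ independent of } \theta, \qquad X = Y + Z,
$$

so $X - Y = Z$, whose distribution does not involve $\theta$ at all.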
But the idea that conditioning can subtract information is a little unsettling, at best. So, is there any mathematical justification for why conditioning on a random variable can subtract information from that random variable?
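To see what conditioning does in practice, here is a small simulation I put together (a sketch in Python with NumPy; the Bernoulli model, sample size, and parameter values are my own choices). It draws samples under two different values of $\theta$, conditions on $T(X) = \sum_i X_i$, and compares the resulting conditional distributions of $X$ given $T = t$:

```python
import numpy as np
from collections import Counter

def conditional_dist(theta, n=3, t=2, trials=200_000, seed=0):
    """Empirical distribution of X = (X_1, ..., X_n) given sum(X) == t,
    with X_i iid Bernoulli(theta)."""
    rng = np.random.default_rng(seed)
    X = rng.random((trials, n)) < theta   # Bernoulli(theta) draws
    kept = X[X.sum(axis=1) == t].astype(int)   # condition on T(X) = t
    counts = Counter(map(tuple, kept))
    total = sum(counts.values())
    return {seq: round(c / total, 3) for seq, c in sorted(counts.items())}

# The conditional distribution of X given T should look the same for
# both parameter values: uniform over the C(3, 2) = 3 arrangements.
for theta in (0.3, 0.7):
    print(f"theta = {theta}:", conditional_dist(theta))
```

For both values of $\theta$ the printed frequencies should come out near $1/3$ for each of the three sequences with two ones, matching the $\binom{n}{t}^{-1}$ computation above: after conditioning on $T$, nothing about $\theta$ remains in $X$.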