This is from Casella and Berger's Statistical Inference:
Definition: A statistic $T(\mathbf{X})$ is a sufficient statistic for $\theta$ if the conditional distribution of the sample $\mathbf{X}$ given the value of $T(\mathbf{X})$ does not depend on $\theta$.
In the discrete case,
Let $t$ be a possible value of $T(\mathbf{X})$ , that is, a value such that $P_\theta(T(\mathbf{X}) = t) > 0$. We wish to consider the conditional probability $P_\theta(\mathbf{X} = \mathbf{x}|T(\mathbf{X}) = t)$. If $\mathbf{x}$ is a sample point such that $T(\mathbf{x}) \neq t$, then clearly, $P_\theta(\mathbf{X} = \mathbf{x}|T(\mathbf{X}) = t) = 0$. Thus, we are interested in $P(\mathbf{X} = \mathbf{x}|T(\mathbf{X}) = T(\mathbf{x}))$. By the definition, if $T(\mathbf{X})$ is a sufficient statistic, this conditional probability is the same for all values of $\theta$ so we have omitted the subscript.
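To make the definition concrete for myself (this example is mine, not from the book): for $n$ i.i.d. Bernoulli($p$) trials with $T(\mathbf{x}) = \sum_i x_i$, the conditional probability works out to $1/\binom{n}{t}$, which is free of $p$ — exactly what sufficiency requires:

```python
from math import comb

def conditional_prob(x, p):
    """P_p(X = x | T(X) = T(x)) for n i.i.d. Bernoulli(p) trials,
    with T(x) = sum(x)."""
    n, t = len(x), sum(x)
    px = p**t * (1 - p)**(n - t)               # P_p(X = x)
    pt = comb(n, t) * p**t * (1 - p)**(n - t)  # P_p(T(X) = t)
    return px / pt                             # = 1 / C(n, t), free of p

x = (1, 0, 1, 0)
# The same value 1/C(4, 2) = 1/6 comes out for every choice of p:
print([round(conditional_prob(x, p), 6) for p in (0.2, 0.5, 0.9)])
```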
This is the part I'm having trouble with:
A sufficient statistic captures all the information about $\theta$ in this sense. Consider Experimenter 1, who observes $\mathbf{X} = \mathbf{x}$ and, of course, can compute $T(\mathbf{X}) = T(\mathbf{x})$. To make an inference about $\theta$, he can use the information that $\mathbf{X} = \mathbf{x}$ and $T(\mathbf{X}) = T(\mathbf{x})$. Now consider Experimenter 2, who is not told the value of $\mathbf{X}$ but only that $T(\mathbf{X}) = T(\mathbf{x})$. Experimenter 2 knows $P(\mathbf{X} = \mathbf{y}|T(\mathbf{X}) = T(\mathbf{x}))$, a probability distribution on $A_{T(\mathbf{x})} = \{\mathbf{y}: T(\mathbf{y}) = T(\mathbf{x})\}$, because this can be computed from the model without knowledge of the true value of $\theta$.
So far, so good. But below, what exactly is this random variable $\mathbf{Y}$? I'm having trouble unraveling why exactly this conclusion means that Experimenter 2 has the same information that Experimenter 1 has regarding the parameter $\theta$. I apologize for not framing my question better -- I'm just quite confused by the point the author is trying to make in the paragraph below. I will update with an edit if I can clarify my question further.
Thus, Experimenter 2 can use this distribution and a randomization device, such as a random number table, to generate an observation $\mathbf{Y}$ satisfying $P(\mathbf{Y} = \mathbf{y}|T(\mathbf{X}) = T(\mathbf{x})) = P(\mathbf{X} = \mathbf{y}|T(\mathbf{X}) = T(\mathbf{x}))$. It turns out that, for each value of $\theta$, $\mathbf{X}$ and $\mathbf{Y}$ have the same unconditional probability distribution, as we shall see below. So Experimenter 1, who knows $\mathbf{X}$, and Experimenter 2, who knows $\mathbf{Y}$, have equivalent information about $\theta$. But surely the use of the random number table to generate $\mathbf{Y}$ has not added to Experimenter 2's knowledge of $\theta$. All his knowledge about $\theta$ is contained in the knowledge that $T(\mathbf{X}) = T(\mathbf{x})$. So Experimenter 2, who knows only $T(\mathbf{X}) = T(\mathbf{x})$, has as much information about $\theta$ as does Experimenter 1, who knows the entire sample $\mathbf{X} = \mathbf{x}$.
To complete the above argument, we need to show that $\mathbf{X}$ and $\mathbf{Y}$ have the same unconditional distribution, that is, $P_\theta(\mathbf{X} = \mathbf{x}) = P_\theta(\mathbf{Y} = \mathbf{x})$ for all $\mathbf{x}$ and $\theta$. Note that the events $\{\mathbf{X} = \mathbf{x}\}$ and $\{\mathbf{Y} = \mathbf{x}\}$ are both subsets of the event $\{T(\mathbf{X}) = T(\mathbf{x})\}$.
Also recall that $$ P(\mathbf{X} = \mathbf{x}|T(\mathbf{X}) = T(\mathbf{x})) = P(\mathbf{Y} = \mathbf{x}|T(\mathbf{X}) = T(\mathbf{x})) $$ and these conditional probabilities do not depend on $\theta$. Thus, we have $$ P_\theta(\mathbf{X} = \mathbf{x}) = P_\theta(\mathbf{X} = \mathbf{x} \text{ and } T(\mathbf{X}) = T(\mathbf{x})) \\ = P(\mathbf{X} = \mathbf{x}|T(\mathbf{X}) = T(\mathbf{x}))P_\theta(T(\mathbf{X}) = T(\mathbf{x})) \\ = P(\mathbf{Y} = \mathbf{x}|T(\mathbf{X}) = T(\mathbf{x}))P_\theta(T(\mathbf{X}) = T(\mathbf{x})) \\ = P_\theta(\mathbf{Y} = \mathbf{x} \text{ and } T(\mathbf{X}) = T(\mathbf{x}))\\ = P_\theta(\mathbf{Y} = \mathbf{x})$$
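Here is a small simulation I wrote to convince myself of this conclusion (my own sketch, not from the book), again using $n$ Bernoulli($p$) trials with $T(\mathbf{x}) = \sum_i x_i$. Experimenter 2 sees only $t$ and draws $\mathbf{Y}$ uniformly from $A_t = \{\mathbf{y} : \sum_i y_i = t\}$, which is exactly $P(\mathbf{X} = \mathbf{y}|T(\mathbf{X}) = t)$ and requires no knowledge of $p$. The empirical distributions of $\mathbf{X}$ and $\mathbf{Y}$ then agree up to Monte Carlo error:

```python
import random
from collections import Counter
from itertools import product

random.seed(0)
n, p, trials = 4, 0.3, 200_000

def draw_x():
    """Experimenter 1's data: n i.i.d. Bernoulli(p) trials."""
    return tuple(1 if random.random() < p else 0 for _ in range(n))

def draw_y(t):
    """Experimenter 2's randomization: a uniform draw from
    A_t = {y : sum(y) = t}, computed without knowing p."""
    a_t = [y for y in product((0, 1), repeat=n) if sum(y) == t]
    return random.choice(a_t)

cx, cy = Counter(), Counter()
for _ in range(trials):
    x = draw_x()
    cx[x] += 1
    cy[draw_y(sum(x))] += 1   # Experimenter 2 is told only t = sum(x)

# Largest gap between the empirical frequencies of X and Y over all
# 2^n sample points; it should be small (pure Monte Carlo noise).
worst = max(abs(cx[z] - cy[z]) / trials for z in product((0, 1), repeat=n))
print(f"largest frequency gap: {worst:.4f}")
```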
Let me try to rewrite the paragraph as I understand it:
Experimenter 1 observes a random variable $X$ on a measurable space $(\Omega, \mathcal A)$ with values in any given space (possibly high or even infinite dimensional). The statistical experiment or model is given by the family of possible probability distributions $(P_\theta)_{\theta\in\Theta}$, with a suitable parameter space $\Theta$.
Experimenter 1 gains knowledge about the unknown parameter from a certain event $\{X=x\}$. Their inference is completely based on this event and is described by the probabilities $P_\theta(X=x)$.
Now there is also experimenter 2, who does not know that the event $\{X=x\}$ has occurred. They only know that $\{T(X)=T(x)\}$. Intuitively, experimenter 2 now has less information about $\theta$, since $\{X=x\}\subseteq\{T(X)=T(x)\}$.
But since $T$ is sufficient for $\theta$, we can show that experimenter 2 in fact has the same information: the conditional probabilities $P(X=x\mid T(X)=T(x))$ do not depend on $\theta$ and are therefore accessible to both experimenters. These probabilities define a distribution from which experimenter 2 can sample; the resulting draw is defined to be the random variable $Y$.
Therefore, $$ P(X=x\mid T(X)=T(x))= P(Y=x\mid T(X)=T(x)) $$
It is now shown that $X$ and $Y$ have the same (unconditional!) distribution: $P_\theta(X=x)=P_\theta(Y=x)$. This is the same as saying that experimenter 2 can draw the same conclusions about $\theta$ as experimenter 1, since they can sample from a variable whose distribution is the same as that of $X$, even with the sole knowledge of $\{T(X)=T(x)\}$.