I was reading a book on mathematical statistics and came across the following statement.
Suppose that the statistic $T(X)$ is sufficient for the parameter $\theta$. If $\hat\theta(X)$ is a maximum likelihood estimator of $\theta$ and $\hat\theta$ is unique, then it depends on $X$ only through $T(X)$.
I assume that an argument can be given in the following way:
Let $X$ denote a random variable and $x$ its value.
$P_{\theta}(X=x)=P_{\theta}(X=x,\,T(X)=T(x))=P(X=x\mid T(X)=T(x))\,P_{\theta}(T(X)=T(x))$

The second equality follows directly from sufficiency: the conditional probability $P(X=x\mid T(X)=T(x))$ does not depend on $\theta$. Hence maximizing $P_{\theta}(X=x)$ with respect to $\theta$ is equivalent to maximizing $P_{\theta}(T(X)=T(x))$, which involves $x$ only through $T(x)$.
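As a minimal numerical check of this factorization (my own hypothetical illustration, not from the book), take a Bernoulli$(\theta)$ sample: the likelihood is $\theta^{T}(1-\theta)^{n-T}$ with $T(x)=\sum_i x_i$, so two samples with the same $T(x)$ yield the same likelihood curve and the same (unique) MLE:

```python
import numpy as np

def log_likelihood(theta, x):
    # Bernoulli log-likelihood; depends on x only through t = sum(x) and n.
    t, n = x.sum(), len(x)
    return t * np.log(theta) + (n - t) * np.log(1 - theta)

x1 = np.array([1, 0, 1, 1, 0])  # T(x1) = 3
x2 = np.array([0, 1, 1, 0, 1])  # different sample, same T(x2) = 3

# Maximize over a grid of theta values (crude but sufficient here).
thetas = np.linspace(0.01, 0.99, 99)
mle1 = thetas[np.argmax([log_likelihood(t, x1) for t in thetas])]
mle2 = thetas[np.argmax([log_likelihood(t, x2) for t in thetas])]

# Same T(x) forces the same unique maximizer (theta = 3/5 = 0.6).
assert mle1 == mle2
```

Here the maximizer is unique, so the MLE is automatically a function of $T(x)$; the subtlety the question is after only arises when the maximizer is not unique.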
However, I know something must be wrong with this argument, since it does not use the uniqueness assumption anywhere. Can anybody show me where the problem is?
The problem in the proof is the conclusion drawn from the last sentence:
It does not imply that every $\theta$ maximising $P_{\theta}(X=x)$ depends on $x$ only through $T(x)$. It implies only that the *set* of all values of $\theta$ that maximise $P_{\theta}(X=x)$ depends on $x$ only through $T(x)$. When that set contains more than one point, a particular value $\hat\theta$ can be chosen from it in a way that depends on the sample itself.
For example, let $X_i$ be Uniform$(\theta,\theta+1)$. Here $T(X)=(X_{(1)},X_{(n)})$ is sufficient, and any $\theta$ in the interval $X_{(n)}-1\leq \theta \leq X_{(1)}$ is an MLE. For $n\geq 2$ we can take $$ \hat\theta = \frac{X_1^2}{X_1^2+X_2^2} (X_{(n)}-1) + \frac{X_2^2}{X_1^2+X_2^2} X_{(1)} $$ or $$ \hat\theta = \cos(X_1) (X_{(n)}-1) + (1-\cos(X_1)) X_{(1)}. $$ Either choice depends on $X$ not only through $T(X)$.
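This can be checked numerically. A minimal sketch (my own illustration, using the first estimator above with $n=2$): two samples that share the same order statistics $(X_{(1)},X_{(n)})$, and hence the same likelihood, nevertheless produce different estimates.

```python
import numpy as np

def mle_candidate(x):
    # The first estimator above: a convex combination of the endpoints
    # of the MLE interval [X_(n) - 1, X_(1)], with weights built from
    # X_1 and X_2 individually, not just from the order statistics.
    lo, hi = x.max() - 1, x.min()
    w = x[0] ** 2 / (x[0] ** 2 + x[1] ** 2)
    return w * lo + (1 - w) * hi

x = np.array([2.1, 2.9])
y = np.array([2.9, 2.1])   # same T(X) = (X_(1), X_(n)), order swapped

tx, ty = mle_candidate(x), mle_candidate(y)
lo, hi = x.max() - 1, x.min()

# Both values maximize the likelihood (they lie in [X_(n)-1, X_(1)]) ...
assert lo <= tx <= hi and lo <= ty <= hi
# ... yet they differ, so the estimator is not a function of T(X) alone.
assert tx != ty
```

This is exactly the gap in the question's argument: sufficiency pins down the set of maximizers, but not which point of the set a particular (non-unique) MLE selects.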
This is an almost exact citation from Moore, D. S. (1971). Maximum Likelihood and Sufficient Statistics. The American Mathematical Monthly, 78(1), 50. doi:10.2307/2317488.
The paper is three pages long and entirely devoted to your question.