Let's take $N$ i.i.d. random variables $X_i$, where $X_i \sim \mathsf{Bin}(n,p)$.
Taking inspiration from here, we should have the following facts:
- $Var(X_i)=np(1-p)$.
- The sample mean $M(X_1,\dots,X_N)=\frac{\sum_i X_i}{N}$ is a sufficient and complete statistic.
- The maximum likelihood estimator of $np(1-p)$ is $T_{\mathrm{MLE}}=nM(1-M)$.
Combining these points, we have that $T_{\mathrm{MLE}}$ is also the UMVUE by the Lehmann–Scheffé theorem.
We also have the following fact:
- The (corrected) sample variance $S^2=\frac{1}{N-1}\sum_i{(X_i-M)^2}$ is an unbiased estimate of $Var(X_i)$.
From Lehmann–Scheffé and the uniqueness of the UMVUE, we should then have:
$$E[S^2\mid M]=nM(1-M)$$
My questions:
Is my reasoning correct, or am I applying some theorem in the wrong way?
If the reasoning is correct, what would be a direct derivation of the final result? Is the formula trivial for some reason I do not see?
Your reasoning is correct, except that the MLE is not the UMVUE of the population variance: $nM(1-M)$ is a function of the complete sufficient statistic, but it is biased.
A complete sufficient statistic for $p$ is $T=\sum\limits_{i=1}^N X_i$, which has a $\mathsf{Bin}(nN,p)$ distribution.
Now $E_p[T]=nNp$ and $\operatorname{Var}_p[T]=nNp(1-p)$ for all $p\in(0,1)$.
Then, $$E_p[T^2]=\operatorname{Var}_p[T]+(E_p[T])^2=nNp(1-p)+n^2N^2p^2$$
Or, $$E_p[T^2-T]=nNp^2(nN-1)$$
That is, $$E_p\left[\frac{T(T-1)}{N(nN-1)}\right]=np^2$$
So you have an unbiased estimator of the population variance based on $T$ alone (and hence, by Lehmann–Scheffé, the UMVUE):
$$E_p\left[\frac TN-\frac{T(T-1)}{N(nN-1)}\right]=np-np^2=np(1-p)\quad,\forall\,p\in(0,1)$$
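As a sanity check (not part of the argument), this unbiasedness can be verified exactly by summing over the distribution of $T\sim\mathsf{Bin}(nN,p)$ for small hypothetical values, say $n=4$, $N=3$, $p=1/3$; exact rational arithmetic avoids any floating-point doubt:

```python
from fractions import Fraction
from math import comb

n, N = 4, 3                  # hypothetical small sizes for an exhaustive check
p = Fraction(1, 3)           # exact rational p

# pmf of T ~ Bin(nN, p)
def pmf_T(t):
    return comb(n * N, t) * p**t * (1 - p)**(n * N - t)

# E_p[ T/N - T(T-1)/(N(nN-1)) ], summed exactly over the support of T
expectation = sum(
    pmf_T(t) * (Fraction(t, N) - Fraction(t * (t - 1), N * (n * N - 1)))
    for t in range(n * N + 1)
)
assert expectation == n * p * (1 - p)   # equals np(1-p)
print(expectation)                      # -> 8/9
```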
With $\overline X=\frac TN$, the sample variance $S^2=\frac1{N-1}\sum\limits_{i=1}^N (X_i-\overline X)^2$ is unbiased for the population variance. So by Lehmann–Scheffé, $E\left[S^2\mid T\right]$ is also the UMVUE of $np(1-p)$.
As the UMVUE is unique whenever it exists, you can say
$$E\left[S^2\mid T\right]=\frac TN-\frac{T(T-1)}{N(nN-1)}\tag{*}$$
This can be rewritten in terms of $\overline X$ of course.
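Identity $(*)$ itself can also be checked by brute force for small hypothetical sizes (below $n=4$, $N=3$): enumerate all samples, condition on each value of $T$, and compare with the right-hand side. Since the conditional law given $T$ is free of $p$, any $p$ works:

```python
from fractions import Fraction
from itertools import product
from math import comb

n, N = 4, 3          # hypothetical small sizes: N samples from Bin(n, p)
p = Fraction(1, 3)   # any p works; the conditional law given T is p-free

# joint pmf of one sample point (x_1, ..., x_N)
def pmf(x):
    prob = Fraction(1)
    for xi in x:
        prob *= comb(n, xi) * p**xi * (1 - p)**(n - xi)
    return prob

# accumulate E[S^2 * 1{T=t}] and P(T=t) over all outcomes
num, den = {}, {}
for x in product(range(n + 1), repeat=N):
    t = sum(x)
    m = Fraction(t, N)
    s2 = sum((xi - m) ** 2 for xi in x) / Fraction(N - 1)
    w = pmf(x)
    num[t] = num.get(t, Fraction(0)) + s2 * w
    den[t] = den.get(t, Fraction(0)) + w

# compare E[S^2 | T=t] with t/N - t(t-1)/(N(nN-1)) for every attainable t
for t in range(n * N + 1):
    lhs = num[t] / den[t]
    rhs = Fraction(t, N) - Fraction(t * (t - 1), N * (n * N - 1))
    assert lhs == rhs, (t, lhs, rhs)
print("identity (*) verified for all t")
```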
A direct way to obtain $(*)$ would be to proceed using linearity of expectation.
I think it should be something like
\begin{align} E\left[S^2\mid T=t\right]&=E\left[\frac{1}{N-1}\sum_{i=1}^N\left(X_i-\frac tN\right)^2\mid T=t\right] \\&=E\left[\frac{1}{N-1}\left(\sum_{i=1}^N X_i^2-\frac{t^2}{N}\right)\mid T=t\right] \\&=\frac{N}{N-1}E\left[X_1^2\mid T=t\right]-\frac{t^2}{N(N-1)}, \end{align}
where the last step uses that $E[X_i^2\mid T=t]$ is the same for every $i$ by exchangeability.
Now we only have to recall that $X_1$ conditioned on $T=t$ has a hypergeometric distribution: given the total, $X_1$ counts the successes falling in the first $n$ of the $nN$ Bernoulli trials, $t$ of which are successes overall.
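Concretely, $X_1\mid T=t$ is hypergeometric with population size $nN$, $t$ successes and $n$ draws, so by the standard hypergeometric mean and variance formulas:
\begin{align} E\left[X_1\mid T=t\right]&=\frac tN, \\ \operatorname{Var}\left[X_1\mid T=t\right]&=n\cdot\frac{t}{nN}\cdot\frac{nN-t}{nN}\cdot\frac{nN-n}{nN-1}=\frac{t(nN-t)(N-1)}{N^2(nN-1)}, \end{align}
hence $E\left[X_1^2\mid T=t\right]=\frac{t^2}{N^2}+\frac{t(nN-t)(N-1)}{N^2(nN-1)}$. Substituting,
\begin{align} E\left[S^2\mid T=t\right]&=\frac{N}{N-1}\left(\frac{t^2}{N^2}+\frac{t(nN-t)(N-1)}{N^2(nN-1)}\right)-\frac{t^2}{N(N-1)} \\&=\frac{t(nN-t)}{N(nN-1)}=\frac tN-\frac{t(t-1)}{N(nN-1)}, \end{align}
which is exactly $(*)$.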