My statistics text is trying to define "estimated standard error" by an example, but I'd like a real definition so I can mentally keep it distinguished from just "standard error." It reads:
Let $X_1,\ldots,X_n \sim \operatorname{Bernoulli}(p)$ and let $\hat{p}_n = n^{-1}\sum_i X_i$. Then $\mathbb{E}(\hat{p}_n) = n^{-1}\sum_i \mathbb{E}(X_i) = p$, so $\hat{p}_n$ is unbiased. The standard error is $se = \sqrt{\mathbb{V}(\hat{p}_n)} = \sqrt{p(1-p)/n}$. The estimated standard error is $\hat{se} = \sqrt{\hat{p}(1-\hat{p})/n}$.
Based on knowing $\mathbb{E}(\hat{p}_n)$, I can do the algebra to derive $\sqrt{\mathbb{V}(\hat{p}_n)} = \sqrt{p(1-p)/n}$, and I understand $se = \sqrt{\mathbb{V}(\hat{p}_n)}$ as simply the definition of the standard error of a point estimator.
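(For reference, that algebra uses the independence of the $X_i$:

$$\mathbb{V}(\hat{p}_n) = \mathbb{V}\!\left(n^{-1}\sum_i X_i\right) = n^{-2}\sum_i \mathbb{V}(X_i) = n^{-2} \cdot n\,p(1-p) = \frac{p(1-p)}{n}.)$$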
I am unclear, however, how to make the jump to the formula for $\hat{se}$ other than "swap in the best estimate for the quantity you need but don't have", i.e. $\hat{p}_n$ for $p$, but that doesn't sound like a formal definition. Is that all that's going on here? It doesn't seem to be done rigorously.
$\hat{p}_n$ is not only unbiased but also a consistent estimator, i.e. $P(|\hat{p}_n - p| > \epsilon) \to 0$ as $n \to \infty$ for every $\epsilon > 0$.
There is a property of consistent estimators (the continuous mapping theorem) which states that if $\theta = g(p)$ for a continuous function $g$, then $\hat{\theta} = g(\hat{p}_n)$ is a consistent estimator of $\theta$.
In this case $se = \sqrt{p(1-p)/n} = g(p)$, so $\hat{se} = g(\hat{p}_n) = \sqrt{\hat{p}_n(1-\hat{p}_n)/n}$ is a consistent estimator of $se$.
You have to take into account that this argument only establishes consistency; it does not mean the plug-in estimator is unbiased or has any other good property.
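As a sanity check, here is a minimal simulation sketch (Python with NumPy; the true value $p = 0.3$ and the sample sizes are arbitrary choices) showing the plug-in $\hat{se}$ tracking the true $se$ as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3  # true success probability (unknown in practice; chosen here for illustration)

for n in [100, 1000, 10000]:
    x = rng.binomial(1, p, size=n)             # X_1, ..., X_n ~ Bernoulli(p)
    p_hat = x.mean()                           # point estimate of p
    se = np.sqrt(p * (1 - p) / n)              # true standard error (requires knowing p)
    se_hat = np.sqrt(p_hat * (1 - p_hat) / n)  # estimated (plug-in) standard error
    print(f"n={n:>6}  p_hat={p_hat:.4f}  se={se:.5f}  se_hat={se_hat:.5f}")
```

Each row compares $se$, which needs the unknown $p$, with $\hat{se}$, which is computable from the data alone; by consistency the two agree more and more closely as $n$ increases.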