I am given a random sample $X_1, \ldots, X_n$ (of size n) from the exponential distribution whose pdf is $f_\theta(x) = \frac{1}{\theta} e^{-\frac{x}{\theta}}$ for $0 < x < \infty.$
I am asked to give the "usual estimator of $\theta.$"
The answer is: "Since $X \sim \mathsf{Exp}(\frac{1}{\theta})$, we know that $E(X) = \theta$ and we can estimate $\theta$ with the empirical sample mean."
Why isn't the answer to this question $\frac{1}{\theta}$ since the exponential distribution has an expectation of $\frac{1}{\theta}$?
Other comments or observations.
Assume the following. You have independent and identically distributed random variables, say $X_1, \ldots, X_n.$ Suppose that these random variables follow some distribution $F$ on $\mathbf{R},$ so that $\mathbf{P}(X_1 \in \mathrm{A}) = \int\limits_\mathrm{A} F(dx).$ Suppose $F$ has a finite first moment, that is, $\int\limits_{\mathbf{R}} |x| F(dx) < \infty,$ and call $\mu = \int\limits_\mathbf{R} x F(dx)$ the first moment of $F.$ The "method of moments" estimator of $\mu$ is, by definition, $\bar X = \bar X(n) := \frac{1}{n} \sum\limits_{i = 1}^n X_i.$ Linearity of expectation shows at once that $\mathbf{E}(\bar X) = \mu,$ and the law of large numbers shows that $\lim\limits_{n \to\infty} \bar X = \mu$ almost surely. These two properties are desirable; the first is called "unbiasedness" and the second "consistency." Therefore, for any distribution whatsoever, the sample mean defines a sequence of estimators that is unbiased and consistent.
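A quick simulation makes consistency visible for the exponential case at hand. This is only an illustrative sketch: the true scale `theta = 2.0` is an assumption chosen for the demo, and note that Python's `random.expovariate` takes the *rate* $\lambda = 1/\theta,$ not the mean.

```python
import random
import statistics

random.seed(0)
theta = 2.0  # assumed true scale; E[X] = theta under the pdf (1/theta) e^{-x/theta}

def sample_mean(n):
    """Sample mean of n i.i.d. draws from Exp with rate 1/theta (hence mean theta)."""
    return statistics.fmean(random.expovariate(1 / theta) for _ in range(n))

# As n grows, the sample mean settles near theta, not near 1/theta.
for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))
```

For $n = 100{,}000$ the standard deviation of $\bar X$ is $\theta/\sqrt{n} \approx 0.006,$ so the last printed value sits very close to $2.0$ and nowhere near $1/\theta = 0.5$.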
Now, it is often the case that statisticians assume much more than in the previous paragraph: explicitly, they assume that $F(dx) = f_\theta(x)\, dx,$ where $\theta \in \Theta \subset \mathbf{R}^q$ and $(x, \theta) \mapsto f_\theta(x)$ is smooth in some sense. Under these circumstances, they construct the "likelihood function" $L(X_1, \ldots, X_n; \theta) = \prod\limits_{i = 1}^n f_\theta(X_i)$ and maximise this function in $\theta$ for the observed $X_1, \ldots, X_n.$ This procedure yields $\hat \theta = \hat \theta(X_1, \ldots, X_n) \in \Theta,$ known as the "maximum likelihood estimate" of the parameter $\theta.$ Under general regularity conditions on $(x, \theta) \mapsto f_\theta(x),$ the maximum likelihood estimator has excellent asymptotic properties (consistency and asymptotic efficiency), and it is often the case (but not always!) that when $\theta = \int\limits_\mathbf{R} x f_\theta(x)\, dx$ ($\theta$ is the population mean when the "true model" is $f_\theta$), then $\hat \theta = \bar X.$ An important point regarding estimators of a "parameter" is that an estimator must not depend on the parameter, but solely on the sample (it is a function of $(X_1, \ldots, X_n),$ not of $(X_1, \ldots, X_n, \theta)$).
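For the exponential family in the question this coincidence can be checked numerically. A minimal sketch, assuming a true scale of $2.0$ and a crude grid search (both choices are mine, purely for illustration): the log-likelihood is $\ell(\theta) = -n\log\theta - \frac{1}{\theta}\sum_i X_i,$ and its maximiser should agree with $\bar X$ up to the grid spacing.

```python
import math
import random

random.seed(1)
xs = [random.expovariate(1 / 2.0) for _ in range(500)]  # assumed true theta = 2.0

def log_likelihood(theta, xs):
    # l(theta) = -n log(theta) - sum(x_i) / theta for f_theta(x) = (1/theta) e^{-x/theta}
    return -len(xs) * math.log(theta) - sum(xs) / theta

# Crude grid search over theta in [0.5, 5.0) with spacing 0.01.
grid = [0.01 * k for k in range(50, 500)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, xs))

x_bar = sum(xs) / len(xs)
print(theta_hat, x_bar)  # the two should agree up to the 0.01 grid spacing
```

Here the agreement is no accident: setting $\ell'(\theta) = -n/\theta + \sum_i X_i/\theta^2 = 0$ gives $\hat\theta = \bar X$ exactly, which is why the "usual estimator" of the scale $\theta$ is the sample mean.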