I'm currently in a probability class learning about parameter estimation using the maximum likelihood estimator. The problem is as follows: we have a list of independent observations $y_1,\dots,y_n$ that came from some probability distribution $f_Y(y,\lambda)$ with an unknown parameter $\lambda$ (for example, exponential, Gaussian, Poisson, etc.).
We want to estimate the parameter $\lambda$ by maximizing the likelihood of the observations we see. Since all observations are independent, we have probability $P(Y,\lambda)= \prod_{i=1}^{n} f_Y(y_i,\lambda)$. To maximize this, we take the derivative with respect to $\lambda$ and set it to 0. $$ \hat\lambda= \arg \max_{\lambda} \left[ P(Y,\lambda) \right]$$
Something I noticed: in every example of this I've seen so far (only about 2 or 3 now), the end result is the same: the estimate is whatever value of the parameter makes the distribution's expected value $E[Y]$ equal the mean of your observation vector. For example, for an exponential distribution, we get $$\hat\lambda=\frac{1}{\frac{1}{n}\sum_{i}y_i} = \frac{1}{\bar{y}} $$ This makes intuitive sense, because for an exponential distribution, the expected value is $1/\lambda$. My question is this: can you always assume that the mean of your observations is the mean of your probability distribution and just solve for the unknown parameters using that assumption? It works for the few cases I've seen, but I don't know whether it generalizes to any probability distribution. I'm completely new to these topics, so any additional info would be appreciated.
Thanks in advance!
You're noticing that in some cases the MLE is equal to the result of setting the expected value of the observations (e.g. $EY=\frac{1}{\lambda}$) equal to the observed sample mean ($\bar{y}=\frac{1}{n}\sum_{i=1}^ny_i$) and solving for the parameter (e.g. $\hat{\lambda}=\frac{1}{\bar{y}}$). This latter method is called the method of moments (MOM) and does not generally give the same result as MLE. (However, there is a kind of connection between MLE and a generalized MOM.)
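As a quick numerical illustration of the exponential case (a sketch in Python; the true rate, sample size, and grid resolution are assumptions chosen just for the demo), the closed-form MLE $\hat{\lambda}=1/\bar{y}$ agrees with a brute-force maximization of the log-likelihood $\ell(\lambda)=n\log\lambda-\lambda\sum_i y_i$:

```python
import math
import random

random.seed(0)
true_rate = 2.0  # assumed "true" lambda for the simulation
ys = [random.expovariate(true_rate) for _ in range(10_000)]

n, s = len(ys), sum(ys)

# Closed-form MLE for the exponential: lambda-hat = n / sum(y_i) = 1 / sample mean
mle_closed = n / s

# Brute-force check: maximize l(lambda) = n*log(lambda) - lambda*sum(y_i) over a grid
def log_lik(lam):
    return n * math.log(lam) - lam * s

grid = [0.001 * k for k in range(1, 5000)]  # candidate lambdas in (0, 5)
mle_grid = max(grid, key=log_lik)

print(mle_closed, mle_grid)  # both should be close to each other and to true_rate
```

Both estimates land near the true rate, and near each other up to the grid spacing, which is just the exponential instance of the MLE-equals-MOM coincidence described above.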
As an example of how the two may differ, consider $X_1,X_2,X_3$ i.i.d. Uniform$(0,\theta)$. Then $\hat{\theta}_\text{MLE}=\max\{X_1,X_2,X_3\}$, whereas $\hat{\theta}_\text{MOM}=2\bar{X}$.
NB: The MOM estimator may sometimes be nonsensical; e.g., in the above example, if the observations are $(X_1,X_2,X_3)=(1,1,10)$, then $\hat{\theta}_\text{MOM}=2\bar{X}=2\cdot\frac{1+1+10}{3}=8$, even though a value of $10$ was observed!
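The mismatch is easy to check in code. This minimal Python sketch just reproduces the three observations from the NB above:

```python
# Observations from the uniform example above
xs = [1, 1, 10]

# MLE for Uniform(0, theta): the sample maximum
theta_mle = max(xs)

# MOM: solve E[X] = theta/2 = sample mean, giving theta-hat = 2 * mean
theta_mom = 2 * sum(xs) / len(xs)

print(theta_mle, theta_mom)  # prints "10 8.0"
```

The MOM estimate of $8$ falls below the observed value $10$, i.e. it assigns zero probability to a data point that actually occurred, while the MLE cannot do this by construction.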