Some low-hanging fruit here for you, dear reader:
Can someone please help shed light on the following snippet, boxed in red?
I don't see how the Gaussian distribution is monotonic in $\mu$ or $\sigma^2$. But it seems like a pivotal idea in the derivation of the Maximum Likelihood Estimator. Thanks in advance!
^snippet from Optimal Estimation of Dynamic Systems, 2nd Ed. Crassidis & Junkins. $\S 2.5$

This excerpt is, in my opinion, poorly written, confusingly presented, and lacking in rigor.
First, we do not need any specific distributional assumptions to motivate the maximization of the log-likelihood. Rather, we can appeal to more general regularity requirements. To keep the discussion simple, if $f : S \to \mathbb R^+$ is a differentiable function with support $S \subseteq \mathbb R$, then the critical points of $f$ satisfying $f'(x) = 0$ are also critical points of $g = \log f$, since $$g'(x) = \frac{f'(x)}{f(x)},$$ and whenever $f > 0$ (which is the case on the support), the RHS is well-defined and vanishes exactly when $f'(x) = 0$. Moreover, since $\log$ is strictly increasing, $f$ and $g = \log f$ attain their maxima at the same points.
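As a quick numerical sanity check of this invariance (data and grid below are made up for illustration), one can maximize a Gaussian likelihood in $\mu$ and its logarithm over the same grid and confirm they pick out the same point, namely the sample mean:

```python
import math

# Illustrative data; the sample mean (the analytic MLE for mu) is 1.3.
data = [1.2, 0.7, 1.9, 1.4]

def likelihood(mu, sigma2=1.0):
    # Product of N(y_i; mu, sigma2) densities.
    out = 1.0
    for y in data:
        out *= math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
    return out

def log_likelihood(mu, sigma2=1.0):
    # Sum of log-densities: same maximizer, numerically better behaved.
    return sum(-(y - mu) ** 2 / (2 * sigma2)
               - 0.5 * math.log(2 * math.pi * sigma2) for y in data)

grid = [i / 1000 for i in range(-2000, 4001)]   # mu in [-2, 4], step 0.001
mu_hat_f = max(grid, key=likelihood)
mu_hat_g = max(grid, key=log_likelihood)

assert mu_hat_f == mu_hat_g                        # same maximizer
assert abs(mu_hat_f - sum(data) / len(data)) < 1e-9  # equals the sample mean
```

No distributional facts about the Gaussian are used beyond positivity of the density; the agreement is purely a consequence of $\log$ being strictly increasing.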
The text in the red box really makes no sense; it reads as if someone is regurgitating something they heard, with no justification. The only way I can think of to interpret the statement that a Gaussian density is "monotone" in the variance, for instance, is that the likelihood $$\mathcal L(\sigma^2 \mid \tilde{\bf y}, \mu) = \frac{1}{(2\pi \sigma^2)^{n/2}} \exp\!\left(-\frac{\sum_{i=1}^n (y_i - \mu)^2}{2\sigma^2}\right)$$ is nondecreasing in $\sigma^2$; i.e., $\mathcal L(\sigma_1^2) \le \mathcal L(\sigma_2^2)$ whenever $0 < \sigma_1^2 \le \sigma_2^2$. And this is simply false: treat $\tilde{\bf y}$ and $\mu$ as constants, let $v = \sigma^2$, and write $C = \tfrac12 \sum_{i=1}^n (y_i - \mu)^2$, which is constant with respect to $v$. Then, up to the constant factor $(2\pi)^{-n/2}$, the likelihood is $$h(v) = v^{-n/2} e^{-C/v},$$ with derivative $$h'(v) = \frac{2C - nv}{2v^{n/2+2}}\, e^{-C/v}.$$ This has an obvious critical point at $v = 2C/n$, with $h'(v) > 0$ for $0 < v < 2C/n$ and $h'(v) < 0$ for $v > 2C/n$; the likelihood increases and then decreases, so it is not monotone. (Indeed, $v = 2C/n = \frac{1}{n}\sum_{i=1}^n (y_i - \mu)^2$ is precisely the maximum likelihood estimate of the variance.) Even if the claim in the red box were true, it is not at all related to the aforementioned justification for taking the logarithm of the likelihood when locating local maxima.
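The increase-then-decrease behavior of $h(v) = v^{-n/2} e^{-C/v}$ is easy to verify numerically; the values of $n$ and $C$ below are arbitrary illustrative choices:

```python
import math

# Illustrative constants: n observations, C = (1/2) * sum of squared residuals.
n, C = 4, 3.0

def h(v):
    # Profile of the Gaussian likelihood in v = sigma^2, up to (2*pi)**(-n/2).
    return v ** (-n / 2) * math.exp(-C / v)

v_star = 2 * C / n            # analytic critical point, here 1.5

# Secant slopes just below and just above the critical point.
left  = (h(v_star) - h(v_star - 1e-3)) / 1e-3
right = (h(v_star + 1e-3) - h(v_star)) / 1e-3

assert left > 0 and right < 0             # increasing, then decreasing
assert h(v_star) > h(0.1)                 # beats a small variance
assert h(v_star) > h(100.0)               # and a large one: interior maximum
```

So the likelihood in the variance rises to an interior maximum and falls off on either side, which is exactly what "monotone in $\sigma^2$" would forbid.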
Finally, I would also like to point out that the notation in the excerpt is very sloppy. The likelihood function is generally written with the unknown parameters first, then the vertical line, then the data and any known parameters, as I illustrated above. This is because the symbol $\mid$ denotes conditioning: the quantities preceding it are conditioned upon the quantities following it. Writing $p(\tilde{\bf y} \mid {\bf x})$ means that ${\bf x}$, the parameters, are known or given, hence $p$ is a (probability) function of the data $\tilde{\bf y}$ given those parameters. The same mistake is repeated in Equation $(2.135)$. Moreover, that equation is incorrect if the data have a nontrivial correlation structure; that is to say, if the $y_i$ are not independent, the joint density does not factor into a product of marginals.
I believe the above provides sufficient rationale for my claim that the excerpt is poorly written.