Suppose we have some parametric model $\{P_\theta \ \colon \theta \in \Theta\}$ and a sample $X$.
If, as is usual in classical statistics, we suppose that $P_\theta \ll \lambda$ for all $\theta \in \Theta$, where $\lambda$ is the Lebesgue measure, then we define the likelihood function $L$ as $$ L(\theta,X) = f_\theta(X) \quad \text{for all } \theta \in \Theta, $$ where $f_\theta$ is the density of $P_\theta$ with respect to $\lambda$.
However, if we instead assume that all measures in $\{P_\theta \ \colon \theta \in \Theta\}$ are mutually absolutely continuous, we define the likelihood function as $$ L(\theta, \theta_1,X) = \frac{\mathrm{d}P_\theta}{\mathrm{d}P_{\theta_1}}(X) \quad \text{for all } \theta \in \Theta, $$ where $\theta_1$ is a fixed, chosen element of $\Theta$.
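For intuition, suppose in addition that a dominating measure $\lambda$ exists (so both definitions are available) and that $f_{\theta_1} > 0$; then the Radon–Nikodym derivative reduces to a ratio of densities: $$ \frac{\mathrm{d}P_\theta}{\mathrm{d}P_{\theta_1}}(X) = \frac{f_\theta(X)}{f_{\theta_1}(X)}, $$ i.e. this likelihood is the classical one rescaled by the factor $1/f_{\theta_1}(X)$, which does not depend on $\theta$.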
From a statistical point of view, how do we know which $\theta_1$ to choose as the "reference" parameter? I would think that, in general, different choices of $\theta_1$ would lead to different maximum likelihood estimates of the true $\theta_0$.
Is this not a problem?
Different choices of $\theta_1$ cannot lead to different maximisers of the likelihood. So, from a statistical point of view, it does not matter which $\theta_1$ is chosen; typically it is selected so that the resulting likelihood expressions are convenient.
To show the invariance of maximisers with respect to $\theta_1$, we can use the following chain-rule type relation for Radon–Nikodym derivatives (assuming $P \ll R \ll Q$): $$ \frac{\mathrm{d}P}{\mathrm{d}Q} = \frac{\mathrm{d}P}{\mathrm{d}R}\cdot \frac{\mathrm{d}R}{\mathrm{d}Q}.$$ As an exercise, try to show this.
This means that if we define likelihoods with respect to $\theta_2$ instead of $\theta_1$, this is equivalent to scaling all likelihoods by a (data-dependent) positive constant, i.e. $$ L(\theta, \theta_2, X) = L(\theta, \theta_1, X) \cdot \frac{\mathrm{d}P_{\theta_1} }{\mathrm{d}P_{\theta_2}}(X). $$
Since this constant doesn't depend on $\theta$, any $\theta$ that maximises the LHS must also maximise the RHS.
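The invariance can be checked numerically. The following is a small sketch for a hypothetical $N(\theta, 1)$ location family (the reference values $0.5$ and $3.0$ and all variable names are illustrative): the log-likelihood curves built from two different reference parameters differ by a constant in $\theta$, so their maximisers over a grid coincide.

```python
import numpy as np

# Hypothetical N(theta, 1) family. The likelihood relative to a reference
# parameter theta_ref is dP_theta/dP_{theta_ref}(X), which here is the density
# ratio f_theta(X) / f_{theta_ref}(X); we work with its logarithm.
rng = np.random.default_rng(0)
X = rng.normal(2.0, 1.0, size=50)  # sample with "true" parameter theta_0 = 2

def norm_logpdf_sum(mu):
    # Log-density of the whole sample under N(mu, 1), summed over observations.
    return np.sum(-0.5 * (X - mu) ** 2 - 0.5 * np.log(2 * np.pi))

def log_lik(theta, theta_ref):
    # log of dP_theta/dP_{theta_ref} evaluated at the sample X.
    return norm_logpdf_sum(theta) - norm_logpdf_sum(theta_ref)

thetas = np.linspace(0.0, 4.0, 401)  # grid of candidate parameters
ll1 = np.array([log_lik(t, 0.5) for t in thetas])  # reference theta_1 = 0.5
ll2 = np.array([log_lik(t, 3.0) for t in thetas])  # reference theta_2 = 3.0

# The two curves differ by a constant that does not depend on theta ...
assert np.allclose(ll1 - ll2, (ll1 - ll2)[0])
# ... so they share the same maximiser on the grid.
assert thetas[ll1.argmax()] == thetas[ll2.argmax()]
```

On the log scale the data-dependent factor $\frac{\mathrm{d}P_{\theta_1}}{\mathrm{d}P_{\theta_2}}(X)$ becomes an additive constant, which is why the two curves are exact vertical shifts of one another.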