Suppose we have some parametric model $\{P_\theta \ \colon \theta \in \Theta\}$ and a sample $X$.
If, as is usual in classical statistics, we suppose that $P_\theta \ll \lambda$ for all $\theta \in \Theta$, where $\lambda$ is the Lebesgue measure, then we define the likelihood function $L$ as $$ L(\theta,X) = f_\theta(X) \quad \text{for all } \theta \in \Theta, $$ where $f_\theta$ is the density of $P_\theta$ with respect to $\lambda$.
However, if we instead assume that all measures in $\{P_\theta \ \colon \theta \in \Theta\}$ are mutually absolutely continuous, we define the likelihood function as $$ L(\theta, \theta_1,X) = \frac{\mathrm{d}P_\theta}{\mathrm{d}P_{\theta_1}}(X) \quad \text{for all } \theta \in \Theta, $$ where $\theta_1$ is a fixed, chosen element of $\Theta$.
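For intuition, suppose in addition that a dominating measure $\lambda$ exists (so both definitions are available) and that $f_{\theta_1} > 0$; then the Radon–Nikodym derivative reduces to a ratio of densities: $$ \frac{\mathrm{d}P_\theta}{\mathrm{d}P_{\theta_1}}(X) = \frac{f_\theta(X)}{f_{\theta_1}(X)}, $$ i.e. this likelihood is the classical one rescaled by the factor $1/f_{\theta_1}(X)$, which does not depend on $\theta$.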
From a statistical point of view, how do we know which $\theta_1$ to choose as the "reference" parameter? I would think that, in general, different choices of $\theta_1$ would lead to different maximum likelihood estimates of the true $\theta_0$.
Is this not a problem?
Different choices of $\theta_1$ cannot lead to different maximisers of the likelihood. So, from a statistical point of view, it does not matter which $\theta_1$ is chosen; typically it is selected so that the resulting likelihood expressions are convenient.
To show the invariance of maximisers with respect to $\theta_1$, we can use the following chain-rule type relation for Radon–Nikodym derivatives (assuming $P \ll R \ll Q$): $$ \frac{\mathrm{d}P}{\mathrm{d}Q} = \frac{\mathrm{d}P}{\mathrm{d}R}\cdot \frac{\mathrm{d}R}{\mathrm{d}Q}.$$ As an exercise, try to show this.
This means that if we define likelihoods with respect to $\theta_2$ instead of $\theta_1$, this is equivalent to scaling all likelihoods by a (data-dependent) positive constant, i.e. $$ L(\theta, \theta_2, X) = L(\theta, \theta_1, X) \cdot \frac{\mathrm{d}P_{\theta_1} }{\mathrm{d}P_{\theta_2}}(X). $$
Since this constant doesn't depend on $\theta$, any $\theta$ that maximises the LHS must also maximise the RHS.
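The invariance can be checked numerically. The following is a small sketch for a hypothetical $N(\theta, 1)$ location family (the reference values $0.5$ and $3.0$ and all variable names are illustrative): the log-likelihood curves built from two different reference parameters differ by a constant in $\theta$, so their maximisers over a grid coincide.

```python
import numpy as np

# Hypothetical N(theta, 1) family. The likelihood relative to a reference
# parameter theta_ref is dP_theta/dP_{theta_ref}(X), which here is the density
# ratio f_theta(X) / f_{theta_ref}(X); we work with its logarithm.
rng = np.random.default_rng(0)
X = rng.normal(2.0, 1.0, size=50)  # sample with "true" parameter theta_0 = 2

def norm_logpdf_sum(mu):
    # Log-density of the whole sample under N(mu, 1), summed over observations.
    return np.sum(-0.5 * (X - mu) ** 2 - 0.5 * np.log(2 * np.pi))

def log_lik(theta, theta_ref):
    # log of dP_theta/dP_{theta_ref} evaluated at the sample X.
    return norm_logpdf_sum(theta) - norm_logpdf_sum(theta_ref)

thetas = np.linspace(0.0, 4.0, 401)  # grid of candidate parameters
ll1 = np.array([log_lik(t, 0.5) for t in thetas])  # reference theta_1 = 0.5
ll2 = np.array([log_lik(t, 3.0) for t in thetas])  # reference theta_2 = 3.0

# The two curves differ by a constant that does not depend on theta ...
assert np.allclose(ll1 - ll2, (ll1 - ll2)[0])
# ... so they share the same maximiser on the grid.
assert thetas[ll1.argmax()] == thetas[ll2.argmax()]
```

On the log scale the data-dependent factor $\frac{\mathrm{d}P_{\theta_1}}{\mathrm{d}P_{\theta_2}}(X)$ becomes an additive constant, which is why the two curves are exact vertical shifts of one another.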