Question: I want to do Bayesian inference on a constrained variable. I want to use the change-of-variables theorem to remove the constraint before doing the inference. How should the prior on the variable change?
Context: Say, I have a likelihood function, $P(y|\theta_a, \theta_b)$. Further suppose that the parameter $\theta_a$ is unconstrained, yet the parameter $\theta_b$ is constrained such that $\theta_b \geq 0$ (non-negativity constraint).
The MLE solution is not very attractive, so I decided to use the MAP estimate. To do so, we can use Bayes' rule as follows:
$P(\theta_a, \theta_b|y) \propto P(y|\theta_a, \theta_b)P(\theta_a)P(\theta_b)$
Here, I am assuming $P(\theta_a, \theta_b) = P(\theta_a)P(\theta_b)$, independence between the variables.
Assume we have a specific prior distribution for $\theta_a$ (not important for this question), but we want to use an uninformative uniform prior on $\theta_b$, such that $p(\theta_b) \propto 1$. Then the posterior reduces to
$P(\theta_a, \theta_b|y) \propto P(y|\theta_a, \theta_b)P(\theta_a)$
Then the MAP solution can be obtained by solving the following optimization problem.
$\hat{\theta}_a, \hat{\theta}_b = \operatorname*{argmax}_{\theta_a, \theta_b} L(\theta_a, \theta_b|y) + L(\theta_a)$ subject to $\theta_b \geq 0$,
where $L(\theta_a, \theta_b|y)$ is the log-likelihood and $L(\theta_a)$ is the log-prior density of $\theta_a$.
Now, I want to solve the above constrained optimization problem with an unconstrained optimization algorithm. So I can change the variable such that $\theta_b = e^{\theta'_b}$. With this change of variable, we solve a different, unconstrained optimization problem:
$\hat{\theta}_a, \hat{\theta}'_b = \operatorname*{argmax}_{\theta_a, \theta'_b} L(\theta_a, e^{\theta'_b}|y) + L(\theta_a)$
The optimum $\hat{\theta}_b$ can be recovered by $\hat{\theta}_b = e^{\hat{\theta}'_b}$.
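To make this concrete, here is a minimal sketch of that reparameterized MAP optimization. The model is my own toy assumption, not from the question: $y_i \sim \mathcal{N}(\theta_a, \theta_b)$ with $\theta_b$ the variance, a standard-normal prior on $\theta_a$, and the flat prior on $\theta_b$ (so no prior term for it appears in the objective).

```python
# Toy setup (assumed, not from the question): y_i ~ Normal(theta_a, theta_b),
# theta_b = variance >= 0, standard-normal prior on theta_a, flat prior on theta_b.
# The constraint is removed via theta_b = exp(theta_b_prime), with no Jacobian term,
# i.e. the first of the two formulations above.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=2.0, size=200)  # true mean 1, true variance 4

def neg_log_posterior(params):
    theta_a, theta_b_prime = params
    theta_b = np.exp(theta_b_prime)          # change of variables removes the constraint
    log_lik = norm.logpdf(y, loc=theta_a, scale=np.sqrt(theta_b)).sum()
    log_prior_a = norm.logpdf(theta_a)       # prior on theta_a; flat prior on theta_b drops out
    return -(log_lik + log_prior_a)

res = minimize(neg_log_posterior, x0=np.array([0.0, 0.0]))
theta_a_hat = res.x[0]
theta_b_hat = np.exp(res.x[1])               # recover theta_b = exp(theta_b')
```

The recovered `theta_b_hat` is automatically positive, so a generic unconstrained optimizer can be used.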
The question is: don't we have to change the variable for $p(\theta_b)$ as well? The change-of-variables theorem yields a different prior for $\theta'_b$, namely $p_{\theta'_b}(\theta'_b) = p_{\theta_b}(e^{\theta'_b})\left\lvert \frac{d\theta_b}{d\theta'_b} \right\rvert \propto e^{\theta'_b}$. Adding the corresponding log-density term to the unconstrained optimization problem above will yield a different solution.
I cannot figure out what is wrong with these two trains of thought: one without the change of variables for $p(\theta_b)$, and one with it. They clearly yield different solutions. Why?
Yes, they yield different solutions, as they should – they solve two different problems. Since the MAP maximizes the posterior probability density for the parameters, it depends on how you represent the parameters. If you transform to a different set of parameters, you change the density, and thus the MAP. This is a downside of MAP compared to MLE, which doesn’t change when you change the representation of the parameters. (The MLE is the MAP that you get when you represent the parameter by its prior CDF, which yields a uniform prior.)
In the first case, you’re merely applying a transformation to make a calculation more tractable, but you’re still solving the same problem and will obtain the same solution. In the second case, you’re considering a different problem, that of maximizing the posterior probability density for $\theta_a$ and $\theta_b'$.
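The difference can be seen numerically in a one-parameter toy model of my own choosing (not from the question): $y_i \sim \text{Exponential}(\theta_b)$ with a flat prior on the rate $\theta_b$. Without the Jacobian term the optimum is the MLE $n/\sum y_i$; adding the log-Jacobian $\log\lvert d\theta_b/d\theta'_b\rvert = \theta'_b$ shifts it to $(n+1)/\sum y_i$, i.e. the MAP in the $\theta'_b$ parameterization.

```python
# Toy demo (assumed model, not from the post): y_i ~ Exponential(rate=theta_b),
# flat prior on theta_b, optimized over theta_b' with theta_b = exp(theta_b').
# Objective 1 omits the Jacobian; objective 2 adds log|d theta_b / d theta_b'| = theta_b'.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
y = rng.exponential(scale=0.5, size=100)     # true rate = 2
n, s = y.size, y.sum()

def neg_obj(tp, jacobian):
    rate = np.exp(tp)
    log_lik = n * tp - rate * s              # exponential log-likelihood in theta_b' space
    return -(log_lik + (tp if jacobian else 0.0))

no_jac = np.exp(minimize_scalar(lambda t: neg_obj(t, False),
                                bounds=(-5, 5), method="bounded").x)
with_jac = np.exp(minimize_scalar(lambda t: neg_obj(t, True),
                                  bounds=(-5, 5), method="bounded").x)
# Closed forms: no_jac -> n/s (the MLE, and the MAP in theta_b with a flat prior);
#               with_jac -> (n+1)/s (the MAP in theta_b').
```

Both answers are positive and nearby, but they are genuinely different estimators, which is the point of the answer above: the MAP depends on the parameterization in which the density is maximized.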