Why would renormalization still optimize the objective?

I want to maximize the following objective: $$ \begin{align} &\arg\max_\pi\mathbb E_{\pi(a)}[r(a)-\log\pi(a)]\tag1\\ &\text{s.t. } \int\pi(a)\,da=1 \end{align} $$ I've learned a way to solve this problem from page 26 of this lecture (although I simplify the problem a bit):

Taking the gradient of (1) with respect to $\pi(a)$ and setting it to zero gives $$ \begin{align} r(a)-\log\pi(a)-1&=0\\ \pi(a)&=\exp(r(a)-1)\tag 2 \end{align} $$ Following the lecture, this means $\pi(a)\propto\exp(r(a))$. Because $\pi(a)$ is a probability distribution and integrates to 1, we renormalize Equation (2) and get $$ \pi(a)={\exp(r(a))\over\int\exp(r(a))da}\tag3 $$ I understand that Equation (2) is the maximizer of (1) without the constraint. What I'm confused about is why Equation (2) suggests $\pi(a)\propto\exp(r(a))$. Why would renormalizing Equation (2) make it the maximizer of (1) under the constraint?

Best answer:

Let's do some variational calculus here.

We want to solve the constrained optimization problem:

$ \left \{ \begin{array}{rcl} \int\limits_{\mathbb{R}} \pi(a) \cdot (r(a) - \log \pi(a)) \ da \to \max\limits_{\pi(a)} \\ s.t. \int\limits_{\mathbb{R}} \pi(a) \ da = 1 \end{array} \right . $

Construct the Lagrangian, attaching the multiplier $\lambda$ to the whole constraint $\int\limits_{\mathbb{R}} \pi(a) \ da - 1 = 0$: $L[\pi] = \int\limits_{\mathbb{R}} \underbrace{\pi(a)(r(a) - \log \pi(a)) + \lambda \cdot \pi(a)}_{F(a, \pi, \pi^{\prime})} \ da - \lambda$.

It's known from variational calculus that at an extremum of $L$ the Euler-Lagrange equation must hold: $\frac{\delta L}{\delta \pi} = \frac{\partial F}{\partial \pi} - \frac{d}{da} \frac{\partial F}{\partial \pi^{\prime}} = 0$. Here $F$ does not depend on $\pi^{\prime}$, so the second term vanishes.

So we have: $\frac{\delta L}{\delta \pi} = r(a) - \log \pi(a) - 1 + \lambda = 0 \Rightarrow \pi(a) = e^{r(a)} \cdot e^{\lambda - 1}$

Plug this into constraint: $\int\limits_{\mathbb{R}} \pi(a) \ da = e^{\lambda - 1} \int\limits_{\mathbb{R}} e^{r(a)} \ da = 1 \Rightarrow e^{\lambda - 1} = \frac{1}{\int\limits_{\mathbb{R}} e^{r(a)} \ da}$
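As a quick sanity check of this step, here is a minimal sketch on a discrete action set, where the integral becomes a sum. The reward values in `r` are made up for illustration and are not from the question. It solves $e^{\lambda - 1} = 1/Z$ for $\lambda$ and confirms that the resulting $\pi$ both sums to 1 and satisfies the stationarity condition.

```python
import math

# Hypothetical rewards over four discrete actions (an assumption for illustration).
r = [1.0, 0.5, -0.3, 2.0]

# Normalizing constant Z = sum_a exp(r(a)), the discrete analogue of the integral.
Z = sum(math.exp(x) for x in r)

# From exp(lambda - 1) = 1 / Z we get lambda = 1 - log Z.
lam = 1.0 - math.log(Z)

# Stationary point pi(a) = exp(r(a)) * exp(lambda - 1).
pi = [math.exp(x) * math.exp(lam - 1.0) for x in r]

# The constraint: pi should sum to 1.
total = sum(pi)

# Stationarity residual r(a) - log pi(a) - 1 + lambda should vanish for every a.
residuals = [x - math.log(p) - 1.0 + lam for x, p in zip(r, pi)]

print("sum of pi:", total)
print("max |residual|:", max(abs(v) for v in residuals))
```

So the multiplier $\lambda$ only rescales $e^{r(a)}$ by a constant, which is exactly what "renormalizing" does.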

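So: $\boxed{\pi(a) = \frac{e^{r(a)}}{\int\limits_{\mathbb{R}} e^{r(a)} \ da}}$

Since $\frac{\partial^2 F}{\partial \pi^2} = -\frac{1}{\pi(a)} < 0$, the objective is concave in $\pi$ and the constraint is linear, so this stationary point is the constrained maximizer.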
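The result can also be checked numerically. The sketch below again uses a discrete action set with made-up rewards (an assumption, not part of the question) and compares the objective at the normalized solution against many randomly drawn distributions. It relies on the identity $\mathbb E_{\pi}[r(a)-\log\pi(a)] = \log Z - \mathrm{KL}(\pi \,\|\, \pi^*)$ with $Z = \sum_a e^{r(a)}$, so no distribution should beat $\pi^*$, and the optimal value should equal $\log Z$.

```python
import math
import random

def objective(pi, r):
    """E_pi[r(a) - log pi(a)] for a discrete pi; terms with pi(a) = 0 contribute 0."""
    return sum(p * (x - math.log(p)) for p, x in zip(pi, r) if p > 0.0)

def softmax(r):
    """pi(a) = exp(r(a)) / sum_b exp(r(b)), shifted by max(r) for numerical stability."""
    m = max(r)
    w = [math.exp(x - m) for x in r]
    z = sum(w)
    return [v / z for v in w]

# Hypothetical rewards over four discrete actions (an assumption for illustration).
r = [1.0, 0.5, -0.3, 2.0]

pi_star = softmax(r)
best = objective(pi_star, r)
log_Z = math.log(sum(math.exp(x) for x in r))

rng = random.Random(0)  # fixed seed so the check is reproducible

def random_distribution(n):
    """Draw a random point on the probability simplex by normalizing uniform weights."""
    w = [rng.random() for _ in range(n)]
    s = sum(w)
    return [v / s for v in w]

# Gap between the optimum and each random candidate; equals KL(pi || pi_star) >= 0.
gaps = [best - objective(random_distribution(len(r)), r) for _ in range(1000)]

print("objective at pi_star:", best)
print("log Z:", log_Z)
print("smallest gap over random candidates:", min(gaps))
```

Random sampling is only a spot check, not a proof; the concavity of the objective is what guarantees the maximum.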