Suppose that $X_1, X_2, \dots, X_n$ form an iid sample drawn from some probability distribution with unknown log-density $g$ (so that $\exp(g)$ is the density). In Theorem 3.1 in [1] it is shown that minimization of $$A_0(g) = -n^{-1}\sum_{i=1}^ng(X_i)$$ subject to $$\int\exp(g(u))\,\mathrm du = 1$$ is equivalent to minimization of $$A(g) = -n^{-1}\sum_{i=1}^ng(X_i) + \int\exp(g(u))\,\mathrm du.$$ Both objective functions are minimized with respect to $g$ over a suitable class of functions $\mathcal S$.
To prove the claim, an auxiliary function $g^* = g - \log\left(\int\exp(g(u))\,\mathrm du\right)$ is constructed. Then, since $$A(g^*) = A(g) + 1 - \int\exp(g(u))\,\mathrm du+ \log\left(\int\exp(g(u))\,\mathrm du\right),$$ it follows that $A(g^*)\leq A(g)$ by the inequality $1 + x\leq\exp(x)$ applied to $x = \log\left(\int\exp(g(u))\,\mathrm du\right)$. The proof then concludes that if $\hat g$ minimizes $A(g)$, it necessarily satisfies $$\int\exp(\hat g(u))\,\mathrm du = 1,$$ which makes both objective functions equivalent.
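As a quick sanity check (my own illustration, not part of [1]), the identity and the inequality above are easy to verify numerically; the sample, the test function $g$, and the integration grid below are all arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=50)           # toy iid sample
u = np.linspace(-10, 10, 20001)   # grid for numerical integration

def integral(y):
    """Trapezoidal approximation of the integral of y over the grid u."""
    return float(np.sum(y[1:] + y[:-1]) * (u[1] - u[0]) / 2)

def A(g):
    """A(g) = -n^{-1} sum_i g(X_i) + integral of exp(g(u)) du."""
    return -np.mean(g(X)) + integral(np.exp(g(u)))

g = lambda t: -t**2                  # an arbitrary unnormalized log-density
c = integral(np.exp(g(u)))           # c = integral of exp(g(u)) du
g_star = lambda t: g(t) - np.log(c)  # the auxiliary function g*

# Identity: A(g*) = A(g) + 1 - c + log(c)
assert np.isclose(A(g_star), A(g) + 1 - c + np.log(c))
# Inequality: A(g*) <= A(g), i.e. 1 + log(c) <= c
assert A(g_star) <= A(g)
# g* satisfies the normalization constraint by construction
assert np.isclose(integral(np.exp(g_star(u))), 1.0)
```

Here $c = \int e^{-t^2}\,\mathrm dt = \sqrt{\pi}$, so all three assertions pass up to quadrature error.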
I don't understand the last step: why does it suffice to consider this auxiliary function in order to prove the claim? I tried a Lagrangian approach for the first objective, taking $\mathcal S$ to be the set of all polynomials, and the two objectives led to different results. So is this theorem even true?
$\hat{g}$ must satisfy $A((\hat{g})^*)=A(\hat{g})$: we know $A((\hat{g})^*) \le A(\hat{g})$, and since $\hat{g}$ minimizes $A$ over $\mathcal S$ (and $(\hat{g})^*$ stays in $\mathcal S$, assuming the class is closed under subtracting constants), we also have $A(\hat{g}) \le A((\hat{g})^*)$.
If you know $A$ has a unique minimizer, then you can conclude $(\hat{g})^*=\hat{g}$, in which case $\int e^{\hat{g}(u)} \, du = 1$.
Otherwise, you can still say "there exists a minimizer of $A$, call it $\tilde{g}$, that satisfies $\int e^{\tilde{g}(u)} \, du = 1$," which you obtain by taking any minimizer $\hat{g}$ of $A$ and setting $\tilde{g} := (\hat{g})^*$.
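For what it's worth, the conclusion can also be observed numerically. The sketch below is my own toy example (the two-parameter family $g_{a,b}(u) = a - bu^2$, the sample, and the search grids are arbitrary choices, not anything from [1]): it minimizes the unconstrained objective $A$ by brute-force grid search and checks that the minimizer happens to satisfy the normalization constraint:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=200)          # toy iid sample
u = np.linspace(-10, 10, 2001)    # grid for numerical integration
du = u[1] - u[0]

def integral(y):
    """Trapezoidal approximation of the integral of y over the grid u."""
    return float(np.sum(y[1:] + y[:-1]) * du / 2)

def A(a, b):
    """A(g) for the hypothetical family g(t) = a - b*t^2."""
    return -np.mean(a - b * X**2) + integral(np.exp(a - b * u**2))

# brute-force grid search over (a, b), b > 0
candidates = ((A(a, b), a, b)
              for a in np.linspace(-2, 2, 201)
              for b in np.linspace(0.05, 2, 100))
_, a_hat, b_hat = min(candidates)

# the unconstrained minimizer of A approximately satisfies the constraint
mass = integral(np.exp(a_hat - b_hat * u**2))
print(round(mass, 1))  # → 1.0: the normalization emerges on its own
```

Within this family one can check by hand that $\partial A/\partial a = -1 + e^a\sqrt{\pi/b} = 0$ forces $\int e^{g_{a,b}(u)}\,\mathrm du = 1$ at any stationary point, which is exactly the mechanism the theorem describes.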