The question is from the textbook Convex Optimization Algorithms, prof. Bertsekas, p.511
A special case of Fenchel duality is the following:
\begin{equation}
\begin{aligned}
&{\text{min}}
& & f(x)\\
& \text{s.t.} & & x \in \mathcal{C}\\
\end{aligned}
\end{equation}
where
1. $f: \mathbb{R^n} \mapsto (-\infty,\infty]$ is closed, proper, convex.
2. $\mathcal{C}$ is a closed, convex cone in $\mathbb{R^n}$
The dual problem is:
\begin{equation} \begin{aligned} &{\text{min}} & & f^*(\lambda)\\ & \text{s.t.} & & \lambda \in \mathcal{\hat C}\\ \end{aligned} \end{equation}
where
1. $f^*$ is the conjugate of $f$
2. $\mathcal{\hat C}$ is the dual cone
My question is: Why the dual problem is a conjugate function rather than the minimization of a Lagrange dual function?
We have the lagrangian $g(y)=\inf_{x\epsilon C} ( f(x))$. The dual cone is the set $C^*=\{y: y^Tx \geq 0 ,\forall x \epsilon C\}$. Then $g(y)=\inf_{x\epsilon C} ( f(x)+y^Tx)$ where $y\epsilon C^*$, since $y^Tx\geq 0$. This is equal to $g(y)=-\sup_{x\epsilon C} (-f(x)-y^Tx)=-\sup_{x\epsilon C} ((-y)^Tx-f(x))=-f^*(-y)$. The last equality comes from definition of conjugate function. So we have $\max_{y} g(y)=\min_{y \epsilon C^*} f^*(-y)$. You can see that lagrangian is completely related to conjugate function.