I am studying linear programming and there is something I just don't get. For the linear program
$ \min \ c^Tx \\ \text{s.t.} \ \ Ax \le b $
its Lagrangian is defined as
$L(x, \lambda) = c^Tx + \lambda^T(Ax-b)$
The dual function is then defined as
$g(\lambda) = \inf_x ((c + A^T\lambda)^Tx - b^T\lambda)$
(Question) Why do we take the infimum over $x$? Can someone show a graphical example? I can follow the derivation up to the point where $\inf_x$ is written.
Basically, our original optimization problem is equivalent to the following problem,
$$\min_x \max_{\lambda \ge 0} L(x,\lambda)$$
Since $\lambda \ge 0$, making $\lambda$ large enforces the constraint $Ax - b \le 0$: if any component of $Ax - b$ were positive, the inner maximum would be $+\infty$, so the outer minimum must be attained at a feasible $x$. At this point the problem is just as hard as the one we started with, but it is now in a form we can manipulate. Under mild assumptions (for an LP, it suffices that the problem is feasible and bounded), we can interchange the max over $\lambda$ and the min over $x$ and examine the following problem,
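To make the "enforces the constraint" remark above concrete, the inner maximization acts as an exact penalty:

$$\max_{\lambda \ge 0} \lambda^T(Ax-b) = \begin{cases} 0 & \text{if } Ax \le b,\\ +\infty & \text{otherwise,}\end{cases}$$

so $\max_{\lambda \ge 0} L(x,\lambda)$ equals $c^Tx$ on the feasible set and $+\infty$ off it, which is exactly the original constrained problem.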
$$\max_{\lambda \ge 0} \min_x L(x,\lambda)$$
Now, we call the inside minimization problem the dual function, $g$,
$$g(\lambda) = \inf_x L(x,\lambda) = \inf_x \left( c^T x + \lambda^T (Ax-b) \right)$$
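Carrying out this infimum also shows why the dual of an LP is again an LP: the argument is linear in $x$, so its infimum is $-\infty$ unless the coefficient of $x$ vanishes,

$$g(\lambda) = \inf_x \left( (c + A^T\lambda)^T x - b^T\lambda \right) = \begin{cases} -b^T\lambda & \text{if } c + A^T\lambda = 0,\\ -\infty & \text{otherwise,}\end{cases}$$

and the dual problem $\max_{\lambda \ge 0} g(\lambda)$ becomes $\max\ {-b^T\lambda}$ subject to $A^T\lambda = -c$, $\lambda \ge 0$.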
So, the answer to your question "Why do we take the infimum over $x$?" is: because our original problem was a minimization over $x$, and we have interchanged the max and the min. The usefulness of this is that, once we have made this swap, we can often rewrite $g$ in terms of the Fenchel conjugate of the objective and obtain a new optimization problem that is potentially easier to solve. It also lets us use so-called primal-dual methods, which alternate between the original (primal) problem and its dual.
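In lieu of a picture, here is a small numerical sketch of the min-max interchange (my own toy instance, not from the question): for the 1-D LP $\min x$ s.t. $x \ge 1$, evaluating $L$ on a truncated grid, both orderings of min and max land on the optimal value $1$ at the saddle point $(x,\lambda) = (1,1)$.

```python
import numpy as np

# Toy instance of  min c^T x  s.t.  Ax <= b  (my own choice of data):
# minimize x subject to x >= 1, i.e. c = 1, A = [-1], b = [-1].
# The optimal value is 1, attained at x = 1.
c, a, b = 1.0, -1.0, -1.0

# Truncated grids: the true inf over all of R is -inf for most lambda,
# so we bound x (and lambda) to finite ranges containing the saddle point.
xs = np.linspace(-10.0, 10.0, 2001)   # primal variable x
lams = np.linspace(0.0, 5.0, 2001)    # dual variable lambda >= 0

# L(x, lambda) = c*x + lambda*(a*x - b), rows indexed by x, columns by lambda
L = c * xs[:, None] + lams[None, :] * (a * xs[:, None] - b)

primal_side = L.max(axis=1).min()   # min_x max_lambda L(x, lambda)
dual_side = L.min(axis=0).max()     # max_lambda min_x L(x, lambda) = max g(lambda)

print(primal_side, dual_side)       # both approximately 1.0, the LP's optimal value
```

On this instance the two grid values agree (strong duality holds for feasible, bounded LPs); on the truncated grid the agreement is up to the grid spacing.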