Will the duality gap be zero if the constraint is satisfied with equality?

25 Views Asked by At

Consider the following problem that aims to find an optimal policy $\pi$ mapping a state $s\in\mathcal{S}$ to an action $a\in\mathcal{A}$: \begin{array}{cl}\tag{1} \displaystyle \underset{\pi:\mathcal{S}\to\mathcal{A}}{\textrm{minimize}} & \displaystyle \lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T} f(\pi;t)\\ \text{subject to} & \displaystyle \lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T} g(\pi; t) \le 0, \end{array} where $f(\pi;t)$ denotes the cost at time $t$ under policy $\pi$, and $g(\pi;t)$ denotes any constraint-related value at time $t$ under policy $\pi$.

Then, consider the above problem's dual problem with Lagrangian variable $\lambda$: \begin{equation}\tag{2} \max_{\lambda\ge0} \min_{\pi:\mathcal{S}\to\mathcal{A}} \lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T} \Big( f(\pi;t) + \lambda g(\pi;t) \Big). \end{equation}

It is well known that (2) has a lower bound of (1) in terms of the optimal value. That is, letting $y^*_1$ and $y^*_2$ be the optimal values of (1) and (2), respectively, we have $$y^*_1 \ge y^*_2.$$


By the way, in one proof written in a journal paper that I am reading now, they said as follows:

  1. It is already proved in another paper that, for any fixed $\lambda$, the optimal policy of the following problem is obtainable: $$\min_{\pi:\mathcal{S}\to\mathcal{A}} \lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T} \Big( f(\pi;t) + \lambda g(\pi;t) \Big).$$ Let us denote the optimal policy of the above problem by $\pi^*(\lambda)$.

  2. Under this policy, assume that there exists $\lambda^*$ that makes the constraint hold with equality, i.e., $$\lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T} g(\pi^*(\lambda^*); t) = 0.\tag{3}$$

  3. Then, $\pi^*(\lambda^*)$ will be the optimal solution to (1), i.e., there will be no duality gap.

  4. Then, to prove the zero duality gap between (1) and (2), the authors proceed with the case where there does not exists $\lambda^*$ satisfying (3).


I cannot understand how step (3) holds. Can someone let me understand it? Thank you!!