Is it possible to get a solution for all inequality constraint values in Lagrange multipliers?


I am reading a paper on the Information Bottleneck Method. In this paper, the authors give a short review of rate-distortion theory. In rate-distortion theory, we are interested in the rate-distortion function $R(D)$, which is defined as $$R(D) = \min_{p(\tilde x | x)} I(X, \tilde{X}) \qquad\text{such that}\quad \mathbb{E}_{p(x,\tilde x)}[d(x, \tilde x)] \leq D$$ where:

  • $X$ is our source random variable. It is discrete and finite.
  • $\tilde X$ is a quantization of $X$ (the quantization is stochastic, i.e. we have a stochastic mapping $p(\tilde x | x)$)
  • $d(x, \tilde x)$ is our distortion function, which measures the quality of the quantization (in practice, it can be something like mean squared error or Hamming distance)
  • $\beta$ is the Lagrange multiplier

The authors say that we do this by minimizing a Lagrangian:

$$\mathcal{F}(p(\tilde x | x)) = I(X, \tilde{X}) + \beta \mathbb{E}_{p(x,\tilde x)}[d(x, \tilde x)]$$
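For intuition (my own illustration, not from the paper): for the standard example of a Bernoulli(1/2) source with Hamming distortion, the minimizer of this Lagrangian is known in closed form, and each value of $\beta$ (in nats) picks out one point $(D, R)$ on the rate-distortion curve, so sweeping $\beta$ traces the curve out:

```python
import math

def rd_binary_hamming(beta):
    """For a Bernoulli(1/2) source with Hamming distortion, minimizing
    I(X, X~) + beta * E[d] over p(x~|x) gives, by symmetry, a binary
    symmetric test channel with crossover probability D = 1/(1 + e^beta).
    (Hypothetical helper name; the closed form is the textbook result
    R(D) = 1 - H_2(D) for D <= 1/2.)"""
    D = 1.0 / (1.0 + math.exp(beta))          # expected distortion

    def h2(p):                                # binary entropy in bits
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    R = 1.0 - h2(D)                           # rate in bits
    return D, R

# Sweeping beta traces out points on the R(D) curve; a larger beta
# penalizes distortion more, giving smaller D and larger R.
for beta in (0.5, 2.0, 5.0):
    D, R = rd_binary_hamming(beta)
```

Note that each $\beta$ yields a single $(D, R)$ pair; the whole curve appears only as $\beta$ varies, which previews the answer to the second question below.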

There are two points that I do not understand:

  • Why does the second term of the Lagrangian not contain $D$? Wikipedia says that an inequality constraint "$g(x) \leq C$" should give a second term of the form "$\beta(g(x) - C)$", not "$\beta g(x)$". Are we somehow implying that $C = 0$ here?

  • Why do the authors say that we find the whole function $R(D)$? It looks like we are finding just a single value of this function, evaluated at the point $D$, not its whole form.


On BEST ANSWER

This is quite common in this kind of maximization/minimization problem.

In general, suppose you want to find an extremum of $f(x)$ subject to $g(x)\le D$. The method of Lagrange multipliers does not directly handle such an inequality constraint. But if you have reason to assume (as we often do) that the constraint $g(x)\le D$ is "relevant / pessimistic", that is, that a bigger $D$ would yield a better extremum of $f(x)$, then the extremum of $f(x)$ subject to $g(x)\le D$ must occur on the boundary of the region, i.e. where $g(x)=D$, or $g(x)-D=0$. We are then justified in replacing the inequality by an equality.
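A quick numeric check of this boundary argument (a toy example of my own, not from the answer): minimize $f(x) = (x-3)^2$ subject to $x \le D$ with $D = 1$. The unconstrained minimizer $x = 3$ is infeasible, so the constrained minimum lands on the boundary $x = D$:

```python
# Grid search over the feasible region x <= D to locate the
# constrained minimizer of f(x) = (x - 3)**2.
def f(x):
    return (x - 3.0) ** 2

D = 1.0
xs = [i * 0.001 for i in range(-2000, int(D * 1000) + 1)]  # x in [-2, D]
x_star = min(xs, key=f)
# Because the unconstrained minimum (x = 3) lies outside the region,
# the minimizer lands on the boundary: x_star == D, i.e. g(x) - D = 0.
```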

Then, Lagrange multiplier tells us to find the extremum of the Lagrangian

$$J_1(x) = f(x) + \beta (g(x)-D) \tag1$$ together with $g(x)-D=0$. But, because $D$ is constant, this is equivalent to finding the extremum of the alternative Lagrangian

$$ J_2(x) =f(x) + \beta' g(x) \tag2$$

But finding the extremum of $(2)$ is not the same as assuming $D=0$, because the additional equation $g(x)-D=0$ remains in force. Different values of $D$ will give different multipliers ($\beta \ne \beta'$) and hence different extrema.

Namely, if the extremum corresponds to a critical point, both $(1)$ and $(2)$ lead to the same system of equations:

$$\begin{cases} f'(x)+ \beta g'(x)&=0\\ g(x) -D &= 0 \end{cases}\tag3$$

The solution, of course, depends on $D$; solving the system for each value of $D$ is what traces out the whole function $R(D)$.
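To make system $(3)$ concrete (a toy instance of my own): take $f(x, y) = x^2 + y^2$ with constraint function $g(x, y) = x + y$ and equality $x + y = D$. Stationarity of $f + \beta(g - D)$ gives $2x + \beta = 0$ and $2y + \beta = 0$, which together with the constraint yield $x = y = D/2$ and $\beta = -D$, so both the multiplier and the extremum move with $D$:

```python
def solve_system(D):
    """Solve the system (3) for f(x, y) = x**2 + y**2, g(x, y) = x + y:
    2x + beta = 0, 2y + beta = 0, x + y - D = 0,
    whose closed-form solution is x = y = D/2 and beta = -D."""
    beta = -D
    x = y = D / 2.0
    # Verify stationarity and the constraint numerically.
    assert abs(2 * x + beta) < 1e-12
    assert abs(2 * y + beta) < 1e-12
    assert abs(x + y - D) < 1e-12
    return x, y, beta

# Different D -> different multiplier beta and different extremum,
# which is exactly how one point of R(D) per D arises.
for D in (1.0, 2.0, 5.0):
    x, y, beta = solve_system(D)
```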