Dual norm of a truncated and ordered (decreasing order) $\ell_1$-norm


I do not yet understand why the dual norm of the following truncated and ordered (in decreasing fashion) $\ell_1$-norm $\lVert \mathbf{x}\rVert_{[k]}$ on $\mathbf{x} \in \mathbb{C}^n$ is:

$$\lVert \mathbf{x}\rVert^*_{[k]}= \max\left\{\frac{1}{k} \| \mathbf{x} \|_1,\lVert \mathbf{x}\rVert_{\infty}\right\}$$

The truncated $\ell_1$-norm is defined as the sum of the $k$ largest magnitudes of the entries of the vector $\mathbf{x}$, i.e., $\lVert \mathbf{x}\rVert_{[k]}=\lvert x_{i_1}\rvert+\dots+\lvert x_{i_k}\rvert$, where $\lvert x_{i_1}\rvert\geq\lvert x_{i_2}\rvert\geq\dots\geq\lvert x_{i_n}\rvert$.
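For concreteness, the claimed duality can be sanity-checked numerically on real vectors via the Hölder-type inequality $|\langle \mathbf{x}, \mathbf{z}\rangle| \le \lVert \mathbf{x}\rVert^*_{[k]} \, \lVert \mathbf{z}\rVert_{[k]}$, which must hold if the two norms are indeed dual to each other. A minimal stdlib-Python sketch (the helper names are mine):

```python
import random

def trunc_norm(x, k):
    """Sum of the k largest magnitudes: ||x||_[k]."""
    return sum(sorted((abs(v) for v in x), reverse=True)[:k])

def claimed_dual(x, k):
    """The claimed dual norm: max{ ||x||_1 / k , ||x||_inf }."""
    return max(sum(abs(v) for v in x) / k, max(abs(v) for v in x))

random.seed(0)
n, k = 8, 3
for _ in range(1000):
    x = [random.gauss(0, 1) for _ in range(n)]
    z = [random.gauss(0, 1) for _ in range(n)]
    inner = abs(sum(a * b for a, b in zip(x, z)))
    # Hoelder-type inequality |<x,z>| <= ||x||*_[k] * ||z||_[k]
    assert inner <= claimed_dual(x, k) * trunc_norm(z, k) + 1e-9
```

This only checks the inequality, not that the formula is the tightest such bound, but it catches any gross mismatch between the two norms.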

Thank you


There are 3 answers below.

Best answer:

The shortest proof I can think of is the following:

  1. Prove that $\|\cdot\|_{[k]}$ and $\|\cdot\|_{[k]}^*$ are norms (easy by definition).
  2. Use the LP representation suggested by Michael Grant, $$ \|x\|_{[k]}=\max\{x^Tz\colon \|z\|_1\le k,\,\|z\|_\infty\le 1\}, $$ and since $\|z\|_1\le k$ $\Leftrightarrow$ $\frac{1}{k}\|z\|_1\le 1$, rewrite it as $$ \|x\|_{[k]}=\max\{x^Tz\colon \|z\|_{[k]}^*\le 1\}.\tag{1} $$
  3. The relation (1) means by definition that $\|\cdot\|_{[k]}$ is the dual of $\|\cdot\|_{[k]}^*$. Taking dual once again: the dual of $\|\cdot\|_{[k]}$ is the second dual of $\|\cdot\|_{[k]}^*$, which is the same as $\|\cdot\|_{[k]}^*$.

P.S. The last fact, "the second dual norm is the norm itself," is a known result in finite-dimensional spaces (even in reflexive Banach spaces). A proof normally uses separation theorems for convex sets (alternatively, the Hahn-Banach theorem). It can be found, e.g., in Horn and Johnson, Matrix Analysis, Ch. 5, Sec. 5.5.
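Step 2's LP representation can itself be checked numerically without an LP solver: the maximum of the linear objective over the feasible polytope is attained at the vertex $z^\star$ that puts $\pm 1$ (the sign of $x_i$) on the $k$ largest-magnitude entries, and any other feasible point should do no better. A stdlib-Python sketch (helper names are mine):

```python
import random

def trunc_norm(x, k):
    """Sum of the k largest magnitudes: ||x||_[k]."""
    return sum(sorted((abs(v) for v in x), reverse=True)[:k])

random.seed(1)
n, k = 7, 3
for _ in range(500):
    x = [random.gauss(0, 1) for _ in range(n)]
    # z* = sign(x) on the k largest-magnitude entries is feasible and attains ||x||_[k]
    top = sorted(range(n), key=lambda i: -abs(x[i]))[:k]
    z_star = [(1 if x[i] > 0 else -1) if i in top else 0 for i in range(n)]
    attained = sum(a * b for a, b in zip(x, z_star))
    assert abs(attained - trunc_norm(x, k)) < 1e-9
    # a random feasible z (||z||_inf <= 1, scaled so ||z||_1 <= k) never beats it
    z = [random.uniform(-1, 1) for _ in range(n)]
    s = sum(abs(v) for v in z)
    if s > k:
        z = [v * k / s for v in z]
    assert sum(a * b for a, b in zip(x, z)) <= trunc_norm(x, k) + 1e-9
```

The random-sampling half is only a spot check, but together with the attaining vertex it makes the identity in step 2 concrete.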

Answer:

I accidentally derived the dual norm of your dual norm. Let us first derive some conjugate functions:
$$ \begin{align} f(x) &= \max\left\{g(x),h(x)\right\} \\ g(x) &= \frac{1}{k} \| x \|_1 \\ h(x) &= \lVert x\rVert_{\infty} \\ g^*(y) &= 0 \text{ if } k \| y \|_\infty \leq 1 \;(\infty \text{ otherwise}) \\ h^*(y) &= 0 \text{ if } \| y \|_1 \leq 1 \;(\infty \text{ otherwise}) \end{align} $$

Using the well-known rule for the conjugate of $\max\{g,h\}$, we get:
$$\begin{align} f^*(y) &= \inf_{v,z} \{ (z_1g)^*(v) + (z_2h)^*(y-v) \mid z_1+z_2 = 1,\ z\geq 0 \}\\ &= 0 \text{ if } \exists v,z : k \| v \|_\infty \leq z_1, \; \| y-v \|_1 \leq z_2, \; z_1+z_2 = 1,\ z\geq 0 \\ &= 0 \text{ if } \exists v : k \| v \|_\infty +\| y-v \|_1 \leq 1 \;(\infty \text{ otherwise}) \end{align}$$

Using that the convex conjugate of a norm is the indicator of the unit ball of the dual norm, the dual norm of the dual norm is
$$\|y\|^{**} = \inf_v \{ k \| v \|_\infty +\| y-v \|_1 \}.$$

This does not equal the norm you started with. Either the truncated $\ell_1$-norm is not a real norm, or I made a mistake. Maybe you could do the same analysis for the norm itself, or check the above for a possible mistake.
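The infimum above can be evaluated numerically, which lets the reader compare the two quantities directly. For a fixed $t = \|v\|_\infty$, the best $v$ clips each $y_i$ to $[-t, t]$, so the objective reduces to $kt + \sum_i (|y_i| - t)_+$, a piecewise-linear function of $t$ whose minimum sits at a breakpoint $t \in \{0\} \cup \{|y_i|\}$. A stdlib-Python sketch under that reduction (helper names are mine):

```python
import random

def trunc_norm(y, k):
    """Sum of the k largest magnitudes: ||y||_[k]."""
    return sum(sorted((abs(v) for v in y), reverse=True)[:k])

def biconjugate(y, k):
    """inf_v { k||v||_inf + ||y - v||_1 }: for fixed t = ||v||_inf the best v
    clips y to [-t, t], so it suffices to scan the breakpoints t in {0} u {|y_i|}."""
    breakpoints = [0.0] + [abs(v) for v in y]
    return min(k * t + sum(max(abs(v) - t, 0.0) for v in y) for t in breakpoints)

random.seed(2)
y = [random.gauss(0, 1) for _ in range(6)]
k = 2
val = biconjugate(y, k)
# sanity: no randomly drawn v does better than the clipping family
for _ in range(2000):
    v = [random.gauss(0, 1) for _ in range(6)]
    obj = k * max(abs(w) for w in v) + sum(abs(a - b) for a, b in zip(y, v))
    assert obj >= val - 1e-9
print(val, trunc_norm(y, k))  # compare the two quantities numerically
```

Running this on a few random vectors is an easy way to test whether the biconjugate and the truncated norm actually differ.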

Answer:

Based on the input from @LinAlg and Michael Grant (to be reviewed).

The function $f(\mathbf{x}) = \lVert \mathbf{x} \rVert_{[k]}$ is equal to the optimal value of the following linear program (LP): \begin{align*} \text{maximize}_{\mathbf{v}} \quad &\mathbf{v}^{\rm T} \mathbf{x}\\ \text{subject to }\quad & \lVert \mathbf{v} \rVert_1 \leq k \\ & \lVert \mathbf{v} \rVert_\infty \leq 1 . \end{align*} Introducing an auxiliary variable $\mathbf{t}$ with $-\mathbf{t} \preceq \mathbf{v} \preceq \mathbf{t}$, this is equivalent to \begin{align*} \text{minimize} \ \ \quad &-\mathbf{v}^{\rm T} \mathbf{x}\\ \text{subject to }\quad & \mathbf{1}^{\rm T} \mathbf{t} \leq k \ ; \ - \mathbf{t} \preceq \mathbf{v} \preceq \mathbf{t} \\ & - \mathbf{1} \preceq \mathbf{v} \preceq \mathbf{1} , \end{align*} where the box constraint on $\mathbf{v}$ can be replaced by $\mathbf{t} \preceq \mathbf{1}$, giving the standard form \begin{align*} \text{minimize} \ \ \quad &-\mathbf{v}^{\rm T} \mathbf{x} \\ \text{subject to }\quad & \mathbf{1}^{\rm T} \mathbf{t} - k \leq 0 \\ & -\mathbf{v} - \mathbf{t} \preceq 0 \\ & \mathbf{v} - \mathbf{t} \preceq 0 \\ & \mathbf{t} - \mathbf{1} \preceq 0 . \end{align*}

Then, form the Lagrangian: \begin{align*} L\left(\mathbf{v},\mathbf{t},\rho,\boldsymbol{\mu}_1,\boldsymbol{\mu}_2,\boldsymbol{\lambda} \right) &= -\mathbf{v}^{\rm T} \mathbf{x} +\rho \left( \mathbf{1}^{\rm T} \mathbf{t} - k \right) + \boldsymbol{\mu}_1^{\rm T} \left( -\mathbf{v} - \mathbf{t} \right) + \boldsymbol{\mu}_2^{\rm T} \left( \mathbf{v} - \mathbf{t} \right) + \boldsymbol{\lambda}^{\rm T}\left( \mathbf{t} - \mathbf{1} \right) \\ &= -\mathbf{v}^{\rm T} \mathbf{x} +\rho \left( \mathbf{1}^{\rm T} \mathbf{t} - k \right) - \mathbf{v}^{\rm T} \boldsymbol{\mu}_1 - \mathbf{t}^{\rm T} \boldsymbol{\mu}_1 + \mathbf{v}^{\rm T} \boldsymbol{\mu}_2 - \mathbf{t}^{\rm T} \boldsymbol{\mu}_2 + \mathbf{t}^{\rm T} \boldsymbol{\lambda} - \mathbf{1}^{\rm T} \boldsymbol{\lambda} \\ &= \mathbf{v}^{\rm T} \left( -\mathbf{x} -\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2 \right) + \mathbf{t}^{\rm T} \left( \rho \mathbf{1} - \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 + \boldsymbol{\lambda} \right) + \left( - \rho k - \mathbf{1}^{\rm T} \boldsymbol{\lambda} \right) . \end{align*}

Now, minimizing $L\left( \mathbf{v},\mathbf{t},\rho,\boldsymbol{\mu}_1,\boldsymbol{\mu}_2,\boldsymbol{\lambda}\right)$ over $\mathbf{v}$ and $\mathbf{t}$ (the Lagrangian is linear in both, so the infimum is $-\infty$ unless their coefficient vectors vanish) yields the dual function \begin{align*} g\left(\rho,\boldsymbol{\mu}_1,\boldsymbol{\mu}_2,\boldsymbol{\lambda}\right) &= \left\{ \begin{matrix} \left( - \rho k - \mathbf{1}^{\rm T} \boldsymbol{\lambda} \right) & & \left( -\mathbf{x} -\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2 \right) = \mathbf{0} , \quad \rho \mathbf{1} - \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 + \boldsymbol{\lambda} = \mathbf{0} \\ -\infty & &\text{otherwise}. \end{matrix} \right. \end{align*}

The dual problem is then \begin{align*} \text{maximize} \quad \ & - \rho k - \mathbf{1}^{\rm T} \boldsymbol{\lambda} \\ \text{subject to }\quad & -\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2 = \mathbf{x} \\ & \boldsymbol{\mu}_1 + \boldsymbol{\mu}_2 = \rho \mathbf{1} + \boldsymbol{\lambda} \\ & \rho \geq 0 \\ & \boldsymbol{\mu}_1 \succeq 0 \ ; \ \boldsymbol{\mu}_2 \succeq 0\\ & \boldsymbol{\lambda} \succeq 0 , \end{align*} or, after negating the objective, \begin{align*} \text{minimize} \quad \ & \rho k + \mathbf{1}^{\rm T} \boldsymbol{\lambda} \\ \text{subject to }\quad & -\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2 = \mathbf{x} \\ & \boldsymbol{\mu}_1 + \boldsymbol{\mu}_2 = \rho \mathbf{1} + \boldsymbol{\lambda} \\ & \rho \geq 0 \\ & \boldsymbol{\mu}_1 \succeq 0 \ ; \ \boldsymbol{\mu}_2 \succeq 0\\ & \boldsymbol{\lambda} \succeq 0 . \end{align*}

We can eliminate $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$: writing $\boldsymbol{\mu}_1 = \tfrac{1}{2}(\mathbf{s}-\mathbf{x})$ and $\boldsymbol{\mu}_2 = \tfrac{1}{2}(\mathbf{s}+\mathbf{x})$ with $\mathbf{s} = \rho \mathbf{1} + \boldsymbol{\lambda}$, nonnegativity of $\boldsymbol{\mu}_1, \boldsymbol{\mu}_2$ holds if and only if $\rho \mathbf{1} + \boldsymbol{\lambda} \succeq |\mathbf{x}|$. Thus, the above optimization problem can succinctly be written as $$\min_{\rho \geq 0,\: \boldsymbol{\lambda} \succeq 0} \left\{ \rho k + \mathbf{1}^{\rm T} \boldsymbol{\lambda} \ : \ \left|\mathbf{x}\right| \preceq \rho \mathbf{1} + \boldsymbol{\lambda} \right\},$$ whose optimal $\boldsymbol{\lambda}$ is $\lambda_i = \max\{|x_i| - \rho, 0\}$.

[to be verified] To connect this with the dual norm $f^*(\mathbf{x}) = \lVert \mathbf{x} \rVert_{[k]}^*$, a numerical verification remains to be done.
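The numerical check of the dual LP value can be sketched as follows. Assuming the auxiliary multipliers $\boldsymbol{\mu}_1, \boldsymbol{\mu}_2$ can be eliminated (leaving the feasibility condition $\rho \mathbf{1} + \boldsymbol{\lambda} \succeq |\mathbf{x}|$) and that $\boldsymbol{\lambda}$ is optimal at $\lambda_i = \max\{|x_i| - \rho, 0\}$, the dual reduces to a one-dimensional scan over $\rho$; by strong LP duality its value should match the primal optimum $\lVert \mathbf{x} \rVert_{[k]}$. A stdlib-Python sketch (helper names are mine):

```python
import random

def trunc_norm(x, k):
    """Sum of the k largest magnitudes: ||x||_[k]."""
    return sum(sorted((abs(v) for v in x), reverse=True)[:k])

def dual_lp_value(x, k):
    """min_{rho>=0} rho*k + sum_i max(|x_i| - rho, 0), obtained after
    eliminating mu_1, mu_2 and lambda from the dual LP (assumed reduction).
    The objective is piecewise linear in rho, so the minimum is attained at
    a breakpoint rho in {0} u {|x_i|}."""
    breakpoints = [0.0] + [abs(v) for v in x]
    return min(r * k + sum(max(abs(v) - r, 0.0) for v in x) for r in breakpoints)

random.seed(3)
for _ in range(200):
    x = [random.gauss(0, 1) for _ in range(9)]
    for k in (1, 3, 9):
        # strong LP duality: the dual optimum should equal the truncated l1 norm
        assert abs(dual_lp_value(x, k) - trunc_norm(x, k)) < 1e-9
```

Note this verifies that the dual LP reproduces the primal norm $\lVert \mathbf{x} \rVert_{[k]}$ (as strong duality predicts); relating it to $\lVert \mathbf{x} \rVert_{[k]}^*$ is a separate question.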