Fenchel conjugate of $\| \cdot \|_1$ and dual of logistic regression

379 Views Asked by At

I am trying to replicate some results from

Koh, K., Kim, S. J., & Boyd, S. (2007). An interior-point method for large-scale l1-regularized logistic regression. Journal of Machine learning research, 8(Jul), 1519-1555.

The rewrite the problem of optimizing logistic regression with $\ell 1$ penalty as convex optimization problem with equality constraints.

What I am interested in is the dual function of this problem:

\begin{align} \inf_{v,w,z} L(v,w,z,\theta) &= (1/m)\sum_{i=1}^{m}f(z_i-m\theta_iz_i)+\inf_{w}(\lambda ||w||_1+ \theta^{T} Aw) + \inf_{v} \theta^{T} b v \nonumber \\ & = \begin{cases} -(1/m)\sum_{i=1}^{m}f^{*}(-m\theta_i) &\text{ } ||A^{T}\theta||_{\infty} \leq \lambda \text{, } b^{T}\theta = 0\\ -\infty &\text{ otherwise }, \end{cases} \end{align}

where w is the weight vector, v is the intercept vector, $\theta$ are the lagrange multipliers, A the data matrix and b is the vector of binary outcomes.

Now, $f^{*}$ is the conjugate of the logistic loss function:

\begin{equation} f^{*}(y)= \sup_u(y^{T}u-f_(u))= \begin{cases} (-y)\log(-y)+(1+y)\log(1+y) &\text{ if } -1<y<0\\ 0 &\text{ if } y = -1 \text{ or } y = 0\\ \infty &\text{ ontherwise }. \end{cases} \end{equation}

Specifically the last two terms of the dual are of interest:

\begin{equation} \inf_{w}(\lambda ||w||_1+ \theta^{T} Aw) \end{equation}

I understand that this term can be restated as the conjugate of the $||.||_1$ norm (Fenchel conjugate of a norm), which is the $||.||_{\infty}$ norm, but why is this a constraint on the dual function?

Also I dont understand how the last term of the dual $\inf_{v} \theta^{T} b v$ results in the solution $b^{T}\theta = 0$

1

There are 1 best solutions below

0
On

Regarding your question on $\inf_v \theta^Tbv$: Consider the case where $\theta^Tb= \alpha\neq 0$. Then since $v$ is defined over all $\mathbb{R}$ we can set it arbitrary large, and just keep the sign opposite to $\alpha$. This will mean the Lagrangian is unbounded. For that reason we define the feasible set of $L$ to be $\theta^Tb=b^T\theta=0$, and the the actual value of $\inf_v \theta^Tbv = 0$.