Lipschitz constant of negative log likelihood function

1.7k Views Asked by At

Show that $∇l(\mathbf{w})$ is Lipschitz continuous.

I found that $∇l(\mathbf{w})$ is:

enter image description here

So from the hint, Lipschitz constant is $\frac{1}{2}\frac{y_i}{y_i}||xi||$. I am not sure how to get the Frobenius norm. I think I am doing something wrong.

1

There are 1 best solutions below

0
On BEST ANSWER

For $l\colon\mathbb{R}^p \to\mathbb{R}$ defined by $l(w) = \sum_{i=1}^n \log(1+e^{-y_i w^\top x_i})$, we have that \begin{equation*} \nabla l(w) = -\sum_{i=1}^n \frac{1}{1+e^{-y_i w^\top x_i}} e^{-y_i w^\top x_i} y_i x_i = -\sum_{i=1}^n \frac{1}{e^{y_i w^\top x_i} + 1} y_i x_i. \end{equation*} Denoting the vector $2$-norm by $\|\cdot\|$, we have that \begin{align*} \|\nabla l(v) - \nabla l(w)\| ={}& \left\lVert \sum_{i=1}^n \left( \frac{1}{e^{y_i w^\top x_i}+1} y_i x_i - \frac{1}{e^{y_i v^\top x_i} + 1} y_i x_i \right) \right\rVert \\ \le{}& \sum_{i=1}^n \left|\frac{1}{e^{y_i w^\top x_i} + 1} - \frac{1}{e^{y_i v^\top x_i} + 1} \right||y_i|\|x_i\| \\ \le{}& \sum_{i=1}^n \left(\frac{1}{2}|y_i|\|x_i\| \|v-w\| \right)|y_i| \|x_i\| \\ ={}& \left(\frac{1}{2} \sum_{i=1}^n |y_i|^2 \|x_i\|^2 \right) \|v-w\|, \end{align*} where the first inequality comes from the triangle inequality and the second one comes from the Lipschitz continuity hint you were given. Therefore, $\nabla l$ is Lipschitz continuous with Lipschitz constant \begin{equation*} L = \frac{1}{2} \sum_{i=1}^n |y_i|^2 \|x_i\|^2. \end{equation*} In the case that $y_i\in\{-1,1\}$ (which is commonly the case in many machine learning contexts), this reduces to \begin{equation*} L = \frac{1}{2}\sum_{i=1}^n \|x_i\|^2 = \frac{1}{2}\sum_{i=1}^n\sum_{j=1}^p (x_i)_j^2 = \frac{1}{2}\sum_{i=1}^n\sum_{j=1}^p X_{ij}^2 = \frac{1}{2}\|X\|_F^2, \end{equation*} where \begin{equation*} X = \begin{bmatrix} x_1^\top \\ x_2^\top \\ \vdots \\ x_n^\top \end{bmatrix}, \end{equation*} and $\|\cdot\|_F$ denotes the Frobenius norm.