Proving: $\frac{1}{2m}\left|\left|\nabla f(x)\right|\right|_{2}^{2}\leq f(x)-f(x^{*})\leq\frac{1}{2M}\left|\left|\nabla f(x)\right|\right|_{2}^{2}$

602 Views Asked by At

let there be a strongly convex function $f(x)$.

I want to prove that if $\forall x\in Dom(f):mI\succcurlyeq\nabla^{2}f(x)\succcurlyeq MI$ then:

$\frac{1}{2m}\left|\left|\nabla f(x)\right|\right|_{2}^{2}\leq f(x)-f(x^{*})\leq\frac{1}{2M}\left|\left|\nabla f(x)\right|\right|_{2}^{2}$

NOTE: $x^{*}$ is the optimum point of $f(x)$ , in this case it's the minimum point, and "$\succcurlyeq$" is the Positive semi-definite Notation.

I was given a hint to use Taylor's Multivariate Theorem:

$\forall x,y\in\mathbb{R}^{n}$ $\exists$ $z\in[x,y]:f(y)=f(x)+\nabla f(x)^{T}(y-x)+\frac{1}{2}(y-x)^{T}\nabla^{2}f(z)(y-x)$

in general, I know that for A,B $\in\mathbb{R}^{nxn},$$A\succcurlyeq B$ means the following holds for any $x\in\mathbb{R}^{n}:$

$x^{T}Ax\geq x^{T}Bx$

so in our case for any $x\in Dom(f):$

$mI\succcurlyeq\nabla^{2}f(x)$ $\Leftrightarrow$ $x^{T}mIx\geq x^{T}\nabla^{2}f(x)$

$\nabla^{2}f(x)\succcurlyeq MI$ $\Leftrightarrow$ $x^{T}\nabla^{2}f(x)x\geq x^{T}MIx$

I just don't know how to use the last conclusion in order to solve the question.

I found a similar question here (not really the same, its with opposite inequality): Prove that $f(x) - f(x^{*}) \geq \frac{m}{2} \| x - x^{*} \|^{2}$.

but unfortunately, I didn't like solving this using integrals, I hope someone could help me solve this other variation of these questions using other way.

My progress so far for reaching the proof:

$mI\succcurlyeq\nabla^{2}f(z)$$\rightarrow\nabla^{2}f(z)-mI\preccurlyeq0$ which means that $\forall x,y,z:$

$(x-y)^{T}(\nabla^{2}f(z)-mI)(x-y)\leq0$$\rightarrow(x-y)^{T}(\nabla^{2}f(z))(x-y)-(x-y)^{T}(mI)(x-y)\leq0$

which leads to: $(x-y)^{T}(\nabla^{2}f(z))(x-y)\leq m(x-y)^{T}(I)(x-y)=m\Vert x-y\parallel^{2}$

if f(x) is convex then: $f(y)\geq f(x)$ $+$ $\nabla f(x)^{T}(y-x)$

$f(y)=f(x)+\nabla f(x)^{T}(y-x)+\frac{1}{2}(y-x)^{T}\nabla^{2}f(z)(y-x)\leq f(x)+\nabla f(x)^{T}(y-x)+\frac{1}{2}m\Vert x-y\parallel^{2}$

$f(y)\leq f(x)+\nabla f(x)^{T}(y-x)+\frac{1}{2}m\Vert x-y\parallel^{2}$

we plug $y$ and $x$ where:

$y=x^{*}$and $x=x^{*}+t\nabla f(x)$

$f(x^{*})<f(x)+\nabla f(x)^{T}(-t\nabla f(x))+\frac{1}{2}m\Vert-t\nabla f(x)\parallel^{2}=f(x)+$ $\left|\left|\nabla f(x)\right|\right|^{2}t(\frac{mt}{2}-1)$

$\Rightarrow f(x^{*})-f(x)\le\left|\left|\nabla f(x)\right|\right|^{2}t(\frac{mt}{2}-1)$

$\Rightarrow f(x)-f(x^{*})\geq\left|\left|\nabla f(x)\right|\right|^{2}t(1-\frac{mt}{2})\geq\left|\left|\nabla f(x)\right|\right|\frac{1}{2m}$

(where we choose some t that satisfies :$t(1-\frac{mt}{2})\geq\frac{1}{2m}$

My try to prove the right side of the equation: (Still working on it)

$\frac{1}{2m}\left|\left|\nabla f(x)\right|\right|_{2}^{2}\leq f(x)-f(x^{*})\leq\frac{1}{2M}\left|\left|\nabla f(x)\right|\right|_{2}^{2}$

taylor expansion:

$\forall x,y\in\mathbb{R}^{n}$ $\exists$ $z\in[x,y]:f(y)=f(x)+\nabla f(x)^{T}(y-x)+\frac{1}{2}(y-x)^{T}\nabla^{2}f(z)(y-x)$

if we choose: $y=x$ and $x=x^{*}:$

$f(x)=f(x^{*})$ $+$$(\underset{=0}{\underbrace{\nabla f(x^{*})}})^{T}(x-x^{*})+\frac{1}{2}(x-x^{*})^{T}\nabla^{2}f(z)(x-x^{*})$

$\rightarrow$$f(x)-f(x^{*})$$=\frac{1}{2}(x-x^{*})^{T}\nabla^{2}f(z)(x-x^{*})$

since $mI\succcurlyeq\nabla^{2}f(x)$ $\Leftrightarrow$ $x^{T}mIx\geq x^{T}\nabla^{2}f(x)x$:

especially for z: $z^{T}mIz\geq z^{T}\nabla^{2}f(z)z$$\rightarrow mI\geq\nabla^{2}f(z)$

$f(x)-f(x^{*})=\frac{1}{2}(x-x^{*})^{T}\nabla^{2}f(z)(x-x^{*})\leq\frac{1}{2}(x-x^{*})^{T}mI(x-x^{*})=\frac{m}{2}\left|\left|x-x^{*}\right|\right|_{2}^{2}$

$\nabla($$f(x)-f(x^{*}))=\nabla f(x)=\nabla(\frac{1}{2}(x-x^{*})^{T}\nabla^{2}f(z)(x-x^{*}))=\frac{1}{2}\left(\nabla^{2}f(z)(x-x^{*})+(x-x^{*})^{T}\nabla^{2}f(z)\right)=\nabla^{2}f(z)(x-x^{*})$

$\left(\nabla f(x)\right)^{T}\nabla f(x)=\left|\left|\nabla f(x)\right|\right|_{2}^{2}=\left(\nabla^{2}f(z)(x-x^{*})\right)^{T}\nabla^{2}f(z)(x-x^{*})=(x-x^{*})^{T}\left(\nabla^{2}f(z)\right)^{T}\nabla^{2}f(z)(x-x^{*})$

for every x : $x^{T}\nabla^{2}f(x)x\geq x^{T}MIx$ since: $\nabla^{2}f(x)\succcurlyeq MI$

especially for z: $z^{T}\nabla^{2}f(z)z\geq z^{T}MIz$$\rightarrow\nabla^{2}f(z)\geq$$MI$

$\left(\nabla^{2}f(z)\right)^{T}\nabla^{2}f(z){\geq\left(\nabla^{2}f(z)\right)^{T}MI}$

$\left|\left|\nabla f(x)\right|\right|_{2}^{2}=(x-x^{*})^{T}\left(\nabla^{2}f(z)\right)^{T}\nabla^{2}f(z)(x-x^{*})\geq(x-x^{*})^{T}\left(\nabla^{2}f(z)\right)^{T}MI(x-x^{*})=\left(\nabla^{2}f(z)\right)^{T}M\left|\left|x-x^{*}\right|\right|_{2}^{2}$ $\rightarrow$$\frac{\left|\left|\nabla f(x)\right|\right|_{2}^{2}}{\left(\nabla^{2}f(z)\right)^{T}M}\geq\left|\left|x-x^{*}\right|\right|_{2}^{2}$

we showed that: $f(x)-f(x^{*})=\frac{1}{2}(x-x^{*})^{T}\nabla^{2}f(z)(x-x^{*})\leq\frac{1}{2}(x-x^{*})^{T}mI(x-x^{*})=\frac{m}{2}\left|\left|x-x^{*}\right|\right|_{2}^{2}$

$\rightarrow f(x)-f(x^{*})\leq\frac{m}{2}\frac{\left|\left|\nabla f(x)\right|\right|_{2}^{2}}{\left(\nabla^{2}f(z)\right)^{T}M}$

I'm just confused with the last thing I did since I divided by a matrix. I'm not sure that If i'm in right direction, I would love to hear some comments.

1

There are 1 best solutions below

1
On BEST ANSWER

Partial answer (proof of the right hand side inequality):

A differentiable and $M$-strongly convex function (since $\nabla^2f(x) \geq MI$) is \begin{align} f(y) \geq f(x) + \nabla f(x)^T \left( y - x \right) + \frac{M}{2} \left\|y - x \right\|_2^2. \tag 1 \end{align}

Minimize both left hand sides of the inequalities above with respect to $y$ such that \begin{align} 0 = \nabla f(x) + M \left(y^\star - x \right) \Longleftrightarrow y^\star - x = - \frac{1}{\mu} \nabla f(x) .\tag 2 \end{align}

Now plugin this $(2)$ in the inequality $(1)$ such that \begin{align} &f(y) \geq f(x) + \nabla f(x)^T \left( -\frac{1}{M} \nabla f(x) \right) + \frac{M}{2} \left\|-\frac{1}{M} \nabla f(x)\right\|_2^2 = f(x) - \frac{1}{2M} \left\|\nabla f(x)\right\|_2^2 \\ \Longleftrightarrow & \frac{1}{2M} \left\|\nabla f(x)\right\|_2^2 \geq \left( f(x) - f(y) \right) \end{align}

The above inequality is valid for any $y$, which completes the desired proof.


Note: as TSF commented, the left hand side inequality is descent lemma (that is related to Lipschitz). A differentiable convex function with $m$-Lipschitz continuous gradient (since $\nabla^2f(x) \leq mI$) is \begin{align} f(y) \leq f(x) + \nabla f(x)^T \left( y - x \right) + \frac{m}{2} \left\|y - x \right\|_2^2. \tag 1 \end{align}

The above approach for the right hand side inequality can be employed to prove the left hand side inequality.