Prove an inequality from a scalar product


This exercise is a proof of the convergence of gradient descent towards a stationary point. Let $\mathbf{v}_1, \dots, \mathbf{v}_T$ be a sequence of gradients, let $\mathbf{w}^{(1)} = \mathbf{0}$ be the initial point, let $\mathbf{w}^*$ be the global minimum, and let $\eta$ be the learning rate. We update the weights as $$ \mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \eta \cdot \mathbf{v}_t $$

In the end we want to prove the following inequality $$ \frac{\eta}{2}\sum_{t=1}^T \|\mathbf{v}_t\|^2 + \frac{ \|\mathbf{w}^*\|^2 }{2\eta} \geq \sum_{t=1}^T \langle \mathbf{w}^{(t)} - \mathbf{w}^*, \mathbf{v}_t \rangle $$

So I tried to start by applying the Cauchy-Schwarz inequality to the scalar product $$ \sum_{t=1}^T \langle \mathbf{w}^{(t)} - \mathbf{w}^*, \mathbf{v}_t \rangle \leq \sum_{t=1}^T \| \mathbf{w}^{(t)} - \mathbf{w}^*\| \, \| \mathbf{v}_t \| $$

Then I tried to replace $\mathbf{w}^{(t)}$ with $\mathbf{w}^{(t+1)} + \eta \cdot \mathbf{v}_t$, using the update rule:

$$ \sum_{t=1}^T \langle \mathbf{w}^{(t)} - \mathbf{w}^*, \mathbf{v}_t \rangle \leq \sum_{t=1}^T \| \mathbf{w}^{(t+1)} + \eta \cdot \mathbf{v}_t - \mathbf{w}^*\| \, \| \mathbf{v}_t \| $$

From this point I don't know how to continue. Please help me out with this question.
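Not a proof, but as a sanity check I verified the inequality numerically by running gradient descent on a small quadratic (the objective, seed, and step size below are my own choices, not part of the exercise):

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 30

# Hypothetical test objective: f(w) = 0.5 * ||A w - b||^2
A = rng.standard_normal((d, d))
b = rng.standard_normal(d)
w_star = np.linalg.solve(A, b)           # global minimum of f
eta = 1.0 / np.linalg.norm(A, 2) ** 2    # step size 1/L keeps the iterates stable

w = np.zeros(d)                          # w^(1) = 0
sum_grad_sq = 0.0                        # accumulates sum_t ||v_t||^2
sum_inner = 0.0                          # accumulates sum_t <w^(t) - w*, v_t>
for _ in range(T):
    v = A.T @ (A @ w - b)                # gradient v_t at w^(t)
    sum_grad_sq += v @ v
    sum_inner += (w - w_star) @ v
    w = w - eta * v                      # gradient descent update

lhs = 0.5 * eta * sum_grad_sq + (w_star @ w_star) / (2 * eta)
print(f"LHS = {lhs:.6f}, RHS = {sum_inner:.6f}")
```

On this example the left-hand side dominates the right-hand side, as the inequality claims.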

1 Answer
This quantity already begs you to consider: $$(av_t+a^{-1}w^*,av_t+a^{-1}w^*) = a^2\|v_t\|^2+a^{-2}\|w^*\|^2+2(v_t,w^*)\geq 0,$$ where I have simplified the notation and set $a^2=\eta.$ (Also, stop writing absolute-value bars outside the norm - the norm is already non-negative!)
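Spelling this out (my restatement, not part of the original answer): dividing the displayed inequality by $2$ and substituting $a^2 = \eta$ gives, for each $t$,

$$\frac{\eta}{2}\|v_t\|^2 + \frac{\|w^*\|^2}{2\eta} \geq -(v_t, w^*).$$

Summing over $t$ recovers the $-(w^*, v_t)$ part of the target RHS, though note that the sum produces $T$ copies of $\|w^*\|^2/(2\eta)$ rather than one, so the constant term still needs a separate argument.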

This gets you almost what you want, except for the term $2(v_t,w_t)$ on the RHS. This is most likely why the details of this setup are needed - the relation between the gradients $v_t$ and the iterates $w_t.$

I will leave this answer as a hint; if you edit your question with more details, I can fill in the rest later.
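For comparison, a standard alternative route (the telescoping identity used in textbook online gradient descent analyses; this goes beyond the hint above) uses the update rule directly. Expanding $\|\mathbf{w}^{(t+1)} - \mathbf{w}^*\|^2 = \|\mathbf{w}^{(t)} - \mathbf{w}^* - \eta\mathbf{v}_t\|^2$ and rearranging gives

$$\langle \mathbf{w}^{(t)} - \mathbf{w}^*, \mathbf{v}_t \rangle = \frac{1}{2\eta}\left(\|\mathbf{w}^{(t)} - \mathbf{w}^*\|^2 - \|\mathbf{w}^{(t+1)} - \mathbf{w}^*\|^2\right) + \frac{\eta}{2}\|\mathbf{v}_t\|^2.$$

Summing over $t$ telescopes the first term to $\frac{1}{2\eta}\left(\|\mathbf{w}^{(1)} - \mathbf{w}^*\|^2 - \|\mathbf{w}^{(T+1)} - \mathbf{w}^*\|^2\right) \leq \frac{\|\mathbf{w}^*\|^2}{2\eta}$, since $\mathbf{w}^{(1)} = \mathbf{0}$, which yields exactly the target inequality.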