In his free online book, *Neural Networks and Deep Learning*, Michael Nielsen asks the reader to prove the following result:
If $C$ is a cost function depending on $v_{1}, v_{2}, \dots, v_{n}$, he states that we want to move in the direction $\Delta v$ that decreases $C$ as much as possible, which amounts to minimizing $\Delta C \approx \nabla C \cdot \Delta v$. If we constrain $\lvert\lvert\Delta v\rvert\rvert = \epsilon$ for a small $\epsilon > 0$, then the choice of $\Delta v$ that minimizes $\Delta C \approx \nabla C \cdot \Delta v$ is $\Delta v = -\eta \nabla C$, where $\eta = \epsilon / \lvert\lvert \nabla C \rvert\rvert$. He suggests using the Cauchy-Schwarz inequality to prove this.
Ok, so what I've done is to bound the quantity to be minimized: $\min_{\Delta v} \lvert \nabla C \cdot \Delta v \rvert^{2} \leq \lvert\lvert \nabla C \rvert\rvert^{2}\lvert\lvert \Delta v\rvert\rvert^{2}$ (using the C-S inequality). I would say this is the correct path to prove the result, but I'm stuck and can't arrive at the stated result.
Thanks.
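As a numerical sanity check of the claim (not a proof): fix an arbitrary gradient vector and step size $\epsilon$, and compare $\nabla C \cdot \Delta v$ for the claimed minimizer against many random directions of the same norm. The gradient values and $\epsilon$ below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary example gradient and step size (hypothetical values).
grad_C = np.array([3.0, -1.0, 2.0])
eps = 0.01  # the constraint is ||dv|| = eps

# Claimed minimizer: dv* = -eps * grad_C / ||grad_C||
dv_star = -eps * grad_C / np.linalg.norm(grad_C)
best = grad_C @ dv_star  # evaluates to -eps * ||grad_C||

# No random direction of norm eps should achieve a smaller dot product.
for _ in range(10_000):
    d = rng.standard_normal(3)
    dv = eps * d / np.linalg.norm(d)
    assert grad_C @ dv >= best - 1e-12

print(best)  # the minimum value, -eps * ||grad_C||
```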
Suppose, for contradiction, that there is some $\Delta v$ with $\lvert\lvert \Delta v \rvert\rvert = \epsilon$ such that
$$\nabla C \cdot \Delta v < \nabla C \cdot \left( -\frac{\epsilon \nabla C}{\lvert\lvert \nabla C \rvert\rvert} \right).$$
Note that the right side equals $-\epsilon \lvert\lvert \nabla C \rvert\rvert \leq 0$, so the left side is negative too. Changing signs, we get
$$-\nabla C \cdot \Delta v > \nabla C \cdot \left( \frac{\epsilon \nabla C}{\lvert\lvert \nabla C \rvert\rvert} \right),$$
and note that now both the left and right sides are $\geq 0$. Since $\lvert x \rvert \geq -x$ for any real $x$, taking absolute values gives
$$\lvert \nabla C \cdot \Delta v \rvert > \nabla C \cdot \left( \frac{\epsilon \nabla C}{\lvert\lvert \nabla C \rvert\rvert} \right).$$
Given that $\lvert\lvert \Delta v \rvert\rvert = \epsilon$ (by hypothesis) and $\nabla C \cdot \nabla C = \lvert\lvert \nabla C \rvert\rvert^{2}$, the right side equals $\epsilon \lvert\lvert \nabla C \rvert\rvert = \lvert\lvert \nabla C \rvert\rvert \, \lvert\lvert \Delta v \rvert\rvert$, so
$$\lvert \nabla C \cdot \Delta v \rvert > \lvert\lvert \nabla C \rvert\rvert \, \lvert\lvert \Delta v \rvert\rvert.$$
This is absurd, because by the Cauchy–Schwarz inequality $\lvert \nabla C \cdot \Delta v \rvert \leq \lvert\lvert \nabla C \rvert\rvert \, \lvert\lvert \Delta v \rvert\rvert$. Hence no such $\Delta v$ exists, and $\Delta v = -\epsilon \nabla C / \lvert\lvert \nabla C \rvert\rvert = -\eta \nabla C$ is the minimizer.
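For what it's worth, the contradiction can be avoided entirely: Cauchy–Schwarz gives the lower bound directly. A sketch of that argument:

```latex
By Cauchy--Schwarz, $\nabla C \cdot \Delta v \geq -\lvert \nabla C \cdot \Delta v \rvert
\geq -\lvert\lvert \nabla C \rvert\rvert \, \lvert\lvert \Delta v \rvert\rvert$,
so for every $\Delta v$ with $\lvert\lvert \Delta v \rvert\rvert = \epsilon$,
\[
  \nabla C \cdot \Delta v \;\geq\; -\epsilon \lvert\lvert \nabla C \rvert\rvert,
\]
with equality exactly when $\Delta v$ is a negative scalar multiple of $\nabla C$.
The unique such vector of norm $\epsilon$ is
$\Delta v = -\epsilon \nabla C / \lvert\lvert \nabla C \rvert\rvert = -\eta \nabla C$.
```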