Tricky proof of a result from Michael Nielsen's book "Neural Networks and Deep Learning".


In his free online book, "Neural Networks and Deep Learning", Michael Nielsen proposes proving the following result:

If $C$ is a cost function depending on variables $v_{1}, v_{2}, \ldots, v_{n}$, he states that we move in a direction $\Delta v$ chosen to decrease $C$ as much as possible, which amounts to minimizing $\Delta C \approx \nabla C \cdot \Delta v$. If $\|\Delta v\| = \epsilon$ for some small $\epsilon > 0$, it can be proved that the choice of $\Delta v$ minimizing $\nabla C \cdot \Delta v$ is $\Delta v = -\eta \nabla C$, where $\eta = \epsilon / \|\nabla C\|$. He suggests using the Cauchy-Schwarz inequality to prove this.
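To convince myself the claim is right before proving it, here is a minimal numerical sketch (my own check, not from the book), assuming the illustrative cost $C(v) = \|v\|^{2}/2$ so that $\nabla C(v) = v$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative quadratic cost C(v) = ||v||^2 / 2, so that grad C(v) = v.
# (The cost and its gradient are assumptions for this check, not Nielsen's.)
v = rng.normal(size=5)
grad_C = v

eps = 1e-3
# Claimed minimizer: Delta v = -eps * grad C / ||grad C|| = -eta * grad C.
dv_star = -eps * grad_C / np.linalg.norm(grad_C)
best = grad_C @ dv_star  # equals -eps * ||grad C||

# No random direction of the same norm eps should beat it.
for _ in range(10_000):
    d = rng.normal(size=5)
    dv = eps * d / np.linalg.norm(d)
    assert grad_C @ dv >= best - 1e-12

print(best, -eps * np.linalg.norm(grad_C))  # both equal -eps * ||grad C||
```

Every random direction of norm $\epsilon$ indeed gives a larger (less negative) value of $\nabla C \cdot \Delta v$ than the steepest-descent choice, so the statement looks correct; I just can't prove it.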

OK, so what I've done is minimize an equivalent function with respect to $\Delta v$: $0 = \min_{\Delta v} \lvert \nabla C \cdot \Delta v \rvert^{2} \leq \min_{\Delta v} \|\nabla C\|^{2} \, \|\Delta v\|^{2}$ (using the C-S inequality). I would say this is the correct path to prove the result, but I'm stuck and can't arrive at the same expression.

Thanks.


1 Answer


Suppose, for the sake of contradiction, that there exists a $\Delta v$ with $\|\Delta v\| = \epsilon$ such that

$$\nabla C \cdot \Delta v < \nabla C \cdot \left( -\frac{\epsilon \, \nabla C}{\|\nabla C\|} \right).$$

Note that the right-hand side equals $-\epsilon \|\nabla C\| \leq 0$, so the left-hand side is negative as well. Changing signs, we get

$$-\nabla C \cdot \Delta v > \nabla C \cdot \left( \frac{\epsilon \, \nabla C}{\|\nabla C\|} \right),$$

and now both sides are nonnegative. Since $-\nabla C \cdot \Delta v > 0$, we have $-\nabla C \cdot \Delta v = \lvert \nabla C \cdot \Delta v \rvert$, so taking absolute values gives

$$\lvert \nabla C \cdot \Delta v \rvert > \nabla C \cdot \left( \frac{\epsilon \, \nabla C}{\|\nabla C\|} \right).$$

Given that $\|\Delta v\| = \epsilon$ by hypothesis, and using $\|a\| = \frac{a \cdot a}{\|a\|}$, the right-hand side equals $\epsilon \|\nabla C\| = \|\nabla C\| \, \|\Delta v\|$, so

$$\lvert \nabla C \cdot \Delta v \rvert > \|\nabla C\| \, \|\Delta v\|.$$

This is absurd, because the Cauchy-Schwarz inequality gives $\lvert \nabla C \cdot \Delta v \rvert \leq \|\nabla C\| \, \|\Delta v\|$. Hence no $\Delta v$ of norm $\epsilon$ does better than $\Delta v = -\epsilon \, \nabla C / \|\nabla C\| = -\eta \nabla C$.
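For comparison, the same conclusion also follows directly, without contradiction; a minimal sketch, reading Cauchy-Schwarz as a two-sided bound: for every $\Delta v$ with $\|\Delta v\| = \epsilon$,

$$\nabla C \cdot \Delta v \;\geq\; -\lvert \nabla C \cdot \Delta v \rvert \;\geq\; -\|\nabla C\| \, \|\Delta v\| \;=\; -\epsilon \|\nabla C\|,$$

and equality holds throughout exactly when $\Delta v$ is a nonpositive multiple of $\nabla C$, i.e. $\Delta v = -\epsilon \, \nabla C / \|\nabla C\| = -\eta \nabla C$ with $\eta = \epsilon / \|\nabla C\|$.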