Michael Nielsen's book “Neural Networks and Deep Learning” Cauchy-Schwarz Inequality Proof


In the online free book the following is stated:

If $C$ is a cost function that depends on variables $v_1, v_2, \ldots, v_n$, he states that we move in the direction $\Delta v$ that decreases $C$ as much as possible, which is equivalent to minimizing $\Delta C \approx \nabla C \cdot \Delta v$. So if $\|\Delta v\| = \epsilon$ for some small $\epsilon > 0$, it can be proved that the choice of $\Delta v$ minimizing $\nabla C \cdot \Delta v$ is $\Delta v = -\eta \nabla C$, where $\eta = \epsilon / \|\nabla C\|$. It is suggested to use the Cauchy-Schwarz inequality.
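To make the claim concrete, here is a small numerical sanity check (not from the book; the variable names and the random test vectors are my own choices): for a fixed gradient vector, the step $\Delta v = -\epsilon \nabla C / \|\nabla C\|$ should give a value of $\nabla C \cdot \Delta v$ no larger than that of any other step of the same length $\epsilon$.

```python
import numpy as np

rng = np.random.default_rng(0)
grad = rng.normal(size=5)   # stands in for the gradient ∇C at some point
eps = 0.01                  # the step-size constraint ||Δv|| = ε

# The claimed minimizer: Δv = -η∇C with η = ε/||∇C||
dv_best = -eps * grad / np.linalg.norm(grad)
best = grad @ dv_best       # ΔC ≈ ∇C·Δv for this choice

# Compare against many random directions rescaled to the same length ε
for _ in range(1000):
    d = rng.normal(size=5)
    dv = eps * d / np.linalg.norm(d)
    # no random direction of length ε does better (up to float error)
    assert grad @ dv >= best - 1e-12
```

Note that `best` comes out to $-\epsilon \|\nabla C\|$, which is exactly the lower bound that the Cauchy-Schwarz inequality gives for $\nabla C \cdot \Delta v$ under the constraint $\|\Delta v\| = \epsilon$.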

I don't have a background in mathematics. I have done a lot of reading, but I am struggling to know where to start: I still have no conceptual understanding of why the Cauchy-Schwarz inequality is relevant here. Could somebody help me?