I have some amateur questions about the perceptron convergence proof:
I don't understand where $k\gamma$ comes from. Specifically, how do we go from $\frac{\theta^{(k-1)}\cdot\theta^*}{\|\theta^*\|}+\gamma$ to $k\gamma$?
How do we go from $\|\theta^{(k-1)}+y^{(i)}x^{(i)}\|^2$ to $\|\theta^{(k-1)}\|^2+2y^{(i)}\theta^{(k-1)}\cdot x^{(i)}+\|x^{(i)}\|^2$?
Since $\|\theta^{(k-1)}+y^{(i)}x^{(i)}\|$ is just the length of this vector, I would have thought that the $\|\cdot\|$ and the square would cancel each other out, leaving only $(\theta^{(k-1)})^2+(y^{(i)}x^{(i)})^2$?
(As far as I understand, the length of a vector $(x,y)$ is just $\sqrt{x^2+y^2}$)


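For the first question: the proof gives $\frac{\theta^{(k)}\cdot \theta^*}{\|\theta^*\|}\ge \frac{\theta^{(k-1)}\cdot \theta^*}{\|\theta^*\|} + \gamma$ for every $k\ge 1$, and the weights start at $\theta^{(0)}=0$. For $k=1$ this gives $$\frac{\theta^{(1)}\cdot \theta^*}{\|\theta^*\|}\ge \frac{\theta^{(0)}\cdot \theta^*}{\|\theta^*\|} + \gamma = \frac{0\cdot \theta^*}{\|\theta^*\|} + \gamma=\gamma.$$ Now assume as induction hypothesis that $\frac{\theta^{(k-1)}\cdot \theta^*}{\|\theta^*\|}\ge (k-1)\gamma$. Then $$\frac{\theta^{(k)}\cdot \theta^*}{\|\theta^*\|}\ge \frac{\theta^{(k-1)}\cdot \theta^*}{\|\theta^*\|} + \gamma \ge (k-1)\gamma + \gamma=k\gamma.$$ In other words, each update adds at least $\gamma$, so after $k$ updates the total is at least $k\gamma$.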
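
For the second question, here is a sketch of the expansion. It uses $\|v\|^2=v\cdot v$ and assumes the labels satisfy $y^{(i)}\in\{-1,+1\}$, which is the usual perceptron setup but is not stated explicitly in the post:

$$\begin{aligned}
\|\theta^{(k-1)}+y^{(i)}x^{(i)}\|^2
&= \left(\theta^{(k-1)}+y^{(i)}x^{(i)}\right)\cdot\left(\theta^{(k-1)}+y^{(i)}x^{(i)}\right)\\
&= \theta^{(k-1)}\cdot\theta^{(k-1)} + 2y^{(i)}\,\theta^{(k-1)}\cdot x^{(i)} + \left(y^{(i)}\right)^2 x^{(i)}\cdot x^{(i)}\\
&= \|\theta^{(k-1)}\|^2 + 2y^{(i)}\,\theta^{(k-1)}\cdot x^{(i)} + \|x^{(i)}\|^2,
\end{aligned}$$

where the last step uses $\left(y^{(i)}\right)^2=1$. The square does cancel the square root in the definition of the length, but what remains is the sum of the squared components of the *sum* vector, and those contain cross terms. In two dimensions, with $u=(a,b)$ and $v=(c,d)$:

$$\|u+v\|^2=(a+c)^2+(b+d)^2=(a^2+b^2)+2(ac+bd)+(c^2+d^2)=\|u\|^2+2\,u\cdot v+\|v\|^2.$$

So $\|u+v\|^2=\|u\|^2+\|v\|^2$ holds only when $u\cdot v=0$, that is, when the two vectors are orthogonal.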