- I have a well-known result in machine learning (from the perceptron convergence proof) which I have to prove.
- I tried to prove it by induction, but I am not sure my proof is correct:
The Hypothesis:
Suppose that all the observations in a training set are within a hyper-sphere of radius $R$
(i.e. $ \forall x_i \in S; \left \Vert x_i\right \Vert \leq R$).
Further, we initialise the weight vector to be the null vector (i.e. $w^{(0)} = 0$) and set the learning rate $ \epsilon = 1$.
How can I show that after $t$ updates, the norm of the current weight vector satisfies: \begin{equation} \label{equ:one} \left \Vert w^{(t)}\right \Vert^2 \leq t \times R^{2} \end{equation}
Hint: you can consider $ \left \Vert w^{(t)}\right \Vert^2 $ as $ \left \Vert w^{(t)} - w^{(0)}\right \Vert^2 $.
My Idea for the solution is:
Using proof by induction:
(a): Inductive hypothesis: assume the claim holds for some $t \geq 0$, i.e. $ \left \Vert w^{(t)}\right \Vert^{2} \leq t \times R^{2} $. We first verify the base cases $t = 0$ and $t = 1$, then prove the claim for $(t+1)$.
(b): For $ t = 0$: $ \left \Vert w^{(0)}\right \Vert^{2} = 0 \leq 0 \times R^{2} $, so the condition is satisfied for $ t = 0 $. From the update rule $ w^{(t+1)} \leftarrow w^{(t)} + \epsilon \times y \times x$ with $\epsilon = 1$ and $w^{(0)} = 0$, we have $ w^{(1)} = y \times x $, hence $ \left \Vert w^{(1)}\right \Vert = \left \vert y \right \vert \cdot \left \Vert x\right \Vert \leq R $, since the label $y \in \{-1, +1\}$ gives $ \left \vert y \right \vert = 1 $ and $ \left \Vert x\right \Vert \leq R $. Therefore $ \left \Vert w^{(1)}\right \Vert^{2} \leq R^{2} $, and the condition is also satisfied for $ t = 1 $.
- (c): For $ (t+1) $, the update rule gives $ w^{(t+1)} = w^{(t)} + y \times x $. Expanding the squared Euclidean norm: $\left \Vert w^{(t+1)} \right \Vert^{2} = \left \Vert w^{(t)}\right \Vert^{2} + 2\, y \, (w^{(t)} \cdot x) + \left \Vert x \right \Vert^{2}$. The perceptron only updates on a misclassified example, which means $ y \, (w^{(t)} \cdot x) \leq 0 $, so the cross term is non-positive and $\left \Vert w^{(t+1)} \right \Vert^{2} \leq \left \Vert w^{(t)}\right \Vert^{2} + \left \Vert x \right \Vert^{2}$. Using the inductive hypothesis from (a), $\left \Vert w^{(t)}\right \Vert^{2} \leq t \times R^{2} $, together with $ \left \Vert x\right \Vert^{2} \leq R^{2} $,
Sum the two terms to get the inequality: $\left \Vert w^{(t+1)} \right \Vert^{2} \leq (t+1) \times R^{2}$, and the hypothesis is proved.
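As a sanity check, the bound can also be verified empirically by running the perceptron update rule and asserting $\left \Vert w^{(t)}\right \Vert^{2} \leq t \times R^{2}$ after every update. This is a minimal sketch with hypothetical random data (the dataset, dimension, and number of passes are assumptions, not part of the question):

```python
import numpy as np

# Empirical check of the perceptron norm bound ||w^(t)||^2 <= t * R^2.
# Hypothetical setup: random points, labels y in {-1, +1} from a random hyperplane.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
R = np.linalg.norm(X, axis=1).max()            # radius of the enclosing hypersphere
true_w = rng.normal(size=3)
y = np.where(X @ true_w >= 0, 1.0, -1.0)       # labels, so |y| = 1

w = np.zeros(3)                                # w^(0) = 0
t = 0                                          # number of updates so far
for _ in range(10):                            # a few passes over the data
    for xi, yi in zip(X, y):
        if yi * (w @ xi) <= 0:                 # update only on a misclassified point
            w = w + yi * xi                    # learning rate epsilon = 1
            t += 1
            assert w @ w <= t * R**2 + 1e-9    # the bound holds after every update
print(f"{t} updates, ||w||^2 = {w @ w:.3f} <= t*R^2 = {t * R**2:.3f}")
```

Since every update is triggered by a misclassification, the cross term $2\, y \, (w^{(t)} \cdot x)$ is non-positive at each step, which is exactly why the assertion never fires.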