How is the telescoping sum applied in this case?


I am reading the book Foundations of Machine Learning, and the author gives many proofs of different theorems. Here is one part of a proof about the Perceptron algorithm that I don't quite understand.

Note: $w_t$ is a vector

Let \begin{equation} w_{t+1} = \begin{cases} w_t + \eta y_t x_t & \text{if } y_t(w_t \cdot x_t) < 0 \\ w_t & \text{otherwise.} \end{cases} \end{equation}

and in the proof, he has \begin{equation} \|w_{T+1}\| = \sqrt{\sum_{0 \le t \le T} \big( \|w_{t+1}\|^2 - \| w_t \|^2 \big)} \end{equation}

He mentions that the above equation uses a telescoping sum and $w_0 = 0$. I have no idea why the LHS equals the RHS via a telescoping sum; can you help me understand this?

Thanks in advance!



Best answer:

This is an algebraic identity that has very little to do with the specific function: for any function $f\colon \mathbb Z_{\ge0} \to \mathbb R_{\ge0}$ satisfying $f(0)=0$, we have $$ f(T+1) = \sqrt{\sum_{0\le t\le T} \big( f(t+1)^2-f(t)^2 \big)}. $$ To see this, it suffices to show that $$ f(T+1)^2 = \sum_{0\le t\le T} \big( f(t+1)^2-f(t)^2 \big). $$ This is easy to prove by induction, or indeed by recognizing that the right-hand side is a telescoping sum: every term cancels except $f(T+1)^2$ and $-f(0)^2 = 0$. Taking $T=3$ for example: $$ f(4)^2 = (f(1)^2-f(0)^2) + (f(2)^2-f(1)^2) + (f(3)^2-f(2)^2) + (f(4)^2-f(3)^2). $$
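A quick numeric sanity check of the identity (the values of $f$ here are arbitrary, chosen only to satisfy $f(0)=0$, with $T=3$ as in the example above):

```python
import math

# Arbitrary nonnegative values f(0), ..., f(4), with f(0) = 0 (T = 3).
f = [0.0, 2.0, 5.0, 3.0, 4.0]
T = 3

# Right-hand side: square root of the telescoping sum over 0 <= t <= T.
rhs = math.sqrt(sum(f[t + 1] ** 2 - f[t] ** 2 for t in range(T + 1)))

# The sum telescopes to f(T+1)^2 - f(0)^2 = f(T+1)^2, so rhs equals f(T+1).
print(rhs)  # → 4.0
```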

Another answer:

From my understanding of the perceptron: in the training scenario, for every misclassified input the weights are updated by $\eta y_t x_t$. Expanding the squared norm gives $\|w_{t+1}\|^2 = \|w_t\|^2 + 2\eta y_t (w_t \cdot x_t) + \|\eta y_t x_t\|^2$, and the cross term is negative on a mistake (since $y_t(w_t \cdot x_t) < 0$), so it can be dropped to get an upper bound.

Hence $\|w_{t+1} \|^2 \le \|w_{t} \|^2 + \|\eta y_t x_t \|^2 (\text{ if } y_t(w_t \cdot x_t)<0)$.

Writing it in the following way helps you see this more clearly:

\begin{align} \|w_{T+1} \|^2 &\le \|w_{T} \|^2 + \|\eta y_T x_T \|^2 (\text{ if } y_T(w_T \cdot x_T)<0) \\ &\le \|w_{T-1} \|^2 + \|\eta y_{T-1}x_{T-1} \|^2 (\text{ if } y_{T-1}(w_{T-1} \cdot x_{T-1})<0) + \|\eta y_Tx_T \|^2 (\text{ if } y_T(w_T \cdot x_T)<0) \\ &\;\;\vdots \end{align}

Since $w_0 = 0$, unrolling all the way down gives the desired bound on $\|w_{T+1}\|^2$ (summing over the rounds on which an update occurred).
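A small numeric sketch of both steps, on hypothetical 2-D data with a hypothetical learning rate $\eta$ (as an assumption, updates also fire on ties here, so that training can leave $w_0 = 0$): it runs perceptron updates, checks that the sum of consecutive differences of $\|w_t\|^2$ telescopes exactly to $\|w_{T+1}\|^2$, and checks the upper bound obtained by dropping the negative cross terms.

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 1.0  # hypothetical learning rate standing in for the eta above

# Hypothetical linearly separable data: the label is the sign of x[0].
X = rng.normal(size=(50, 2))
y = np.sign(X[:, 0])

w = np.zeros(2)       # w_0 = 0
norms_sq = [0.0]      # ||w_t||^2 for t = 0, 1, ..., T+1
bound = 0.0           # running sum of ||eta * y_t * x_t||^2 over updates

for x_t, y_t in zip(X, y):
    if y_t * (w @ x_t) <= 0:   # mistake (ties included so w can leave 0)
        w = w + eta * y_t * x_t
        bound += (eta * y_t) ** 2 * (x_t @ x_t)
    norms_sq.append(float(w @ w))

# Telescoping: consecutive differences collapse to ||w_{T+1}||^2 - ||w_0||^2.
telescoped = sum(b - a for a, b in zip(norms_sq, norms_sq[1:]))
print(np.isclose(telescoped, norms_sq[-1]))   # telescoping identity holds

# Dropping the negative cross terms gives the upper bound from the answer.
print(norms_sq[-1] <= bound + 1e-9)           # per-update bound holds
```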