Why does the summation disappear when taking the derivative of a sum of squares?


Why is it that the derivative of the sum of squares of a vector $w$: \begin{eqnarray} \frac{\lambda}{2n} \sum_w w^2, \end{eqnarray}

turns out to be

\begin{eqnarray} \frac{\lambda}{n} w \end{eqnarray}

and not

\begin{eqnarray} \frac{\lambda}{n} \sum_w w \;? \end{eqnarray}

Basically as I see it, we've got

\begin{eqnarray} w = [w_1, w_2, w_3, \ldots] \end{eqnarray}

\begin{eqnarray} \frac{d}{dw} \frac{\lambda}{2n} \sum_w w^2 = \frac{\lambda}{2n}\left(\frac{\partial}{\partial w_1} \sum_w w^2 + \frac{\partial}{\partial w_2} \sum_w w^2 + \frac{\partial}{\partial w_3} \sum_w w^2 + \ldots\right) \end{eqnarray}

\begin{eqnarray} = \frac{\lambda}{n} (w_1 + w_2 + w_3 + \ldots) \end{eqnarray}

\begin{eqnarray} = \frac{\lambda}{n} \sum_w w \end{eqnarray}

I'm following this ebook here (equations 87/88, which are basically the same as what I've written above). The main thing I don't understand is why the summation can be eliminated. Any math books or write-ups on the subject would also be helpful.

Accepted answer:

If there are actually $m$ input variables, the sum in Equation $87$ of the ebook can be written in the notation $$ \sum_{i=1}^m w_i^2, $$ and it can be viewed as a function of the $m$ variables $w_1, \ldots, w_m.$ The "derivative" in the ebook is a partial derivative, which describes how the function value would change if you could slightly increase or decrease just one of the $m$ input variables while leaving all the others unchanged. The notation $\frac{\partial}{\partial w}$ in the ebook means the same thing as you would recognize in $\frac{\partial}{\partial w_i},$ that is, it is a partial derivative with respect to one variable, but the ebook has chosen to let the letter $w$ by itself represent one of the $m$ variables rather than use a subscript.

The partial derivative of a sum of two functions is the sum of the partial derivatives, just as in the single-variable case, but only when both partial derivatives are taken with respect to the same variable. Partial derivatives with respect to different variables do not add up in the manner you imagine; and in any case, the ebook definitely means to take the partial derivative of the entire sum with respect to one variable.

When we write $$ \frac{\partial}{\partial w_j} w_i^2, $$ the result is zero unless $i = j,$ because in a partial derivative $\frac{\partial}{\partial w_j}$ over the variables $w_1, \ldots, w_m,$ all the variables except $w_j$ act like constants. On the other hand, $$ \frac{\partial}{\partial w_j} w_j^2 = 2w_j, $$ because that describes how the function $w_j^2$ changes as we vary $w_j.$
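You can verify this numerically. A quick finite-difference sketch (the weight values here are made up for illustration, not taken from the ebook): perturbing $w_j$ leaves every other $w_i^2$ unchanged, so those partials come out zero, while the $i = j$ term gives roughly $2w_j$:

```python
# Hypothetical weights w_1, w_2, w_3 (illustrative values only)
w = [1.5, -2.0, 0.5]
h = 1e-6   # finite-difference step

j = 1      # perturb only w_j (here w_2 = -2.0)
w_plus = w.copy()
w_plus[j] += h

# Forward-difference estimate of the partial of each w_i^2 with respect to w_j
partials = [(w_plus[i] ** 2 - w[i] ** 2) / h for i in range(len(w))]

print(partials)  # ~[0.0, -4.0, 0.0]: zero except at i == j, where it is ~2*w[j]
```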

To spell it out in gory detail, what you actually have is \begin{align} \frac{\partial}{\partial w_j} \frac{\lambda}{2n} \sum_{i=1}^m w_i^2 &= \frac{\lambda}{2n} \frac{\partial}{\partial w_j}\left( w_1^2 + \cdots + w_{j-1}^2 + w_j^2 + w_{j+1}^2 + \cdots + w_m^2 \right) \\ & = \frac{\lambda}{2n} \left(\frac{\partial}{\partial w_j}w_1^2 + \cdots + \frac{\partial}{\partial w_j}w_{j-1}^2 + \frac{\partial}{\partial w_j}w_j^2 + \frac{\partial}{\partial w_j}w_{j+1}^2 + \cdots + \frac{\partial}{\partial w_j}w_m^2 \right) \\ & = \frac{\lambda}{2n} \left(0 + \cdots + 0 + 2w_j + 0 + \cdots + 0\right) \\ & = \frac{\lambda}{n} w_j. \end{align}
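The same derivation can be checked end to end: a finite-difference gradient of $\frac{\lambda}{2n}\sum_i w_i^2$ should match $\frac{\lambda}{n} w_j$ in every component. (The values of $\lambda$, $n$, and $w$ below are arbitrary stand-ins, not from the ebook.)

```python
lam, n = 0.1, 50            # hypothetical regularization strength and sample count
w = [3.0, -1.0, 0.25, 4.0]  # hypothetical weight vector
h = 1e-6                    # finite-difference step

def f(ws):
    # The regularization term (lambda / 2n) * sum of squares
    return lam / (2 * n) * sum(x * x for x in ws)

grad = []
for j in range(len(w)):
    w_plus = w.copy()
    w_plus[j] += h
    grad.append((f(w_plus) - f(w)) / h)   # partial derivative w.r.t. w_j

expected = [lam / n * x for x in w]       # the closed form (lambda / n) * w_j
print(grad)   # each entry matches expected, e.g. grad[0] ~ 0.006
```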

Second answer:

I suppose it is because it is a partial derivative with respect to a particular weight; the summation index is just omitted. Basically, in the book you mentioned: $$C = C_0 + \frac{\lambda}{2n}\sum_{i}{\omega_i}^2,$$ and we are interested in $$\frac{\partial C}{\partial \omega_i}.$$ Every term of the sum except the one containing $\omega_i$ has zero partial derivative, so the sum collapses to $$\frac{\partial C}{\partial \omega_i} = \frac{\lambda}{n}\omega_i.$$
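One more numerical sketch to go with this (all numbers below are hypothetical, not from the book): the base cost $C_0$ is a constant with respect to the weights, so it cancels in the difference and contributes nothing to the partial derivative.

```python
lam, n = 0.3, 10
C0 = 7.0            # hypothetical unregularized cost, held fixed
w = [2.0, -0.5]
h = 1e-6            # finite-difference step

def C(ws):
    # Total cost: constant base cost plus the regularization term
    return C0 + lam / (2 * n) * sum(x * x for x in ws)

i = 0
w_plus = w.copy()
w_plus[i] += h
dC_dwi = (C(w_plus) - C(w)) / h   # C0 cancels in the difference

print(dC_dwi)   # ~ (lam / n) * w[i] = 0.06
```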