I'm new to matrix calculus, and I've never really taken derivatives of summations before. Could someone show me how I would get the first order derivative of this?
$J(w)=\frac{1}{2}[\sum_{i=1}^{m}(w^Tx^{(i)}-y^{(i)})^2]+\lambda||w||_2^2$
Thanks!
Our cost function is given by (using $\|\boldsymbol{w}\|_2^2=\boldsymbol{w}^T\boldsymbol{w}$)
$$J(\boldsymbol{w})=\dfrac{1}{2}\sum_{n=1}^{N}\left[\boldsymbol{w}^T\boldsymbol{x}_n-{y}_n \right]^2+\lambda\boldsymbol{w}^T\boldsymbol{w}$$ $$=\dfrac{1}{2}\sum_{n=1}^{N}\left[\boldsymbol{w}^T\boldsymbol{x}_n-{y}_n \right]^T\left[\boldsymbol{w}^T\boldsymbol{x}_n-{y}_n \right]+\lambda\boldsymbol{w}^T\boldsymbol{w}$$ $$=\dfrac{1}{2}\sum_{n=1}^{N}\left[\boldsymbol{x}^T_n\boldsymbol{w}^{}\boldsymbol{w}^T\boldsymbol{x}_n-y_n\boldsymbol{w}^T\boldsymbol{x}_n-\boldsymbol{x}^T_n\boldsymbol{w}y_n+y^2_n\right]+\lambda\boldsymbol{w}^T\boldsymbol{w}$$
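As a quick sanity check, the expansion can be verified numerically. Here is a minimal NumPy sketch on made-up data; the matrix `X` stacks the $\boldsymbol{x}_n$ as rows, and $\lambda$ is an arbitrary illustrative value:

```python
import numpy as np

# Made-up data for illustration: N = 50 samples, 3 features.
rng = np.random.default_rng(0)
N, d, lam = 50, 3, 0.1
X = rng.normal(size=(N, d))   # rows are the x_n
y = rng.normal(size=N)
w = rng.normal(size=d)

# Original form: (1/2) * sum_n (w^T x_n - y_n)^2 + lambda * w^T w
J_original = 0.5 * np.sum((X @ w - y) ** 2) + lam * w @ w

# Expanded form: (1/2) * sum_n [x_n^T w w^T x_n - 2 y_n w^T x_n + y_n^2] + lambda * w^T w
J_expanded = 0.5 * np.sum((X @ w) ** 2 - 2 * y * (X @ w) + y ** 2) + lam * w @ w

print(np.isclose(J_original, J_expanded))   # True
```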
Now, think of $\boldsymbol{w}$ as if it were a scalar and calculate the total derivative
$$dJ = \dfrac{1}{2}\sum_{n=1}^{N}\left[\boldsymbol{x}^T_nd\boldsymbol{w}^{}\boldsymbol{w}^T\boldsymbol{x}_n+\boldsymbol{x}^T_n\boldsymbol{w}^{}d\boldsymbol{w}^T\boldsymbol{x}_n-y_nd\boldsymbol{w}^T\boldsymbol{x}_n-\boldsymbol{x}^T_nd\boldsymbol{w}y_n\right]+\lambda d\boldsymbol{w}^T\boldsymbol{w} + \lambda \boldsymbol{w}^Td\boldsymbol{w}$$ $$= \dfrac{1}{2}\sum_{n=1}^{N}\left[\boldsymbol{x}^T_nd\boldsymbol{w}^{}\boldsymbol{w}^T\boldsymbol{x}_n+\boldsymbol{x}^T_n\boldsymbol{w}^{}d\boldsymbol{w}^T\boldsymbol{x}_n-2y_nd\boldsymbol{w}^T\boldsymbol{x}_n\right]+2\lambda d\boldsymbol{w}^T\boldsymbol{w}.$$ I used the product rule for the total derivative. Note that the transpose of a scalar is a scalar; I used this to combine the last two terms inside the sum and the two regularization terms. Now, we note that
$$\boldsymbol{x}^T_nd\boldsymbol{w}^{}\boldsymbol{w}^T\boldsymbol{x}_n=d\boldsymbol{w}^{T}\boldsymbol{x}_n\boldsymbol{x}^T_n\boldsymbol{w}^{}$$
and
$$\boldsymbol{x}^T_n\boldsymbol{w}^{}d\boldsymbol{w}^T\boldsymbol{x}_n=\boldsymbol{w}^T\boldsymbol{x}_n\boldsymbol{x}_n^Td\boldsymbol{w}^{}=d\boldsymbol{w}^{T}\boldsymbol{x}_n\boldsymbol{x}^T_n\boldsymbol{w}^{}$$ because both terms are scalars. Because of these observations we can rewrite the total derivative as
$$dJ = \dfrac{1}{2}\sum_{n=1}^{N}\left[2d\boldsymbol{w}^{T}\boldsymbol{x}_n\boldsymbol{x}^T_n\boldsymbol{w}^{}-2y_nd\boldsymbol{w}^T\boldsymbol{x}_n\right]+2\lambda d\boldsymbol{w}^T\boldsymbol{w}.$$
Factoring out $d\boldsymbol{w}^T$ results in
$$dJ =d\boldsymbol{w}^{T}\left[\dfrac{1}{2}\sum_{n=1}^{N}\left[2\boldsymbol{x}_n\boldsymbol{x}^T_n\boldsymbol{w}^{}-2y_n\boldsymbol{x}_n\right]+2\lambda \boldsymbol{w}\right].$$
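The factor in brackets can be sanity-checked against a central finite-difference approximation of $J$. A minimal NumPy sketch, again with made-up data and an arbitrary $\lambda$ (`X` stacks the $\boldsymbol{x}_n$ as rows):

```python
import numpy as np

# Made-up data for illustration.
rng = np.random.default_rng(1)
N, d, lam = 50, 3, 0.1
X = rng.normal(size=(N, d))   # rows are the x_n
y = rng.normal(size=N)
w = rng.normal(size=d)

def J(w):
    return 0.5 * np.sum((X @ w - y) ** 2) + lam * w @ w

# Bracketed factor: (1/2) * sum_n [2 x_n x_n^T w - 2 y_n x_n] + 2 lambda w
grad = X.T @ (X @ w - y) + 2 * lam * w

# Central finite differences, one coordinate at a time.
eps = 1e-6
grad_fd = np.array([(J(w + eps * e) - J(w - eps * e)) / (2 * eps)
                    for e in np.eye(d)])

print(np.allclose(grad, grad_fd, atol=1e-5))   # True
```

If the two disagree, a dropped factor of 2 somewhere in the differential is the usual culprit.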
The expression in brackets on the right-hand side is the gradient of $J$ with respect to $\boldsymbol{w}$. Setting the gradient to zero and solving for $\boldsymbol{w}$ results in an estimate $\boldsymbol{\hat{w}}$ for $\boldsymbol{w}$:
$$\boldsymbol{\hat{w}}=\left[\sum_{n=1}^{N}\boldsymbol{x}_n\boldsymbol{x}^T_n + 2\lambda\boldsymbol{I} \right]^{-1}\sum_{n=1}^{N}y_n\boldsymbol{x}_n.$$
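To see this formula in action, here is a minimal NumPy sketch (made-up data again) that computes $\boldsymbol{\hat{w}}$ using $\sum_{n}\boldsymbol{x}_n\boldsymbol{x}_n^T = X^TX$ and $\sum_{n}y_n\boldsymbol{x}_n = X^Ty$, then confirms that the gradient derived above vanishes at $\boldsymbol{\hat{w}}$:

```python
import numpy as np

# Made-up data for illustration.
rng = np.random.default_rng(2)
N, d, lam = 50, 3, 0.1
X = rng.normal(size=(N, d))   # rows are the x_n
y = rng.normal(size=N)

# Closed-form estimate: (sum_n x_n x_n^T + 2 lambda I)^{-1} sum_n y_n x_n
w_hat = np.linalg.solve(X.T @ X + 2 * lam * np.eye(d), X.T @ y)

# The gradient derived above should vanish at w_hat.
grad_at_w_hat = X.T @ (X @ w_hat - y) + 2 * lam * w_hat
print(np.allclose(grad_at_w_hat, 0.0))   # True
```

Using `np.linalg.solve` rather than forming the explicit inverse is the usual, numerically more stable way to evaluate this kind of expression.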