If I have the following equation,
$$E(\beta)=\sum_{i=1}^n (y_i -\beta^T x_i)^2 +\lambda \sum_{i=1}^d \beta_i^2 $$
which is the cost function of regularized (ridge) linear regression ($\beta$ and each $x_i$ are $d$-dimensional vectors, and $n \times d$ is the dimension of the feature matrix $X$).
To find the optimal $\beta$, I can take the derivative of the function above w.r.t. $\beta$, set it to zero,
and represent the result as (further information: at 7:05):
$$X^T\left(Y- X\beta\right) = \lambda\left[\begin{matrix} 0 & & & & \\ & 1 & & & \\ & & 1 & & \\ & & & \ddots & \\ & & & & 1 \end{matrix}\right]\beta$$
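As a quick sanity check of this condition (a NumPy sketch; the data $X$, $Y$ and the value of $\lambda$ are placeholder assumptions, and $D$ denotes the diagonal matrix above, which leaves the first coefficient unpenalized):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 20, 4
X = rng.standard_normal((n, d))   # n x d feature matrix
Y = rng.standard_normal(n)
lam = 0.5

D = np.eye(d)
D[0, 0] = 0.0                     # first coefficient is not penalized

# The stationarity condition X^T (Y - X beta) = lam * D beta
# rearranges to the linear system (X^T X + lam * D) beta = X^T Y.
beta = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)

# The condition should now hold numerically.
lhs = X.T @ (Y - X @ beta)
rhs = lam * D @ beta
assert np.allclose(lhs, rhs)
print("stationarity condition satisfied")
```

Solving the rearranged linear system and substituting back confirms both sides agree to numerical precision.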
However, if I instead have
$$E(\beta)=\sum_{i=1}^n ( y_i-\beta^T x_i)^2 +\lambda \sum_{i=1}^d \beta_i^2 + \mu \sum_{i=1}^{d-1} (\beta_i -\beta_{i+1})^2, $$
how do I represent this in a neat matrix form? I'm struggling to write it down because of the presence of the extra term $(\beta_i-\beta_{i+1})^2$.
For the last term, you can use a matrix $M$ (dimension $d\times d$) constructed as follows: $M_{i,i}=1$ and $M_{i,i+1}=-1$ for $i=1,\dots,d-1$, with the last row all zeros,
$$M=\left[\begin{matrix} 1 & -1 & & & \\ & 1 & -1 & & \\ & & \ddots & \ddots & \\ & & & 1 & -1 \\ & & & & 0 \end{matrix}\right].$$
Then $$ \sum_{i=1}^{d-1}(\beta_i-\beta_{i+1})^2=\|M\beta\|^2=\beta^T M^T M\beta. $$ Differentiating $\beta^T M^T M\beta$ w.r.t. $\beta$ gives $2M^T M\beta$.
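Putting the pieces together, after cancelling the common factor of $2$, the stationarity condition for the full objective becomes (with $D$ the diagonal matrix from the question)
$$X^T(Y-X\beta)=\lambda D\beta+\mu M^T M\beta.$$
A quick numerical check of the identity and the gradient (a NumPy sketch; $M$ here is the first-difference matrix with $M_{i,i}=1$, $M_{i,i+1}=-1$ and last row zero):

```python
import numpy as np

d = 5
rng = np.random.default_rng(0)
beta = rng.standard_normal(d)

# First-difference matrix M (d x d): M[i, i] = 1, M[i, i+1] = -1
# for i = 0..d-2, and a last row of zeros.
M = np.zeros((d, d))
for i in range(d - 1):
    M[i, i] = 1.0
    M[i, i + 1] = -1.0

# Identity: sum_i (beta_i - beta_{i+1})^2 == ||M beta||^2
penalty_sum = np.sum((beta[:-1] - beta[1:]) ** 2)
penalty_mat = np.dot(M @ beta, M @ beta)
assert np.isclose(penalty_sum, penalty_mat)

# Gradient: d/dbeta (beta^T M^T M beta) = 2 M^T M beta,
# compared against a central finite difference.
grad_closed = 2 * M.T @ M @ beta
eps = 1e-6
grad_fd = np.array([
    (np.sum((M @ (beta + eps * e)) ** 2)
     - np.sum((M @ (beta - eps * e)) ** 2)) / (2 * eps)
    for e in np.eye(d)
])
assert np.allclose(grad_closed, grad_fd, atol=1e-6)
print("identity and gradient verified")
```

Both the quadratic-form identity and the closed-form gradient $2M^TM\beta$ agree with the finite-difference check.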