Let $y = \begin{pmatrix} y_1 \\ \vdots \\ y_N\end{pmatrix}$ and $X = \begin{pmatrix} x_{11} & \cdots & x_{1D} \\ \vdots & \ddots & \vdots \\ x_{N1} & \cdots & x_{ND}\end{pmatrix}$. Let also $e = y - Xw$, and write the mean squared error as $L(w) = \frac{1}{2N} \sum_{n=1}^{N} (y_n - x_n^Tw)^2 = \frac{1}{2N} e^T e$, where $x_n^T$ is the $n$th row of $X$.
I want to prove that the gradient of $L(w)$ is $-\frac{1}{N} X^T e$. What would be a way of proving this?
Since
$$ L(w) = \frac{1}{2N}\sum_{n=1}^N(y_n - (Xw)_n)^2 $$
and $(Xw)_n = \sum_{j=1}^D x_{nj}w_j$, so that $\frac{\partial}{\partial w_j}(Xw)_n = x_{nj}$, the chain rule gives
$$ \frac{\partial L}{\partial w_j} = -\frac{1}{N}\sum_{n=1}^N x_{nj}(y_n - (Xw)_n) = -\frac{1}{N}x_j^Te, $$
where $x_j$ is the $j$th column of $X$. Stacking these $D$ partial derivatives into a vector gives
$$ \nabla L(w) = -\frac{1}{N}X^Te. $$
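As a sanity check on the derivation, you can compare the closed-form gradient $-\frac{1}{N}X^Te$ against a central finite-difference approximation of each partial derivative. A minimal sketch in NumPy; the sizes and random data are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 50, 4
X = rng.normal(size=(N, D))
y = rng.normal(size=N)
w = rng.normal(size=D)

def L(w):
    """Mean squared error L(w) = e^T e / (2N) with e = y - Xw."""
    e = y - X @ w
    return e @ e / (2 * N)

# Analytic gradient from the derivation: -X^T e / N
e = y - X @ w
grad_analytic = -X.T @ e / N

# Central finite differences: (L(w + eps e_j) - L(w - eps e_j)) / (2 eps)
eps = 1e-6
grad_numeric = np.array([
    (L(w + eps * np.eye(D)[j]) - L(w - eps * np.eye(D)[j])) / (2 * eps)
    for j in range(D)
])

print(np.allclose(grad_analytic, grad_numeric, atol=1e-6))  # True
```

Since $L$ is quadratic in $w$, the central difference has no truncation error here, so the two gradients agree to floating-point precision.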