Consider the following expression:
$$l(a) = \|y\|_2^2 -2y^TXa + a^TX^TX a$$
where $X$ is some matrix. The derivative is
$$\nabla l(a) = 2X^TXa -2X^T y$$
I'm new to matrix calculus. Could you please explain how to get the derivative?
Some hints
Note that, by the product rule for matrix derivatives (with $A$ a constant matrix),
$$\frac{\partial u^TAv}{\partial x}=\frac{\partial u}{\partial x}Av+\frac{\partial v}{\partial x}A^Tu$$
thus
$$\frac{\partial(y^TXa)}{\partial a}=X^Ty$$
$$\frac{\partial(a^TX^TXa)}{\partial a}=X^TXa+X^TXa=2X^TXa$$
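A quick numerical sanity check can make these rules more convincing. The sketch below is only an illustration, assuming NumPy and randomly generated $X$, $y$, $a$ (the sizes, the seed, and the helper names `loss`, `grad`, `numerical_grad` are arbitrary choices, not part of the question). It compares the closed-form gradient $2X^TXa-2X^Ty$ with a central finite-difference approximation.

```python
import numpy as np

def loss(a, X, y):
    """l(a) = ||y||^2 - 2 y^T X a + a^T X^T X a."""
    return y @ y - 2 * y @ X @ a + a @ X.T @ X @ a

def grad(a, X, y):
    """Closed-form gradient from the rules above: 2 X^T X a - 2 X^T y."""
    return 2 * X.T @ X @ a - 2 * X.T @ y

def numerical_grad(f, a, eps=1e-6):
    """Central finite differences, perturbing one coordinate at a time."""
    g = np.zeros_like(a)
    for i in range(a.size):
        step = np.zeros_like(a)
        step[i] = eps
        g[i] = (f(a + step) - f(a - step)) / (2 * eps)
    return g

# Arbitrary test data (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.normal(size=5)
a = rng.normal(size=3)

analytic = grad(a, X, y)
numeric = numerical_grad(lambda v: loss(v, X, y), a)

# The two should agree up to floating-point rounding.
print(np.allclose(analytic, numeric, atol=1e-5))
```

Because $l$ is quadratic in $a$, the central difference has no truncation error, so any disagreement beyond rounding would point to a mistake in the formula.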
It is straightforward to prove this using index notation for each component, but it sounds like you want a more intuitive explanation.
The first term is constant, so its differential is zero. The second term, $-2y^TXa$, is a linear function of $a$, so its differential is given by the same matrix: $-2y^TX$.
The last term is equal to $Xa\cdot Xa=\|Xa\|^2$, where the dot represents the dot product. This is the composition of two functions: first matrix multiplication by $X$, then the function $v\mapsto\|v\|^2=\sum v_i^2$. The first of these has differential $X$, and the second has differential $(2v_1,\dots,2v_n)=2v^T$. By the chain rule, with $v=Xa$, the differential of the last term is $2(Xa)^TX=2a^TX^TX$.
Putting this together, the differential of $l$ at $a$ is the row vector
$2a^TX^TX -2y^TX$.
Taking the transpose of this yields the gradient $\nabla l(a) = 2X^TXa -2X^Ty$, which is your expression.
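To see the row-versus-column distinction concretely, here is a small sketch, again assuming NumPy and arbitrary random data (the names `D`, `g`, `h`, `t` are illustrative). It checks that the row vector $2a^TX^TX-2y^TX$, applied to a direction $h$, matches the directional derivative of $l$, and that its transpose is the column gradient. It uses the fact that the expression in the question expands from $\|y-Xa\|^2$.

```python
import numpy as np

# Arbitrary test data (assumed for illustration only).
rng = np.random.default_rng(1)
X = rng.normal(size=(4, 3))
y = rng.normal(size=4)
a = rng.normal(size=3)
h = rng.normal(size=3)  # arbitrary direction in which to perturb a

def l(a):
    """||y - X a||^2, which expands to ||y||^2 - 2 y^T X a + a^T X^T X a."""
    r = y - X @ a
    return r @ r

# Differential as a 1 x n row vector: 2 a^T X^T X - 2 y^T X.
D = (2 * a @ X.T @ X - 2 * y @ X).reshape(1, -1)

# Gradient as an n x 1 column vector: 2 X^T X a - 2 X^T y.
g = (2 * X.T @ X @ a - 2 * X.T @ y).reshape(-1, 1)

# Directional derivative of l at a along h, by central differences.
t = 1e-6
directional = (l(a + t * h) - l(a - t * h)) / (2 * t)

print(np.allclose(D @ h, directional, atol=1e-5))  # differential acts on h
print(np.allclose(D.T, g))                         # transpose gives the gradient
```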