Taking the derivative of a quadratic form

Consider the following expression:

$$l(a) = \|y\|_2^2 -2y^TXa + a^TX^TX a$$

where $X$ is some matrix. The derivative is

$$\nabla l(a) = 2X^TXa -2X^T y$$

I'm new to matrix calculus. Could you please explain how to get the derivative?

There are 3 best solutions below

It is straightforward to prove this using index notation for each component, but it sounds like you want a more intuitive explanation.

The first term is constant, so its differential is zero. The second term, $-2y^TXa$, is a linear function of $a$, so its differential is the matrix that defines it: $-2y^TX$.

The last term is equal to $Xa\cdot Xa=\|Xa\|^2$, where the dot represents the dot product. This is the composition of two functions: first multiplication by the matrix $X$, then the function $v\mapsto\|v\|^2=\sum v_i^2$. The first of these has differential $X$, and the second has differential $(2v_1,\dots,2v_n)=2v^T$. By the chain rule, with $v = Xa$, the differential of the composition is $2(Xa)^TX=2a^TX^TX$.

Putting this together, the differential of $l$ at $a$ is the row vector

$2a^TX^TX -2y^TX$.

The gradient is the transpose of this row vector, $\nabla l(a) = 2X^TXa - 2X^Ty$, which is your expression.
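
As a sanity check on the result above, here is a minimal sketch in plain Python that compares the closed-form gradient $2X^TXa - 2X^Ty$ with a central finite-difference approximation. The helper names (`matvec`, `transpose`, `loss`, `grad_formula`, `grad_numeric`) and the numbers in `X`, `y`, `a` are made up for illustration and do not come from the question.

```python
# Finite-difference check of grad l(a) = 2 X^T X a - 2 X^T y.
# Pure Python, no external libraries; all names and data are illustrative.

def matvec(M, v):
    """Return the matrix-vector product M v for a list-of-rows matrix M."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def transpose(M):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*M)]

def loss(X, y, a):
    """l(a) = ||y||^2 - 2 y^T X a + a^T X^T X a."""
    Xa = matvec(X, a)
    yy = sum(y_i * y_i for y_i in y)
    yXa = sum(y_i * r_i for y_i, r_i in zip(y, Xa))
    aXXa = sum(r_i * r_i for r_i in Xa)
    return yy - 2.0 * yXa + aXXa

def grad_formula(X, y, a):
    """Closed-form gradient, written as 2 X^T (X a - y)."""
    residual = [r_i - y_i for r_i, y_i in zip(matvec(X, a), y)]
    return [2.0 * g for g in matvec(transpose(X), residual)]

def grad_numeric(X, y, a, h=1e-6):
    """Central finite differences, perturbing one entry of a at a time."""
    grad = []
    for i in range(len(a)):
        a_plus = list(a)
        a_minus = list(a)
        a_plus[i] += h
        a_minus[i] -= h
        grad.append((loss(X, y, a_plus) - loss(X, y, a_minus)) / (2.0 * h))
    return grad

# Arbitrary test data: X is 3x2, y has 3 entries, a has 2 entries.
X = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
y = [1.0, -2.0, 0.5]
a = [0.3, -0.7]

exact = grad_formula(X, y, a)
approx = grad_numeric(X, y, a)
max_error = max(abs(e - n) for e, n in zip(exact, approx))
print("formula:", exact)
print("numeric:", approx)
print("max abs difference:", max_error)
```

Because $l$ is quadratic in $a$, central differences should agree with the formula up to floating-point rounding, so a small `max_error` is what one would expect if the formula is right.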

Some hints

  1. I assume that vectors have $N$ entries and matrices are $N \times N$. You are then taking the derivative with respect to $N$ variables, so you must obtain a vector of $N$ derivatives as a result. Each entry of this vector is the derivative with respect to one entry of the vector $a$.
  2. When dealing with vectors, matrices and the standard vector-vector/vector-matrix/matrix-matrix product, then you obtain expressions where only additions and multiplications appear.
  3. Consider the case $v^\top a$, where $v$ is a constant vector (notice that you can set $v = -2X^\top y$ for the term $-2y^\top X a$ in your example, since $-2y^\top X a = (-2X^\top y)^\top a$). It is easy to see that the derivative with respect to $a_i$ is simply $v_i$. Which vector, then, represents the derivative of this term?
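  4. Consider now the more complex case $v(a)^\top v(a) = \sum_j v_j(a)^2$ (notice that you can set $v(a) = Xa$ for the term $a^\top X^\top X a$ in your example). Using the product rule, the derivative with respect to $a_i$ is: $$\sum_j \left(\frac{\partial v_j(a)}{\partial a_i}\,v_j(a) + v_j(a)\,\frac{\partial v_j(a)}{\partial a_i}\right) = 2\sum_j v_j(a)\,\frac{\partial v_j(a)}{\partial a_i},$$ and for $v(a) = Xa$ we have $\partial v_j(a)/\partial a_i = X_{ji}$. Which vector, then, represents the derivative of this term?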
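
One way to carry these hints through in index notation is sketched below. It assumes $v(a) = Xa$, so that $v_j(a) = \sum_k X_{jk} a_k$ and $\partial v_j(a)/\partial a_i = X_{ji}$; the index names $i, j, k$ are chosen here only for illustration.

$$\frac{\partial}{\partial a_i}\left(-2y^\top X a\right) = -2\sum_j y_j X_{ji} = -2\,(X^\top y)_i$$

$$\frac{\partial}{\partial a_i}\left(a^\top X^\top X a\right) = \frac{\partial}{\partial a_i}\sum_j v_j(a)^2 = 2\sum_j v_j(a)\,X_{ji} = 2\,(X^\top X a)_i$$

Stacking the $N$ components into a vector gives $\nabla l(a) = 2X^\top X a - 2X^\top y$, which matches the expression in the question.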

Note that, by the product rule for matrix derivatives (with $u$ and $v$ vector-valued functions of $x$ and $A$ a constant matrix),

$$\frac{\partial u^TAv}{\partial x}=\frac{\partial u}{\partial x}Av+\frac{\partial v}{\partial x}A^Tu$$

thus

$$\frac{\partial(y^TXa)}{\partial a}=X^Ty$$

$$\frac{\partial(a^TX^TXa)}{\partial a}=X^TXa+X^TXa=2X^TXa$$
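
To make the substitutions explicit, here is a sketch that assumes the layout convention in which $\partial a/\partial a = I$ and the derivative of a constant vector is zero. For $y^TXa$ take $u = y$, $A = X$, $v = a$; for $a^TX^TXa$ take $u = v = a$ and $A = X^TX$, which is symmetric.

$$\frac{\partial(y^TXa)}{\partial a} = 0\cdot Xa + I\,X^Ty = X^Ty, \qquad \frac{\partial(a^TX^TXa)}{\partial a} = I\,X^TXa + I\,(X^TX)^Ta = 2X^TXa$$

Combining these with the constant term $\|y\|_2^2$, whose derivative is zero, recovers the expression in the question:

$$\nabla l(a) = 0 - 2X^Ty + 2X^TXa = 2X^TXa - 2X^Ty$$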