What would be the partial derivative w.r.t. $a\in \mathbb{R}^{n\times 1}$ of $(Xa)^TXa$?


What would be the partial derivative w.r.t. $a\in \mathbb{R}^{n\times 1}$ of $$(Xa)^TXa$$

where $X\in \mathbb{R}^{n\times n}$?

My attempt:

We know that $$\frac{\partial }{\partial a}(Xa)^TXa=\partial (Xa)^T(Xa)+(Xa)^T\partial (Xa)$$

We also know that $$\partial (X^T)=(\partial X)^T$$

So taking these facts into account:

$$\frac{\partial }{\partial a}(Xa)^TXa=X^TXa+a^TX^TX$$

Would this be correct?
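A quick way to test the attempt is to check the shapes of the two terms. The sketch below uses plain Python lists as matrices; the helpers `matmul`, `transpose`, `shape` and the sample `X`, `a` are hypothetical, chosen only for illustration.

```python
# Minimal pure-Python matrix helpers; a matrix is a list of rows.
def matmul(A, B):
    """Multiply an (m x k) matrix A by a (k x p) matrix B."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    """Return the transpose of A."""
    return [list(row) for row in zip(*A)]

def shape(A):
    """Return (rows, columns) of A."""
    return (len(A), len(A[0]))

# Hypothetical sample data, n = 3.
X = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 3.0],
     [4.0, 0.0, 1.0]]
a = [[1.0], [-2.0], [0.5]]           # column vector, shape (3, 1)

XtX = matmul(transpose(X), X)
col = matmul(XtX, a)                 # X^T X a   -> column vector
row = matmul(transpose(a), XtX)      # a^T X^T X -> row vector

print(shape(col), shape(row))        # (3, 1) (1, 3)
# X^T X is symmetric, so the row term is exactly the transpose of the column term.
print(transpose(row) == col)         # True
```

The first term is $n\times 1$ and the second is $1\times n$, so they cannot be added as written. The second is, however, the transpose of the first (because $X^TX$ is symmetric), which hints at where a factor of 2 can come from.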

I think it's equal to $2X^TXa$ but I'm not sure how to derive it.
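For reference, one standard way to derive $2X^TXa$ is to expand the expression at a perturbed point $a+h$ and read off the part that is linear in $h$ (the perturbation $h\in\mathbb{R}^{n\times 1}$ is notation introduced here):

$$
\begin{aligned}
(X(a+h))^T X(a+h) &= (Xa)^T Xa + (Xh)^T Xa + (Xa)^T Xh + (Xh)^T Xh \\
&= (Xa)^T Xa + 2\,(X^T X a)^T h + (Xh)^T Xh .
\end{aligned}
$$

The two middle terms are scalars, so $(Xh)^T Xa = (Xa)^T Xh = a^T X^T X h = (X^TXa)^T h$. The last term is quadratic in $h$, so the linear part is $2(X^TXa)^T h$ and the gradient is $2X^TXa$.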

Best answer:

First, let's understand exactly what you are trying to do. You have a function $f \colon \mathbb{R}^{n \times 1} \rightarrow \mathbb{R}$ given by $$ f(a) = \left( Xa \right)^T \cdot \left( Xa \right) = \left< Xa, Xa \right>, \qquad a = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}. $$ Such a function doesn't have "a partial derivative with respect to $a$", but it has a differential, or a gradient, or partial derivatives with respect to each of the variables $a_1,\dots,a_n$. I'll assume we are interested in computing the gradient of $f$.

Note that your proposed formula $$ X^T X a + a^T X^T X $$ doesn't even compile: $X^T X a$ is an $n \times 1$ column vector while $a^T X^T X$ is a $1 \times n$ row vector, so their sum is not defined.

The gradient of $f$ is given by $$ \nabla f = \begin{bmatrix} \frac{\partial f}{\partial a_1} \\ \vdots \\ \frac{\partial f}{\partial a_n} \end{bmatrix}. $$ Writing $e_i$ for the $i$-th standard basis vector of $\mathbb{R}^{n \times 1}$, so that $\frac{\partial}{\partial a_i} a = e_i$, we have $$ \begin{aligned} \frac{\partial f}{\partial a_i} &= \left< \frac{\partial}{\partial a_i} (Xa), Xa \right> + \left< Xa, \frac{\partial}{\partial a_i} (Xa) \right> \\ &= 2 \left< \frac{\partial}{\partial a_i} (Xa), Xa \right> \\ &= 2 \left< X \left( \frac{\partial}{\partial a_i} a \right), Xa \right> \\ &= 2 \left< Xe_i, Xa \right> \\ &= 2\, e_i^T X^T X a \\ &= \left< e_i, 2X^TXa \right>, \end{aligned} $$ where the second line uses the symmetry of the inner product. Since this holds for every $i$, it follows that $\nabla f = 2 X^T X a$, as required.
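The result can be sanity-checked numerically. This sketch, in plain Python with vectors stored as flat lists, compares the closed form $2X^TXa$ with a central finite-difference approximation of each $\frac{\partial f}{\partial a_i}$; the function names, the step size `eps`, and the random test data are assumptions for illustration.

```python
import random

def f(X, a):
    """f(a) = (Xa)^T (Xa), i.e. the sum of squares of the entries of Xa."""
    Xa = [sum(X[i][j] * a[j] for j in range(len(a))) for i in range(len(X))]
    return sum(v * v for v in Xa)

def grad_formula(X, a):
    """Closed-form gradient 2 X^T X a, computed as 2 * X^T (Xa)."""
    n = len(a)
    Xa = [sum(X[i][j] * a[j] for j in range(n)) for i in range(len(X))]
    return [2 * sum(X[i][k] * Xa[i] for i in range(len(X))) for k in range(n)]

def grad_numeric(X, a, eps=1e-6):
    """Central differences: (f(a + eps e_i) - f(a - eps e_i)) / (2 eps)."""
    g = []
    for i in range(len(a)):
        a_plus = list(a)
        a_plus[i] += eps
        a_minus = list(a)
        a_minus[i] -= eps
        g.append((f(X, a_plus) - f(X, a_minus)) / (2 * eps))
    return g

random.seed(0)
n = 4
X = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
a = [random.uniform(-1, 1) for _ in range(n)]

exact = grad_formula(X, a)
approx = grad_numeric(X, a)
# Largest componentwise discrepancy; expected to be a small rounding-level number.
print(max(abs(e - p) for e, p in zip(exact, approx)))
```

Because $f$ is quadratic in $a$, the central difference has no truncation error here, so any remaining discrepancy comes from floating-point rounding.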