Matrix Calculus: $\nabla_{\mathbf{c}}(-2\mathbf{x}^T \mathbf{D} \mathbf{c} + \mathbf{c}^T \mathbf{c}) = -2\mathbf{D}^T \mathbf{x} + 2\mathbf{c}$?

916 Views Asked by At

In an example of Principle Component Analysis, my textbook uses vector calculus to do the following:

$\nabla_{\mathbf{c}}(-2\mathbf{x}^T \mathbf{D} \mathbf{c} + \mathbf{c}^T \mathbf{c}) = \mathbf{0}$

$\rightarrow -2\mathbf{D}^T \mathbf{x} + 2\mathbf{c} = \mathbf{0}$

$\rightarrow c = \mathbf{D}^T \mathbf{x}$

Where $\nabla_{\mathbf{c}}$ is the gradient with respect to $\mathbf{c}$, $\mathbf{D} \in \mathbb{R}^{n \times l}$, $\mathbf{c} \in \mathbb{R}^l$, and the columns of $\mathbf{D}$ are orthogonal to each other.

I have the following questions:

  1. How did the authors get from $\mathbf{x}^T \mathbf{D} \mathbf{c}$ to $\mathbf{D}^T \mathbf{x}$? I haven't studied matrix calculus, but I'm assuming that $\dfrac{ \partial }{\partial{\mathbf{c}}} \mathbf{c} = I$? What about $\mathbf{x}^T \mathbf{D}$ to $\mathbf{D}^T \mathbf{x}$?

  2. How did the authors go from $\mathbf{c}^T \mathbf{c}$ to $2\mathbf{c}$? I found the following in Matrix Differentiation by Randal J. Barnes:


enter image description here


If we assume that $\mathbf{A} = I$, then isn't this what we're looking for? But wouldn't that leave us with $2\mathbf{c}^T$ instead of just $2\mathbf{c}$?

I would greatly appreciate it if people could please take the time to clarify this.

2

There are 2 best solutions below

5
On BEST ANSWER

There are (unfortunately) different conventions in matrix calculus. The most common is $$\nabla_{c} \equiv \frac{\partial}{\partial c}$$ However, in you example the convention is $$\nabla_{c} \equiv \frac{\partial}{\partial c^{T}}$$ With this in mind, we perform the calculation \begin{align*} \frac{\partial}{\partial c^{T}}\left(-2x^{T}Dc + c^{T}c\right) = 0 \tag{1}\\ \frac{\partial}{\partial c^{T}}\left(-2c^{T}D^{T}x + c^{T}c\right) = 0 \tag{2}\\ -2\frac{\partial c^{T}}{\partial c^{T}}D^{T}x + \frac{\partial}{\partial c^{T}}(c^{T}c) = 0 \tag{3}\\ -2D^{T}x + \frac{\partial c^{T}}{\partial c^{T}}c + c^{T}\frac{\partial c}{\partial c^{T}}= 0 \tag{4}\\ -2D^{T}x + c + c^{T}\frac{\partial c}{\partial c^{T}}= 0 \tag{5}\\ -2D^{T}x + 2c= 0 \tag{6} \end{align*}

Now, to get from from $(5)$ to $(6)$ we need to show that $\displaystyle c^{T}\frac{\partial c}{\partial c^{T}} = c$.

We can see this using index notation. Here, $c$ has an upper index, and $c^{T}$ has a lower index. They are related by $c^{i} = \delta^{ij}(c^{T})_{j}$ and $(c^{T})_{i} = \delta_{ij}c^{j}$ (where $\delta$ is the kronecker delta).

So we have $$\left(c^{T}\frac{\partial c}{\partial c^{T}}\right)^j = (c^{T})_{i}\frac{\partial c^{i}}{\partial (c^{T})_j}\\ = (c^{T})_{i}\frac{\partial (c^{T})_{k}\delta^{ki}}{\partial (c^{T})_j}\\ = (c^{T})_{i}\frac{\partial (c^{T})_{k}}{\partial (c^{T})_j}\delta^{ki}\\ = (c^{T})_{i}\delta^{j}_{k}\delta^{ki} = (c^{T})_{i}\delta^{ji} = c^{j} $$

And we are done.

8
On

One can define the gradient $\nabla f$ of a (nice) scalar function $f$ as the unique vector valued function such that $$ f(x+v) = f(x) + \nabla f(x)^Tv + o(|v|) \qquad (|v|\to 0)$$ From this, one can prove the following basic results-

  1. If $f(x)$ is a linear map given by matrix multiplication $f(x) = Ax$ then its gradient is constant in $x$, with value $\nabla f(x) =A^T$.
  2. If $f(x) = x^TAx$ then $\nabla f(x) = (A^T+A)x $.

I would say these are good exercises if you haven't worked with derivatives much. These together give the result.

A remark about the result you found in Barnes's book: he is using a different convention, $$f(x+v) = f(x) + \tilde\nabla f(x) v + o(|v|)$$ and these are of course related by transposition of matrices. But you should take note which convention you are using when looking at other resources.