Logic behind combining algebraic operations into vector-matrix form



The fundamental object we will work with:

$x=[a,b,c]^T$, a $3 \times 1$ vector.

Now let's keep in mind some important properties:

$\frac{d}{dx}(x \cdot x^T)=x \otimes I + I \otimes x$

$(A \otimes B)\,vec(V)=vec(BVA^T)$
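The Kronecker-vec identity can be checked numerically. A minimal sketch using NumPy (the code and random test matrices are illustrative additions, not part of the original post); here `vec` is the column-major (Fortran-order) vectorization that the identity assumes:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
V = rng.standard_normal((3, 3))

# Column-major vectorization: stack the columns of V into one long vector.
def vec(M):
    return M.reshape(-1, order="F")

lhs = np.kron(A, B) @ vec(V)   # (A (x) B) vec(V)
rhs = vec(B @ V @ A.T)         # vec(B V A^T)
assert np.allclose(lhs, rhs)
```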

Let's take the expression:

$q_1=x \cdot x^T \cdot x$

Let's find the derivatives with respect to each component of the vector $x$ separately. By the product rule we get a series of terms:

$\frac{d}{da}q_1=\frac{d}{da}x \cdot x^T \cdot x + x \cdot \frac{d}{da}x^T \cdot x + x \cdot x^T \cdot \frac{d}{da}x$

$\frac{d}{db}q_1=\frac{d}{db}x \cdot x^T \cdot x + x \cdot \frac{d}{db}x^T \cdot x + x \cdot x^T \cdot \frac{d}{db}x$

$\frac{d}{dc}q_1=\frac{d}{dc}x \cdot x^T \cdot x + x \cdot \frac{d}{dc}x^T \cdot x + x \cdot x^T \cdot \frac{d}{dc}x$

Now let's try to combine these expressions into one vector-matrix form, stacking the term for each component as a row:

$\begin{bmatrix} \frac{d}{da}x \cdot x^T \cdot x \\ \frac{d}{db}x \cdot x^T \cdot x \\ \frac{d}{dc}x \cdot x^T \cdot x \end{bmatrix} \rightarrow x^T \cdot x \cdot I$

$\begin{bmatrix} x \cdot \frac{d}{da}x^T \cdot x \\ x \cdot \frac{d}{db}x^T \cdot x \\ x \cdot \frac{d}{dc}x^T \cdot x \end{bmatrix} \rightarrow x \cdot x^T$

$\begin{bmatrix} x \cdot x^T \cdot \frac{d}{da}x \\ x \cdot x^T \cdot \frac{d}{db}x \\ x \cdot x^T \cdot \frac{d}{dc}x \end{bmatrix} \rightarrow x \cdot x^T$
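Each of the three stacked blocks can be verified numerically. In this sketch (NumPy, with an arbitrary test vector; the code is an illustrative addition), `I[k]` plays the role of the basis vector $e_k = \frac{d}{dx_k}x$:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # x = [a, b, c]^T
n = x.size
I = np.eye(n)

# Block 1: row k is (d/dx_k x)^T scaled by x^T x, i.e. (x . x) e_k  ->  (x^T x) I
T1 = np.stack([(x @ x) * I[k] for k in range(n)])
assert np.allclose(T1, (x @ x) * I)

# Block 2: row k is (x (e_k^T x))^T = x_k x^T  ->  x x^T
T2 = np.stack([x[k] * x for k in range(n)])
assert np.allclose(T2, np.outer(x, x))

# Block 3: row k is (x x^T e_k)^T = e_k^T x x^T  ->  x x^T (symmetric)
T3 = np.stack([np.outer(x, x) @ I[k] for k in range(n)])
assert np.allclose(T3, np.outer(x, x))
```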

Thus, the resulting expression will look like:

$\frac{d}{dx}q_1=x^T \cdot x \cdot I + 2 \cdot x \cdot x^T$
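The final formula can be sanity-checked against a finite-difference Jacobian. A minimal numerical sketch (an illustrative addition using central differences; since $(x^Tx)I + 2xx^T$ is symmetric, the row/column orientation of the Jacobian does not affect the comparison):

```python
import numpy as np

def q1(x):
    # q1(x) = x x^T x = (x^T x) x
    return (x @ x) * x

x = np.array([1.0, 2.0, 3.0])
n = x.size
eps = 1e-6

# Numerical Jacobian J[i, k] = d q1_i / d x_k via central differences
J = np.zeros((n, n))
for k in range(n):
    e = np.zeros(n); e[k] = eps
    J[:, k] = (q1(x + e) - q1(x - e)) / (2 * eps)

analytic = (x @ x) * np.eye(n) + 2 * np.outer(x, x)
assert np.allclose(J, analytic, atol=1e-5)
```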

My question is this: as long as we work with the individual components of the vector, the matrix products in the expression are preserved, and we simply apply the product rule. Is there a logic/rule/algorithm for combining such simple terms into a single vector-matrix operation? When does the transition from ordinary matrix products to Kronecker products, etc., occur?


Best answer:

$ \def\d{\delta}\def\o{{\tt1}}\def\p{\partial} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\vc#1{\op{vec}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR#1}}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} $The derivative of a vector $x$ with respect to its $k^{th}$ component is the $k^{th}$ Cartesian basis vector, i.e. $$\eqalign{ \grad{x}{x_k} &= e_k \\ }$$ Taking the $i^{th}$ component of this equation produces the result in index notation $$\eqalign{ e_i^T\gradLR{x}{x_k} &= e_i^Te_k \qiq \grad{x_i}{x_k} = \d_{ik} \qquad\qquad \\ }$$ Applying these results to the matrix $\,xx^T$ yields $$\eqalign{ \grad{\,(xx^T)}{x_k} &= xe_k^T + e_kx^T \qiq \grad{\,(x_ix_j)}{x_k} = x_i\d_{jk} + \d_{ik}x_j \\ }$$ So the gradient of the matrix $xx^T$ with respect to the vector $x$ is a third-order tensor (which is why it carries three free indices), and it cannot be rendered in matrix notation.
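The component formula $\partial(x_i x_j)/\partial x_k = x_i\delta_{jk} + \delta_{ik}x_j$ can be spot-checked with central differences. An illustrative NumPy sketch (not part of the original answer), building the full third-order tensor both numerically and analytically:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
n = x.size
eps = 1e-6

# Third-order tensor T[i, j, k] = d(x_i x_j)/dx_k, by central differences
T = np.zeros((n, n, n))
for k in range(n):
    e = np.zeros(n); e[k] = eps
    T[:, :, k] = (np.outer(x + e, x + e) - np.outer(x - e, x - e)) / (2 * eps)

# Analytic tensor: x_i delta_{jk} + delta_{ik} x_j
d = np.eye(n)
analytic = np.einsum("i,jk->ijk", x, d) + np.einsum("ik,j->ijk", d, x)
assert np.allclose(T, analytic, atol=1e-5)
```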

So your first "important property" is simply not true. The result that you've misquoted is actually $$\eqalign{ \grad{\vc{xx^T}}{x} &= {x\otimes I + I\otimes x} \\ }$$ You may not think the presence of that $\vc{}$ operator is important, but it is.
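The corrected property, with the $\operatorname{vec}$ operator in place, can also be confirmed numerically. A minimal sketch (an illustrative addition; `vec` is column-major to match the Kronecker conventions above, and $x$ enters the Kronecker products as an $n\times 1$ column):

```python
import numpy as np

def vec(M):
    # Column-major vectorization, matching the Kronecker-product conventions
    return M.reshape(-1, order="F")

x = np.array([1.0, 2.0, 3.0])
n = x.size
eps = 1e-6

# Numerical Jacobian of vec(x x^T) with respect to x, shape (n*n, n)
J = np.zeros((n * n, n))
for k in range(n):
    e = np.zeros(n); e[k] = eps
    J[:, k] = (vec(np.outer(x + e, x + e)) - vec(np.outer(x - e, x - e))) / (2 * eps)

# Analytic Jacobian: x (x) I + I (x) x, with x as an n-by-1 column
col = x.reshape(n, 1)
analytic = np.kron(col, np.eye(n)) + np.kron(np.eye(n), col)
assert np.allclose(J, analytic, atol=1e-5)
```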