A matrix calculus problem with vectorization


I'm trying to find the derivative of $vec(AA^T)$ with respect to $vec(A)$, where $A$ is an $m$ by $n$ matrix.
I worked through a simple $3$ by $2$ case entry by entry.
My guess is that the answer is $A^T\otimes I_m+[(I_m\otimes a_1),\ (I_m\otimes a_2),\ \ldots,\ (I_m\otimes a_n)]^T$, where the $a_i$ are the columns of $A$ (I'm not sure this is correct).
Is there a systematic way to find such an expression?
My intuition is to express $vec(AA^T)$ as a differentiable function of $vec(A)$ and apply some simple matrix calculus rules.
Any tips would be appreciated!


There are 2 best solutions below

Best answer

Define a vector function, then calculate its differential and gradient. $$\eqalign{ f &= {\rm vec}(AA^T) \\ df &= {\rm vec}\Big(I_m\,dA\,A^T+A\,dA^TI_m\Big) \\ &= (A\otimes I_m)\,da + (I_m\otimes A)K\,da \\ &= \Big((A\otimes I_m) + (I_m\otimes A)K\Big)\,da \\ \frac{\partial f}{\partial a} &= (A\otimes I_m) + (I_m\otimes A)K \\ }$$ where $a = {\rm vec}(A)$ and $K$ is the commutation matrix, i.e. the $mn\times mn$ permutation matrix that satisfies $K\,{\rm vec}(X) = {\rm vec}(X^T)$ for every $m\times n$ matrix $X$.
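
The step from the second to the third line of the derivation is where the Kronecker products appear. As a sketch of the missing detail, it uses the standard identity ${\rm vec}(AXB) = (B^T\otimes A)\,{\rm vec}(X)$ once for each term, writing $da = {\rm vec}(dA)$ and using $K\,{\rm vec}(dA) = {\rm vec}(dA^T)$:

$$\eqalign{ {\rm vec}\big(I_m\,dA\,A^T\big) &= \big((A^T)^T\otimes I_m\big)\,{\rm vec}(dA) = (A\otimes I_m)\,da \\ {\rm vec}\big(A\,dA^T I_m\big) &= \big(I_m^T\otimes A\big)\,{\rm vec}(dA^T) = (I_m\otimes A)\,K\,da \\ }$$

Both terms are $m^2\times mn$ matrices acting on $da$, which matches the shape expected for the derivative of a vector of length $m^2$ with respect to a vector of length $mn$.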

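For a given $(m,n)$ one can readily calculate $K$, e.g. in Julia:

using SparseArrays          # provides sparse(); m and n are assumed to be defined
i = collect(1:m*n)          # positions 1..mn in vec(A^T)
j = vec(reshape(i,m,n)')    # where each of those entries sits in vec(A)
K = sparse(i,j,1)           # K * vec(A) == vec(A')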
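
As a sanity check, the formula can be compared numerically against finite differences. The following is only a minimal sketch: the sizes `m, n = 3, 2`, the random `A`, the step `h`, and the names `Id`, `J`, `Jfd` and `f` are illustrative assumptions, and `permutedims` is used in place of `'` for the transpose of the index matrix.

```julia
using LinearAlgebra, SparseArrays

m, n = 3, 2                       # small illustrative sizes (assumption)
A = randn(m, n)

# Commutation matrix, built the same way as in the answer's snippet
i = collect(1:m*n)
j = vec(permutedims(reshape(i, m, n)))
K = sparse(i, j, 1)

Id = Matrix(1.0I, m, m)           # dense m×m identity
J = kron(A, Id) + kron(Id, A) * K # claimed Jacobian, size m^2 × mn

# f(a) = vec(A*A') as a function of a = vec(A)
f(a) = vec(reshape(a, m, n) * reshape(a, m, n)')

# f is quadratic in a, so central differences are exact up to rounding
a = vec(A)
h = 1e-3
Jfd = zeros(m^2, m*n)
for k in 1:m*n
    e = zeros(m*n); e[k] = 1.0    # k-th unit vector
    Jfd[:, k] = (f(a + h*e) - f(a - h*e)) / (2h)
end

println(K * vec(A) == vec(A'))    # checks the defining property of K
println(maximum(abs.(J - Jfd)))   # largest discrepancy between the two Jacobians
```

If the formula is right, the first line printed should confirm that `K` maps `vec(A)` to `vec(A')`, and the second should be at the level of floating-point rounding error.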

Another answer

I think a common way to do this is to use index notation $A_{ij}$, where $i$ runs over the rows and $j$ over the columns of the matrix. Then, writing $F(A) = AA^T$,

$$F(A)_{ij} = \sum_k A_{ik} A_{jk}$$

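Notice that the transpose just swaps the indices of the second factor, giving $A_{jk}$. Now we can take the derivative. The trick is to use completely different index names for the entry we differentiate with respect to (here $m$ and $n$ are indices, not the matrix dimensions from the question):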

$$\frac{\partial F_{ij}}{\partial A_{mn}} = \frac{\partial}{\partial A_{mn}} \sum_k A_{ik} A_{jk}$$

Now we make use of the fact that all entries of the matrix are independent, so that

$$\frac{\partial A_{ik}}{\partial A_{mn}} = \delta_{im} \delta_{kn}$$

where $\delta_{ij}$ is the Kronecker delta, which equals 1 if its two indices are equal and 0 otherwise. Thus, using the product rule for differentiation, we get

$$\frac{\partial F_{ij}}{\partial A_{mn}} = \sum_k(\delta_{im} \delta_{kn} A_{jk} + A_{ik} \delta_{jm} \delta_{kn}) = A_{jn} \delta_{im} + A_{in} \delta_{jm} $$
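
To see how this entry-wise result relates to the Kronecker-product expression $(A\otimes I)+(I\otimes A)K$ from the other answer, one can fill a Jacobian matrix entry by entry and compare. This is a rough sketch under stated assumptions: `p, q` stand in for the indices written $m, n$ above (so that `m, n` can denote the matrix dimensions), the sizes and random `A` are illustrative, `δ` and `Id` are hypothetical helper names, and `vec` is taken to be column-major.

```julia
using LinearAlgebra, SparseArrays

m, n = 3, 2                       # illustrative sizes (assumption)
A = randn(m, n)
δ(x, y) = x == y ? 1.0 : 0.0      # Kronecker delta

# Fill J from ∂F_ij/∂A_pq = A_jq δ_ip + A_iq δ_jp.
# Column-major vec: F_ij ↦ row i + (j-1)m, A_pq ↦ column p + (q-1)m.
J = zeros(m^2, m*n)
for j in 1:m, i in 1:m, q in 1:n, p in 1:m
    J[i + (j-1)*m, p + (q-1)*m] = A[j, q] * δ(i, p) + A[i, q] * δ(j, p)
end

# Kronecker-product form, with K the commutation matrix (K*vec(A) == vec(A'))
idx = collect(1:m*n)
K = sparse(idx, vec(permutedims(reshape(idx, m, n))), 1)
Id = Matrix(1.0I, m, m)
Jkron = kron(A, Id) + kron(Id, A) * K

println(maximum(abs.(J - Jkron))) # largest entry-wise difference
```

The two constructions should agree up to rounding, with the $\delta_{im}$ term corresponding to $A\otimes I$ and the $\delta_{jm}$ term to $(I\otimes A)K$.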

There are tons of matrix calculus rules, but instead of remembering all of them, it is sometimes easier to derive them from scratch using index notation.