Derivative of a trace of a Hadamard product


Let $A$ be an $N\times5$ matrix, $\vec{b}$ an $N \times 1$ vector and $\vec{x}$ a $5\times1$ vector. I am looking for the derivative of the function

$$f(\vec{x}) = \text{Trace}((A\vec{x}\vec{x}^TA^T - \vec{b}\vec{b}^T)\circ(A\vec{x}\vec{x}^TA^T - \vec{b}\vec{b}^T))$$

where $\circ$ denotes the Hadamard product.
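Because the trace of a Hadamard product only involves diagonal entries, the objective can also be written entrywise. A short expansion, where $\vec a_i^T$ (notation introduced here) denotes the $i$-th row of $A$ and $b_i$ the $i$-th entry of $\vec b$:

$$\eqalign{ f(\vec{x}) &= \sum_{i=1}^{N}\big(A\vec{x}\vec{x}^TA^T - \vec{b}\vec{b}^T\big)_{ii}^2 \\ &= \sum_{i=1}^{N}\Big(\big(\vec a_i^T\vec{x}\big)^2 - b_i^2\Big)^2 \\ }$$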

I have checked some of the answered questions on the forum and tried to derive the solution myself, but I am not sure whether I derived the term correctly. What I get in the end has a sensible dimension. Can you check whether the term is correct, or whether I am missing anything?

$$\frac{\partial f}{\partial\vec{x}} = 4(A\vec{x}\vec{x}^TA^T - \vec{b}\vec{b}^T)A\vec{x}$$

Thank you.


Best answer

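Your answer is very close: just keep only the diagonal of the term in parentheses and pre-multiply by $A^T$. (As written, $4(A\vec{x}\vec{x}^TA^T - \vec{b}\vec{b}^T)A\vec{x}$ is an $N\times1$ vector, whereas the gradient with respect to the $5\times1$ vector $\vec{x}$ must itself be $5\times1$.)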
Here are the details.

For typing convenience, define the symmetric matrix
$$\eqalign{ M &= Axx^TA^T-bb^T \\ }$$
Write the function in terms of this variable, then calculate its differential and gradient.
$$\eqalign{ f &= (I\odot I):(M\odot M) \\ &= (I\odot M):(I\odot M) \\ df &= 2(I\odot M):(I\odot dM) \\ &= 2(I\odot M):dM \\ &= 2(I\odot M):A\,d(xx^T)\,A^T \\ &= 2A^T(I\odot M)A:d(xx^T) \\ &= 2A^T(I\odot M)A:2\operatorname{Sym}(dx\,x^T) \\ &= 4\operatorname{Sym}\big(A^T(I\odot M)A\big):(dx\,x^T) \\ &= 4\,A^T(I\odot M)Ax:dx \\ \frac{\partial f}{\partial x} &= 4\,A^T(I\odot M)Ax \\ }$$
where $I$ is the identity matrix, $\odot$ denotes the elementwise/Hadamard product, a colon denotes the trace/Frobenius product, i.e.
$$A:B = \operatorname{Tr}(A^TB)$$
and Sym() is the symmetrization operator
$$\operatorname{Sym}(A) = \frac{A+A^T}{2}$$
The steps above use $I\odot I=I$ and the identities $(X\odot Y):Z = X:(Y\odot Z)$ and $X:(YZ) = Y^TX:Z = XZ^T:Y$. In the step after the $\operatorname{Sym}$ line, the symmetrization can be dropped because $A^T(I\odot M)A$ is symmetric (since $M$ is), and then $X:(dx\,x^T) = Xx:dx$.

At the risk of introducing yet another function, one might also write
$$I\odot M = \operatorname{Diag}(M)$$
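As a sanity check, the gradient $4\,A^T(I\odot M)Ax$ can be compared against central finite differences of $f$. This is only a minimal pure-Python sketch under my own assumptions: random test data, arbitrary sizes $N=7$ and $5$, and hypothetical helper names (`matvec`, `build_M`, `objective`, `gradient`, `fd_gradient`). It also shows that the question's candidate $4MAx$ has length $N$ rather than $5$.

```python
import random


def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]


def build_M(A, b, x):
    """M = A x x^T A^T - b b^T as a full N x N list of rows."""
    y = matvec(A, x)
    return [[y[i] * y[j] - b[i] * b[j] for j in range(len(b))] for i in range(len(b))]


def objective(A, b, x):
    """f(x) = Tr(M∘M); only the diagonal entries (M∘M)_ii = M_ii^2 enter the trace."""
    M = build_M(A, b, x)
    return sum(M[i][i] * M[i][i] for i in range(len(M)))


def gradient(A, b, x):
    """Answer's formula 4 A^T (I∘M) A x, where I∘M keeps only M_ii = y_i^2 - b_i^2."""
    y = matvec(A, x)
    w = [(y_i * y_i - b_i * b_i) * y_i for y_i, b_i in zip(y, b)]  # (I∘M) A x
    return [4 * sum(A[i][k] * w[i] for i in range(len(w))) for k in range(len(x))]


def fd_gradient(A, b, x, h=1e-6):
    """Central finite-difference approximation of the gradient of f."""
    g = []
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        g.append((objective(A, b, xp) - objective(A, b, xm)) / (2 * h))
    return g


random.seed(0)
N, n = 7, 5  # arbitrary test sizes: A is N x 5
A = [[random.gauss(0, 1) for _ in range(n)] for _ in range(N)]
b = [random.gauss(0, 1) for _ in range(N)]
x = [random.gauss(0, 1) for _ in range(n)]

g = gradient(A, b, x)
g_fd = fd_gradient(A, b, x)
print("max |analytic - finite diff| =", max(abs(u - v) for u, v in zip(g, g_fd)))

# The question's candidate 4 M A x is an N-vector, so it cannot be the gradient w.r.t. x
M = build_M(A, b, x)
Ax = matvec(A, x)
candidate = [4 * v for v in matvec(M, Ax)]
print("len(candidate) =", len(candidate), "but len(x) =", len(x))
```

The finite-difference error should be tiny compared with the entries of the gradient; the exact printed values depend on the random data.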

Answer

Another way to approach the problem is to use diagonal matrices.

Define some variables.
$$\eqalign{ y &= Ax,\quad Y=\operatorname{Diag}(y),\quad B=\operatorname{Diag}(b) \\ y &= Y{\tt1},\quad b = B{\tt1}=\operatorname{diag}(B) \\ M &= yy^T-bb^T,\quad\operatorname{Diag}(M)=Y^2-B^2 \\ }$$
where ${\tt1}$ is the all-ones vector. Then borrow Greg's form of the objective function, but with diagonal matrices instead of Hadamard products. This is especially nice because diagonal matrices behave a lot like scalars, i.e. they are symmetric and commute with each other.
$$\eqalign{ f &= (I\odot M):(I\odot M) \\ &= (Y^2-B^2):(Y^2-B^2) \\ df&= 2(Y^2-B^2):d(Y^2-B^2) \\ &= 2(Y^2-B^2):2Y\,dY \\ &= 4(Y^2-B^2)Y:dY \\ &= 4(Y^2-B^2)y:dy \\ &= 4(Y^2-B^2)y:A\,dx \\ &= 4A^T(Y^2-B^2)y:dx \\ g=\frac{\partial f}{\partial x} &= 4A^T(Y^2-B^2)y \\ dg&= 4A^Td(Y^2-B^2)y + 4A^T(Y^2-B^2)dy \\ &= 4A^T(2Y\,dY)y + 4A^T(Y^2-B^2)dy \\ &= 4A^T(2Y^2)dy + 4A^T(Y^2-B^2)dy \\ &= 4A^T(3Y^2-B^2)A\,dx \\ H=\frac{\partial g}{\partial x} &= 4A^T(3Y^2-B^2)A \\ }$$
Two small identities are used along the way: for any diagonal matrix $D$, $D:dY = D{\tt1}:dy$, and $dY\,y = Y\,dy = y\odot dy$.

The nice thing about writing the gradient and Hessian in terms of $y$ is that the formulas keep exactly the same form if a constant vector $c$ is added to $y$, because $dy = A\,dx$ still holds:
$$\eqalign{ y &= Ax+c \\ g &= 4A^T(Y^2-B^2)y \\ H &= 4A^T(3Y^2-B^2)A \\ }$$
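The Hessian and the claim about the offset $c$ can be checked numerically in the same way. Below is a minimal pure-Python sketch, again with my own assumptions (random data, arbitrary sizes $N=7$ and $5$, hypothetical helper names): it compares $g = 4A^T(Y^2-B^2)y$ and $H = 4A^T(3Y^2-B^2)A$, with $y = Ax + c$, against central differences of $f$ and of $g$ respectively.

```python
import random


def affine(A, x, c):
    """y = A x + c for a matrix stored as a list of rows."""
    return [sum(a * x_j for a, x_j in zip(row, x)) + c_i for row, c_i in zip(A, c)]


def objective(A, b, c, x):
    """f = (Y^2 - B^2):(Y^2 - B^2) = sum_i (y_i^2 - b_i^2)^2."""
    y = affine(A, x, c)
    return sum((y_i * y_i - b_i * b_i) ** 2 for y_i, b_i in zip(y, b))


def gradient(A, b, c, x):
    """g = 4 A^T (Y^2 - B^2) y."""
    y = affine(A, x, c)
    w = [(y_i * y_i - b_i * b_i) * y_i for y_i, b_i in zip(y, b)]
    return [4 * sum(A[i][k] * w[i] for i in range(len(y))) for k in range(len(x))]


def hessian(A, b, c, x):
    """H = 4 A^T (3Y^2 - B^2) A."""
    y = affine(A, x, c)
    d = [3 * y_i * y_i - b_i * b_i for y_i, b_i in zip(y, b)]  # diagonal of 3Y^2 - B^2
    n = len(x)
    return [[4 * sum(A[i][k] * d[i] * A[i][l] for i in range(len(y))) for l in range(n)]
            for k in range(n)]


def finite_differences(A, b, c, x, h=1e-5):
    """Central differences: gradient from f, Hessian columns from g."""
    n = len(x)
    g_fd, H_fd = [], [[0.0] * n for _ in range(n)]
    for l in range(n):
        xp, xm = list(x), list(x)
        xp[l] += h
        xm[l] -= h
        g_fd.append((objective(A, b, c, xp) - objective(A, b, c, xm)) / (2 * h))
        gp, gm = gradient(A, b, c, xp), gradient(A, b, c, xm)
        for k in range(n):
            H_fd[k][l] = (gp[k] - gm[k]) / (2 * h)
    return g_fd, H_fd


random.seed(1)
N, n = 7, 5  # arbitrary test sizes
A = [[random.gauss(0, 1) for _ in range(n)] for _ in range(N)]
b = [random.gauss(0, 1) for _ in range(N)]
c = [random.gauss(0, 1) for _ in range(N)]  # constant offset, y = A x + c
x = [random.gauss(0, 1) for _ in range(n)]

g, H = gradient(A, b, c, x), hessian(A, b, c, x)
g_fd, H_fd = finite_differences(A, b, c, x)
grad_err = max(abs(u - v) for u, v in zip(g, g_fd))
hess_err = max(abs(H[k][l] - H_fd[k][l]) for k in range(n) for l in range(n))
print("gradient error:", grad_err, " Hessian error:", hess_err)
```

Differencing the analytic gradient (rather than $f$ twice) keeps the Hessian check accurate with a single step size.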