Partial derivatives of matrices?


Assume I have the following equation:

$$\textbf{f}(\textbf{x})=\frac{1}{2}\textbf{x}^T(\textbf{A}^T\textbf{A})\textbf{x} - \textbf{c}^T\log(\textbf{B}\textbf{x})$$

where $\textbf{x}$ is an $m \times 1$ vector of parameters, $\textbf{A}$ and $\textbf{B}$ are known $n \times m$ matrices, and $\textbf{c}$ is a known $n \times 1$ vector. I wish to find $\nabla \textbf{f}(\textbf{x})$, where

$$\nabla=\left[\frac{\partial}{\partial x_1},\frac{\partial}{\partial x_2},...,\frac{\partial}{\partial x_m}\right]^T$$

hence:

$$\nabla\textbf{f}(\textbf{x})=\frac{1}{2}\nabla\textbf{x}^T(\textbf{A}^T\textbf{A})\textbf{x} - \nabla\textbf{c}^T\log(\textbf{B}\textbf{x})$$

Is it possible to compute this gradient analytically? I imagine the first term could be simplified, since I believe $\nabla\textbf{x}^T=\mathbb{I}$ (the identity matrix). How would I handle the second term, though? The logarithm in the second term is applied element-wise.
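For the first term, the product rule makes the suspected simplification concrete. This is a short sketch that writes $\textbf{M}=\textbf{A}^T\textbf{A}$ (a shorthand introduced here, not part of the original post) and relies only on $\textbf{M}$ being symmetric:

$$\begin{aligned} d\left(\tfrac12\textbf{x}^T\textbf{M}\textbf{x}\right) &= \tfrac12\left(d\textbf{x}^T\,\textbf{M}\textbf{x} + \textbf{x}^T\textbf{M}\,d\textbf{x}\right) = \tfrac12\,\textbf{x}^T\left(\textbf{M}^T+\textbf{M}\right)d\textbf{x} = \textbf{x}^T\textbf{M}\,d\textbf{x} \\ \nabla\left(\tfrac12\textbf{x}^T\textbf{A}^T\textbf{A}\textbf{x}\right) &= \textbf{M}\textbf{x} = \textbf{A}^T\textbf{A}\textbf{x} \end{aligned}$$

The middle step uses the fact that the scalar $d\textbf{x}^T\textbf{M}\textbf{x}$ equals its own transpose, $\textbf{x}^T\textbf{M}^T d\textbf{x}$.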

I would be grateful for any pointers, and ideally for references to resources on how to solve such problems in general.

Best answer

For convenience, define the variables
$$z = Ax,\quad y = Bx,\quad Y = {\rm Diag}(y)$$
Write the function in terms of these new variables, then calculate the differential and the gradient:
$$\begin{aligned} f &= \tfrac 12z^Tz - c^T\log(y) \\ df &= z^Tdz - c^T(Y^{-1}dy) \\ &= z^TA\,dx - c^TY^{-1}B\,dx \\ &= (A^Tz - B^TY^{-1}c)^T\,dx \\ \frac{\partial f}{\partial x} &= A^Tz - B^TY^{-1}c \\ &= A^TAx - B^T{\rm Diag}(Bx)^{-1}c \end{aligned}$$
The transposition in the fourth line uses the fact that $Y$ is diagonal, so $Y^{-T}=Y^{-1}$. In general, if you can manipulate your differential into the form
$$df = g^Tdx$$
then the gradient can be identified as $\;g=\left(\frac{\partial f}{\partial x}\right)$.
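A quick way to gain confidence in the closed form $A^TAx - B^T{\rm Diag}(Bx)^{-1}c$ is to compare it with a central finite-difference approximation on random data. The sketch below assumes NumPy; the helper names `objective`, `gradient` and `finite_difference_gradient`, the sizes $n=5$, $m=3$, and the random inputs are all illustrative choices, not part of the original answer. It also assumes $Bx>0$ element-wise so that $\log(Bx)$ is defined, which it enforces by drawing positive entries for $B$ and $x$.

```python
import numpy as np

def objective(x, A, B, c):
    """f(x) = 0.5 * x^T (A^T A) x - c^T log(B x); assumes B x > 0 element-wise."""
    z = A @ x
    y = B @ x
    return 0.5 * z @ z - c @ np.log(y)

def gradient(x, A, B, c):
    """Closed form from the answer: A^T A x - B^T Diag(B x)^{-1} c."""
    y = B @ x
    # Diag(y)^{-1} c is just the element-wise quotient c / y
    return A.T @ (A @ x) - B.T @ (c / y)

def finite_difference_gradient(x, A, B, c, h=1e-6):
    """Central differences, perturbing one coordinate of x at a time."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (objective(x + e, A, B, c) - objective(x - e, A, B, c)) / (2 * h)
    return g

rng = np.random.default_rng(0)
n, m = 5, 3
A = rng.standard_normal((n, m))
# Positive entries in B and x keep B x > 0, so the log is defined
B = rng.uniform(0.5, 1.5, size=(n, m))
c = rng.uniform(0.5, 2.0, size=n)
x = rng.uniform(0.5, 1.5, size=m)

analytic = gradient(x, A, B, c)
numeric = finite_difference_gradient(x, A, B, c)
# The largest discrepancy should be small (finite-difference and rounding error only)
print(np.max(np.abs(analytic - numeric)))
```

Central differences are used rather than forward differences because their truncation error is second order in the step `h`, which keeps the comparison tight without needing a tiny step.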

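Here $f$ stands for a generic scalar function (not the objective above), and $f'$ denotes its ordinary derivative. When $y$ is a vector, the function is evaluated element-wise, and the vector differential can be written as
$$\begin{aligned} df &= f'(y)\odot dy \\ &= {\rm Diag}\Big(f'(y)\Big)\,dy \end{aligned}$$
where $\odot$ denotes the element-wise (Hadamard) product. For $f=\log$ this gives $d\log(y) = {\rm Diag}(y)^{-1}dy = Y^{-1}dy$, which is the step used for the second term above.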
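The element-wise rule can be checked numerically for $f=\log$, where $f'(y)=1/y$. This NumPy sketch (the random vector $y$, the perturbation size `1e-6`, and the variable names are illustrative assumptions) compares the actual change $\log(y+dy)-\log(y)$ with both the Hadamard form $f'(y)\odot dy$ and the matrix form ${\rm Diag}(f'(y))\,dy$:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.uniform(0.5, 2.0, size=4)       # positive entries so log(y) is defined
dy = 1e-6 * rng.standard_normal(4)      # small perturbation

exact_change = np.log(y + dy) - np.log(y)
hadamard_form = (1.0 / y) * dy          # f'(y) ⊙ dy with f = log, f'(y) = 1/y
diag_form = np.diag(1.0 / y) @ dy       # Diag(f'(y)) dy, i.e. Y^{-1} dy

# Both linearizations agree with each other, and with the exact change
# up to second-order terms in dy
print(np.max(np.abs(exact_change - hadamard_form)))
```

In practice the Hadamard form is the one to compute, since building ${\rm Diag}(f'(y))$ explicitly costs $O(n^2)$ memory for no benefit.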