Gradient of element-wise logarithm: $f(x) = \sum\log(\mathbf{A}^{T}(g(x) - b))$


Let's say I have the following equation:

$$ f(x) = \sum\log(\mathbf{A}^{T}(g(x) - b)) $$ where $g:\mathbb{R}^N \to \mathbb{R}^N$, $\;x,b\in\mathbb{R}^N$, $\;A \in \mathbb{R}^{N\times M}$, and the logarithm is the element-wise logarithm. That is, each term of the sum that gives $f(x)$ is equal to the scalar product between a row of $\mathbf{A}^{T}$ (i.e. a column of $A$) and the vector $g(x) - b$.

The gradient $\nabla_x g(x)$ is known, and I would like to take the gradient $\nabla_x f(x)$. If I define the $i$-th column of the matrix $A$ as $A^{(i)}$, and the $(i,j)$-th element as $A^{(i,j)}$, I believe that this gradient should look something like this: $$ \nabla_x f(x) = \left[\sum_i\frac{A^{(i,1)}\partial_{x_1}g(x)}{A^{(i)T}\cdot(g(x) - b)} ,\cdots, \sum_i\frac{A^{(i,n)}\partial_{x_n}g(x)}{A^{(i)T}\cdot(g(x) - b)}\right]^{T} $$ where I've used $\partial_{x_j} = \frac{\partial}{\partial x_j}$.

Here are my questions:

  1. Is this correct?
  2. How can I write this in some kind of convenient vector notation? Eventually I need to write this in code - assume I have $\nabla_x g(x)$ as a vector already - and I don't want to have to write it out as an iteration over the elements of the vector. That would really suck.
  3. Let's say I now want to take the gradient $\nabla_b\left(\nabla_x f(x)\right)$ - is this just going to be a square matrix, whose elements are a simple application of the quotient rule to each of the terms above?
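Whatever formula one ends up with, a finite-difference check is a cheap way to validate a candidate gradient before trusting it in code. Here is a minimal NumPy sketch (Python is my assumption for the eventual implementation language, and the element-wise exponential is just a hypothetical stand-in for $g$):

```python
import numpy as np

# Hypothetical stand-in for g (any smooth g: R^N -> R^N would do).
def g(x):
    return np.exp(x)

def f(x, A, b):
    # f(x) = sum(log(A^T (g(x) - b))); requires A^T (g(x) - b) > 0 element-wise
    return np.sum(np.log(A.T @ (g(x) - b)))

def finite_diff_grad(fun, x, h=1e-6):
    """Central-difference approximation to the gradient, for sanity checks."""
    grad = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        grad[j] = (fun(x + e) - fun(x - e)) / (2.0 * h)
    return grad

rng = np.random.default_rng(0)
N, M = 4, 3
A = rng.random((N, M))   # A in R^{N x M}
x = rng.random(N)
b = -rng.random(N)       # chosen so A^T (g(x) - b) stays positive

fd = finite_diff_grad(lambda x_: f(x_, A, b), x)
```

Any closed-form gradient expression can then be compared against `fd` with `np.allclose`.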

Best answer:

$\def\c#1{\color{red}{#1}}\def\D{{\rm Diag}}\def\e{\varepsilon}\def\o{{\tt1}}\def\p{{\partial}}\def\grad#1#2{\frac{\p #1}{\p #2}}\def\hess#1#2#3{\frac{\p^2 #1}{\p #2\,\p #3^T}}$For typing convenience, let $\o$ denote the all-ones vector and define the variables $$\eqalign{ p &= A^T(g-b) \quad&\iff\quad P = \D(p) \\ q &= P^{-1}\o \quad&\iff\quad Q = \D(q),\quad PQ = I \\ H &= \nabla_x g \quad&\iff\quad dg = H\,dx \\ R &= P^{-1}Q \\ }$$ and use a colon to denote the trace/Frobenius product, i.e. $$\eqalign{ A:B &= \sum_{i=1}^m \sum_{j=1}^n A_{ij} B_{ij} \;=\; {\rm Tr}(AB^T) \\ A:A &= \big\|A\big\|_F^2 \\ }$$ NB: When $(A,B)$ are vectors, the Frobenius product corresponds to the ordinary dot product.

Write the function using the above notation, then calculate its differential and gradient wrt $x$. $$\eqalign{ f &= \o:\log(p) \\ df &= \o:P^{-1}dp \\ &= P^{-1}\o:dp \\ &= q:A^Tdg \\ &= Aq:H\,dx \\ &= H^TAq:dx \\ w\doteq\grad{f}{x} &= H^TAq \qquad \big({\rm new\,gradient}\big) \\ }$$ To calculate the $k^{th}$ component, multiply the gradient by the corresponding basis vector $$\eqalign{ \grad{f}{x_j} = \e_j^TH^TAq \\ }$$ Now calculate the gradient of this gradient wrt $b$ $$\eqalign{ q &= P^{-1}\o \\ dq &= \c{dP^{-1}}\o \\ &= \c{-P^{-1}dP\,P^{-1}}\o \\ &= -P^{-1}dP\,q \\ &= -P^{-1}Q\,dp \\ &= +R\;(A^Tdb) \\ \\ dw &= H^TA\,dq \\ &= H^TA \, (RA^Tdb) \\ \grad{w}{b} &= H^TARA^T \;=\; \hess{f}{b}{x} \\ }$$ Reversing the order of differentiation yields $$\eqalign{ \hess{f}{x}{b} &= \left(\hess{f}{b}{x}\right)^T =\; ARA^TH \\ \\ }$$
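The final expressions $w = H^TAq$ and $\partial^2 f/\partial b\,\partial x^T = H^TARA^T$ translate directly into vectorized code with no loop over components, which answers question 2. A sketch in NumPy (my choice of language; the element-wise exponential standing in for $g$ is an assumption — any $g$ with a known Jacobian $H$ works the same way):

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 5, 3
A = rng.random((N, M))            # A in R^{N x M}
x = rng.random(N)
b = -rng.random(N)                # keeps p = A^T (g - b) positive here

# Hypothetical g with known Jacobian: element-wise exp, so H is diagonal.
gx = np.exp(x)
H = np.diag(np.exp(x))            # dg = H dx

p = A.T @ (gx - b)                # p = A^T (g - b)
q = 1.0 / p                       # q = P^{-1} 1  (element-wise reciprocal)
w = H.T @ A @ q                   # gradient of f wrt x:  w = H^T A q

R = np.diag(q * q)                # R = P^{-1} Q = Diag(1 / p^2)
d2f_db_dx = H.T @ A @ R @ A.T     # mixed Hessian:  dw/db = H^T A R A^T
```

Since $P$, $Q$, $R$ are diagonal, none of them needs to be formed explicitly: `q = 1.0 / p` is an element-wise reciprocal, and `(H.T @ A * q**2) @ A.T` gives the mixed Hessian without the `np.diag` call.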


Note that the cyclic property of the trace permits the terms in a Frobenius product to be rearranged in a number of useful ways, e.g. $$\eqalign{ A:B &= B:A = B^T:A^T \\ CA:B &= C:BA^T = A:C^TB \\ }$$ The matrix on either side of the colon must have exactly the same dimensions, as is the case with a Hadamard product. Speaking of which, the Frobenius and Hadamard products commute with each other $$A:B\odot C = A\odot B:C \;=\; \sum_{i=1}^m \sum_{j=1}^n A_{ij} B_{ij} C_{ij}$$
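These identities are easy to confirm numerically. A quick NumPy check (the variable names and shapes here are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((3, 4))
B = rng.random((3, 4))
C3 = rng.random((3, 3))   # square left factor, for the CA:B identity
Cm = rng.random((3, 4))   # same shape as A, B, for the Hadamard identity

def frob(X, Y):
    """Frobenius product  X:Y = Tr(X Y^T) = sum_ij X_ij Y_ij."""
    return float(np.trace(X @ Y.T))

# cyclic rearrangements:  CA:B = C:BA^T = A:C^T B
lhs = frob(C3 @ A, B)
assert np.isclose(lhs, frob(C3, B @ A.T))
assert np.isclose(lhs, frob(A, C3.T @ B))

# Frobenius and Hadamard commute:  A:(B o C) = (A o B):C = sum_ij A_ij B_ij C_ij
assert np.isclose(frob(A, B * Cm), frob(A * B, Cm))
assert np.isclose(frob(A, B * Cm), (A * B * Cm).sum())
```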