Matrix differentiation with respect to the middle matrix


I have $\alpha = w'B'x$, where $\alpha$ is a scalar, $w$ and $x$ are column vectors, and $B$ is a matrix, with the following dimensions (the prime denotes transpose):

  • $w$: $(k-1) \times 1$
  • $B$: $p \times (k-1)$
  • $x$: $p \times 1$

How do I take the derivative with respect to $B$? That is, how do I simplify $\dfrac{\partial \alpha}{\partial B}$?

Context: I am trying to solve a minimization problem with respect to $B$, so I need the first derivative. The minimization problem is $\operatorname{argmin}_B \dfrac{(R-w'B'x)^2}{f(x)}$.
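For concreteness, the quantities in the question can be sketched in plain Python (all numeric values, including $R$ and the weight $f(x)$, are made-up placeholders, not from the post):

```python
# Evaluate alpha = w' B' x and the objective (R - alpha)^2 / f(x) for
# made-up values (R and the weight fx are hypothetical placeholders).

def alpha(w, B, x):
    # w' B' x = x' B w = sum_{i,j} x_i * B[i][j] * w[j]
    return sum(x[i] * B[i][j] * w[j]
               for i in range(len(x)) for j in range(len(w)))

w = [1.0, -2.0]                             # (k-1) x 1, with k-1 = 2
x = [0.5, 3.0, -1.0]                        # p x 1, with p = 3
B = [[1.0, 2.0], [0.0, -1.0], [4.0, 0.5]]   # p x (k-1)

R = 2.0          # hypothetical response value
fx = 0.8         # hypothetical positive weight f(x)

a = alpha(w, B, x)
objective = (R - a) ** 2 / fx
```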

There are 3 best solutions below

Solution 1:

$\alpha$ is a linear map with respect to $B$, and the derivative of a linear map at any point is the map itself.

So here

$$\frac{\partial \alpha}{\partial B}\cdot H= w^\prime H^\prime x$$ where I imagine you use the prime to denote the transpose.
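A quick numerical check of this directional-derivative formula (a minimal sketch in plain Python; the dimensions and entries are made-up): because $\alpha$ is linear in $B$, the difference $\alpha(B+\varepsilon H)-\alpha(B)$ equals $\varepsilon\, w'H'x$ exactly, up to floating-point rounding.

```python
# Check that alpha(B + eps*H) - alpha(B) == eps * w' H' x for the linear
# map alpha(B) = w' B' x.  Plain Python, made-up sizes p = 3, k-1 = 2.

def alpha(w, B, x):
    # w' B' x = sum_{i,j} x_i * B[i][j] * w[j]
    return sum(x[i] * B[i][j] * w[j]
               for i in range(len(x)) for j in range(len(w)))

w = [1.0, -2.0]                              # (k-1) x 1
x = [0.5, 3.0, -1.0]                         # p x 1
B = [[1.0, 2.0], [0.0, -1.0], [4.0, 0.5]]    # p x (k-1)
H = [[0.2, -0.3], [1.0, 0.0], [-0.5, 0.7]]   # perturbation, same shape as B

eps = 0.1
B_pert = [[B[i][j] + eps * H[i][j] for j in range(2)] for i in range(3)]

lhs = alpha(w, B_pert, x) - alpha(w, B, x)
rhs = eps * alpha(w, H, x)                   # eps * w' H' x
assert abs(lhs - rhs) < 1e-12
```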

Solution 2:

Generally speaking, if you want to take the derivative of a tensor $T$ of rank $k$ with respect to another tensor $X$ of rank $l$, the derivative can be encoded in a rank-$(k+l)$ tensor

$$ \frac{{\rm d} T}{{\rm d} X} = \Big(\frac{{\rm d} T_{\mathbf i}}{{\rm d} X_{\mathbf j}}\Big)_{\mathbf i, \mathbf j} $$

where $\mathbf i, \mathbf j$ are appropriate multi-indices (ignoring the upper/lower index distinction for simplicity).

Now, given vectors $u\in\mathbb R^m$, $v\in\mathbb R^n$ consider the function $$f\colon \mathbb R^{m\times n}\to\mathbb R, A\mapsto u^T \cdot A \cdot v = \sum_{kl} A_{kl} u_k v_l$$ Then, by the definition above $f'(A)$ can be expressed as

$$ \frac{{\rm d}\, u^T A v}{{\rm d} A} = \Big( \frac{{\rm d}\sum_{kl} A_{kl} u_k v_l}{{\rm d}A_{ij}} \Big)_{ij} = ( u_i v_j)_{ij} = uv^T $$

This encodes a linear function $\mathbb R^{m\times n}\to\mathbb R$ via $X\mapsto \langle f'(A), X\rangle = \langle uv^T, X\rangle = u^T X v$, where $\langle \cdot, \cdot\rangle$ is the natural (Frobenius) inner product on $\mathbb R^{m\times n}$. This is of course just an instance of the Riesz representation theorem, which tells us that any linear function $\mathbb R^{m\times n}\to\mathbb R$ can be expressed as $X\mapsto \langle V, X\rangle$ for some matrix $V$. Applied to the question, $\alpha = w^TB^Tx = x^TBw$, so taking $u = x$ and $v = w$ gives $\dfrac{\partial \alpha}{\partial B} = xw^T$.
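The two claims above can be checked numerically (a minimal sketch in plain Python with made-up $m=2$, $n=3$ values): each partial derivative $\partial f/\partial A_{ij}$ equals $u_i v_j$ exactly because $f$ is linear in each entry, and the Frobenius pairing $\langle uv^T, X\rangle$ agrees with $u^T X v$.

```python
# Verify, for f(A) = u' A v, that df/dA_ij = u_i v_j (so grad f = u v'),
# and that <u v', X>_F = u' X v.  Plain Python, made-up m = 2, n = 3.

def f(u, A, v):
    return sum(u[k] * A[k][l] * v[l]
               for k in range(len(u)) for l in range(len(v)))

u = [2.0, -1.0]
v = [1.0, 0.5, 3.0]
A = [[0.0, 1.0, -2.0], [4.0, 0.0, 1.5]]
eps = 0.25

# Each finite-difference quotient is exact because f is linear in A_ij.
for i in range(2):
    for j in range(3):
        A_pert = [row[:] for row in A]
        A_pert[i][j] += eps
        partial = (f(u, A_pert, v) - f(u, A, v)) / eps
        assert abs(partial - u[i] * v[j]) < 1e-12

# Frobenius pairing: <u v', X> = sum_ij u_i v_j X_ij = u' X v.
X = [[1.0, -1.0, 0.5], [2.0, 0.0, -3.0]]
frob = sum(u[i] * v[j] * X[i][j] for i in range(2) for j in range(3))
assert abs(frob - f(u, X, v)) < 1e-12
```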

Solution 3:

$ \def\a{\alpha}\def\b{\beta}\def\p{\partial} \def\L{\left}\def\R{\right}\def\LR#1{\L(#1\R)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\grad#1#2{\frac{\p #1}{\p #2}} $Use a colon to denote the matrix inner product, which is a convenient notation for the trace
$$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \big\|A\big\|^2_F \\ }$$
Treating vectors as rectangular matrices $(n=1)$, this corresponds to the usual dot product
$$a:b \;=\; a\cdot b \;=\; a^Tb$$
Write the $\a$ function in this notation (note that $\a = w^TB^Tx = x^TBw = \trace{B^Txw^T}$), then calculate its differential and gradient
$$\eqalign{ \a &= xw^T:B \\ d\a &= xw^T:dB \\ \grad{\a}{B} &= xw^T \\ }$$
Note that the properties of the underlying trace function allow the terms in the matrix inner product to be rearranged in several different ways, e.g.
$$\eqalign{ A:B &= B:A \\ A:B &= A^T:B^T \\ C:AB &= A^TC:B = CB^T:A \\ }$$
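Both the gradient and one of the rearrangement rules can be confirmed numerically (a minimal sketch in plain Python with made-up sizes $p=3$, $k-1=2$): the partials $\partial\a/\partial B_{ij}$ equal $x_i w_j$ exactly since $\a$ is linear in $B$, and $C:AB = A^TC:B$ holds for arbitrary conformable matrices.

```python
# Confirm d(alpha)/dB = x w' for alpha = w' B' x = (x w') : B, and the
# trace rearrangement C : (A B) = (A' C) : B.  Plain Python, made-up data.

def dot_frob(A, B):            # A : B = sum_ij A_ij * B_ij
    return sum(A[i][j] * B[i][j]
               for i in range(len(A)) for j in range(len(A[0])))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

w = [1.0, -2.0]                              # (k-1) x 1
x = [0.5, 3.0, -1.0]                         # p x 1
B = [[1.0, 2.0], [0.0, -1.0], [4.0, 0.5]]    # p x (k-1)

xwT = [[xi * wj for wj in w] for xi in x]    # x w', same shape as B
alpha = dot_frob(xwT, B)                     # alpha = (x w') : B = w' B' x

# Exact partials: d(alpha)/dB_ij = x_i * w_j, since alpha is linear in B.
eps = 0.5
for i in range(3):
    for j in range(2):
        Bp = [row[:] for row in B]
        Bp[i][j] += eps
        assert abs((dot_frob(xwT, Bp) - alpha) / eps - x[i] * w[j]) < 1e-12

# Rearrangement: C : (A B) = (A' C) : B
A = [[1.0, 0.0], [2.0, -1.0]]                # 2 x 2
Bm = [[0.5, 1.0, 0.0], [3.0, -2.0, 1.0]]     # 2 x 3
C = [[1.0, 2.0, -1.0], [0.0, 1.0, 4.0]]      # 2 x 3
assert abs(dot_frob(C, matmul(A, Bm))
           - dot_frob(matmul(transpose(A), C), Bm)) < 1e-12
```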