Compute derivative with respect to a matrix


I have the following vectors and matrices:

$F = (f_1, f_2, \dots, f_n)^T$ is a column vector of size $n$, where the $f_i$ are scalars.

$S = (s_1, s_2, \dots, s_m)$ is a row vector of size $m$, where the $s_j$ are scalars.

$C = (c_{ij})$ is an $n \times m$ matrix.

Here $F$ and $S$ are fixed, and $C$ is variable.

Edit: The constraints on $C$ are:

  1. $0 \le c_{ij} \le 1$
  2. $\sum_{i=1}^n c_{ij} \le 1$
  3. $c_{ij} = 0$ for certain pairs $(i, j)$, determined by research data

I am predicting $f_i$ as:

$\hat{f_i} = \sum_{j=1}^m c_{ij}s_j $

I want to minimize $E = \sum_{i=1}^n (f_i - \hat{f_i})^2 $

I think in matrix notation this is $E = (F-CS^T)^T(F-CS^T)$ (correct me if I am wrong).
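A quick numerical sanity check that the matrix form agrees with the summation form below; the sizes, seed, and data here are made up purely for illustration:

```python
import numpy as np

# Arbitrary random data (n, m, and the seed are made up)
rng = np.random.default_rng(0)
n, m = 4, 3
F = rng.standard_normal((n, 1))   # column vector of the f_i
S = rng.standard_normal((1, m))   # row vector of the s_j
C = rng.standard_normal((n, m))

# Matrix form: E = (F - C S^T)^T (F - C S^T)
R = F - C @ S.T
E_matrix = (R.T @ R).item()

# Summation form: E = sum_i (f_i - sum_j c_ij * s_j)^2
E_sum = sum(
    (F[i, 0] - sum(C[i, j] * S[0, j] for j in range(m))) ** 2
    for i in range(n)
)

print(np.isclose(E_matrix, E_sum))  # True
```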

Most optimization algorithms require a gradient, so I believe I need to differentiate it with respect to $C$ (again, please correct me if I am wrong). I do not know how to do this in matrix-algebra notation, so I did it by converting to summations; here is what I got.

$E = \sum_{i=1}^n (f_i - \sum_{j=1}^m c_{ij}s_j)^2 $

$\displaystyle \frac {\partial E} {\partial c_{ij}} = -2 \left(f_i - \sum_{k=1}^m c_{ik} s_k\right) s_j $

or, collecting all the entries into one matrix, $\displaystyle \frac{\partial E}{\partial C} = -2(F - CS^T)S$.

I want to know if we get the same result by using matrix notation to differentiate $E$, or if I have made a mistake somewhere.
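One way to test the per-entry derivative $\frac{\partial E}{\partial c_{ij}} = -2\left(f_i - \sum_k c_{ik}s_k\right)s_j$ is a central finite difference on a single entry of $C$; again the sizes and data are made up:

```python
import numpy as np

# Made-up data for a finite-difference spot check
rng = np.random.default_rng(1)
n, m = 4, 3
F = rng.standard_normal((n, 1))
S = rng.standard_normal((1, m))
C = rng.standard_normal((n, m))

def E(C):
    R = F - C @ S.T
    return (R.T @ R).item()

i, j = 1, 2  # arbitrary entry to test

# Analytic: -2 (f_i - sum_k c_ik s_k) s_j
analytic = -2 * (F[i, 0] - C[i] @ S[0]) * S[0, j]

# Central finite difference in c_ij
eps = 1e-6
Cp = C.copy(); Cp[i, j] += eps
Cm = C.copy(); Cm[i, j] -= eps
numeric = (E(Cp) - E(Cm)) / (2 * eps)

print(np.isclose(analytic, numeric))  # True
```

Since $E$ is quadratic in each $c_{ij}$, the central difference is exact up to floating-point roundoff.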

Best answer:

Let $$M=CS^T-F$$ then use the Frobenius ($:$) inner product to rewrite the error function. Finding the differential and the gradient is then straightforward: $$\eqalign{ E &= M:M \cr dE &= 2M:dM \cr &= 2M:dC\,S^T \cr &= 2MS:dC \cr \frac{\partial E}{\partial C} &= 2MS = 2(CS^T-F)S }$$ This derivation holds when $S$ and $F$ are full matrices, as well as when they are merely vectors.
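The gradient $2(CS^T-F)S$ can be verified entrywise against central finite differences; the shapes and random data below are arbitrary:

```python
import numpy as np

# Arbitrary data for the full-gradient check
rng = np.random.default_rng(2)
n, m = 4, 3
F = rng.standard_normal((n, 1))
S = rng.standard_normal((1, m))
C = rng.standard_normal((n, m))

def E(C):
    M = C @ S.T - F
    return (M.T @ M).item()

# Analytic gradient: 2 (C S^T - F) S, an n x m matrix matching C
grad = 2 * (C @ S.T - F) @ S

# Central finite differences, entry by entry
eps = 1e-6
num = np.zeros_like(C)
for i in range(n):
    for j in range(m):
        Cp = C.copy(); Cp[i, j] += eps
        Cm = C.copy(); Cm[i, j] -= eps
        num[i, j] = (E(Cp) - E(Cm)) / (2 * eps)

print(np.allclose(grad, num))  # True
```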

The differential rule for any product (matrix, Kronecker, Hadamard, Frobenius) is very simple: $$d(A\star B) = dA\star B + A\star dB$$ In general you must respect the order of the matrices; however, since the Frobenius and Hadamard products are commutative, you can rearrange and collect terms, much like working with scalars.

Don't try to apply this simple rule to gradients; it only works for differentials.

The Frobenius product is equivalent to the trace: $A:B={\rm tr}(A^TB)$. Using the cyclic property of the trace, you can prove that the following are all equal: $$\eqalign{ AB:C &= A:CB^T \cr &= B:A^TC \cr &= (AB)^T:C^T \cr &= C:AB }$$ Each term in a Frobenius (or Hadamard) product must have the same "shape". Since a matrix and its transpose are always dimensionally compatible, $BB^T$ and $B^TB$ both exist and are square.

These two facts help me remember the rule for moving $B$ to the RHS in the product $AB:C$:

First, multiply by $B^T$ to ensure that both sides have the same shape $\,\,ABB^T:CB^T$

Then drop the $BB^T$ term to obtain $\,\,A:CB^T$
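The trace identities above are easy to confirm numerically; the shapes chosen below are arbitrary, as long as $AB$ and $C$ are the same size:

```python
import numpy as np

def frob(X, Y):
    # Frobenius inner product X:Y = tr(X^T Y)
    return np.trace(X.T @ Y)

# Arbitrary compatible shapes: AB and C are both 4 x 5
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))
C = rng.standard_normal((4, 5))

lhs = frob(A @ B, C)
print(np.allclose(lhs, frob(A, C @ B.T)))      # True
print(np.allclose(lhs, frob(B, A.T @ C)))      # True
print(np.allclose(lhs, frob((A @ B).T, C.T)))  # True
print(np.allclose(lhs, frob(C, A @ B)))        # True
```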