Compute derivative with respect to a matrix


I have the following vectors and matrices:

$F = (f_1, f_2, \dots, f_n)^T$ is a column vector of size $n$, where the $f_i$ are scalars.

$S = (s_1, s_2, \dots, s_m)$ is a row vector of size $m$, where the $s_j$ are scalars.

$C = (c_{ij})$ is an $n \times m$ matrix.

Here $F$ and $S$ are fixed, and $C$ is variable.

Edit: The constraints on $C$ are:

  1. $0 \le c_{ij} \le 1$
  2. $\sum_{i=1}^n c_{ij} \le 1$
  3. $c_{ij} = 0$ for certain pairs $(i, j)$, determined by research data

I am predicting $f_i$ as:

$\hat{f_i} = \sum_{j=1}^m c_{ij}s_j $

I want to minimize $E = \sum_{i=1}^n (f_i - \hat{f_i})^2 $

I think in matrix notation this is $E = (F-CS^T)^T(F-CS^T)$ (correct me if I am wrong).
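A quick numerical sanity check that the matrix form agrees with the summation form below; the sizes, seed, and data here are made up purely for illustration:

```python
import numpy as np

# Arbitrary random data (n, m, and the seed are made up)
rng = np.random.default_rng(0)
n, m = 4, 3
F = rng.standard_normal((n, 1))   # column vector of the f_i
S = rng.standard_normal((1, m))   # row vector of the s_j
C = rng.standard_normal((n, m))

# Matrix form: E = (F - C S^T)^T (F - C S^T)
R = F - C @ S.T
E_matrix = (R.T @ R).item()

# Summation form: E = sum_i (f_i - sum_j c_ij * s_j)^2
E_sum = sum(
    (F[i, 0] - sum(C[i, j] * S[0, j] for j in range(m))) ** 2
    for i in range(n)
)

print(np.isclose(E_matrix, E_sum))  # True
```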

Most optimization algorithms require a gradient, so I believe I need to differentiate it with respect to $C$ (again, please correct me if I am wrong). I do not know how to do this in matrix-algebra notation, so I did it by converting to summations; here is what I got.

$E = \sum_{i=1}^n (f_i - \sum_{j=1}^m c_{ij}s_j)^2 $

$\displaystyle \frac {\partial E} {\partial c_{ij}} = -2 \left(f_i - \sum_{k=1}^m c_{ik} s_k\right) s_j $

or, collecting all the entries into one matrix, $\displaystyle \frac{\partial E}{\partial C} = -2(F - CS^T)S$.

I want to know if we get the same result by using matrix notation to differentiate $E$, or if I have made a mistake somewhere.
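One way to test the per-entry derivative $\frac{\partial E}{\partial c_{ij}} = -2\left(f_i - \sum_k c_{ik}s_k\right)s_j$ is a central finite difference on a single entry of $C$; again the sizes and data are made up:

```python
import numpy as np

# Made-up data for a finite-difference spot check
rng = np.random.default_rng(1)
n, m = 4, 3
F = rng.standard_normal((n, 1))
S = rng.standard_normal((1, m))
C = rng.standard_normal((n, m))

def E(C):
    R = F - C @ S.T
    return (R.T @ R).item()

i, j = 1, 2  # arbitrary entry to test

# Analytic: -2 (f_i - sum_k c_ik s_k) s_j
analytic = -2 * (F[i, 0] - C[i] @ S[0]) * S[0, j]

# Central finite difference in c_ij
eps = 1e-6
Cp = C.copy(); Cp[i, j] += eps
Cm = C.copy(); Cm[i, j] -= eps
numeric = (E(Cp) - E(Cm)) / (2 * eps)

print(np.isclose(analytic, numeric))  # True
```

Since $E$ is quadratic in each $c_{ij}$, the central difference is exact up to floating-point roundoff.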

Best answer:

Let $$M=CS^T-F$$ then use the Frobenius ($:$) inner product to rewrite the error function. Finding the differential and the gradient is then straightforward: $$\eqalign{ E &= M:M \cr dE &= 2M:dM \cr &= 2M:dC\,S^T \cr &= 2MS:dC \cr \frac{\partial E}{\partial C} &= 2MS = 2(CS^T-F)S }$$ This derivation holds when $S$ and $F$ are full matrices, as well as when they are merely vectors.
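The gradient $2(CS^T-F)S$ can be verified entrywise against central finite differences; the shapes and random data below are arbitrary:

```python
import numpy as np

# Arbitrary data for the full-gradient check
rng = np.random.default_rng(2)
n, m = 4, 3
F = rng.standard_normal((n, 1))
S = rng.standard_normal((1, m))
C = rng.standard_normal((n, m))

def E(C):
    M = C @ S.T - F
    return (M.T @ M).item()

# Analytic gradient: 2 (C S^T - F) S, an n x m matrix matching C
grad = 2 * (C @ S.T - F) @ S

# Central finite differences, entry by entry
eps = 1e-6
num = np.zeros_like(C)
for i in range(n):
    for j in range(m):
        Cp = C.copy(); Cp[i, j] += eps
        Cm = C.copy(); Cm[i, j] -= eps
        num[i, j] = (E(Cp) - E(Cm)) / (2 * eps)

print(np.allclose(grad, num))  # True
```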

The differential rule for any product (matrix, Kronecker, Hadamard, Frobenius) is very simple: $$d(A\star B) = dA\star B + A\star dB$$ In general you must respect the order of the matrices; however, since the Frobenius and Hadamard products are commutative, you can rearrange and collect terms, much like working with scalars.

Don't try to apply this simple rule to gradients; it only works for differentials.

The Frobenius product is equivalent to the trace: $A:B={\rm tr}(A^TB)$. Using the cyclic property of the trace, you can prove that the following are all equal: $$\eqalign{ AB:C &= A:CB^T \cr &= B:A^TC \cr &= (AB)^T:C^T \cr &= C:AB }$$ Each term in a Frobenius (or Hadamard) product must have the same "shape". Since a matrix and its transpose are always dimensionally compatible, $BB^T$ and $B^TB$ both exist and are square.

These two facts help me remember the rule for moving $B$ to the RHS in the product $AB:C$:

First, multiply by $B^T$ to ensure that both sides have the same shape $\,\,ABB^T:CB^T$

Then drop the $BB^T$ term to obtain $\,\,A:CB^T$
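The trace identities above are easy to confirm numerically; the shapes chosen below are arbitrary, as long as $AB$ and $C$ are the same size:

```python
import numpy as np

def frob(X, Y):
    # Frobenius inner product X:Y = tr(X^T Y)
    return np.trace(X.T @ Y)

# Arbitrary compatible shapes: AB and C are both 4 x 5
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))
C = rng.standard_normal((4, 5))

lhs = frob(A @ B, C)
print(np.allclose(lhs, frob(A, C @ B.T)))      # True
print(np.allclose(lhs, frob(B, A.T @ C)))      # True
print(np.allclose(lhs, frob((A @ B).T, C.T)))  # True
print(np.allclose(lhs, frob(C, A @ B)))        # True
```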