Let $U,A \in \mathbb{R}^{m \times n}$, $\mathbf{a}\in \mathbb{R}^p$, $\mathbf{b}\in \mathbb{R}^q$, and $c \in \mathbb{R}$.
Let there also be two families of square matrices,
$K_a^i \in \mathbb{R}^{m \times m},\ i=1,2,\dots,p$ and $K_b^j \in \mathbb{R}^{n \times n},\ j=1,2,\dots,q$.
Given the equation
$J = \left\Vert U - \left(\sum_i a_iK_a^i\right)A\left(\sum_j b_jK_b^j\right)^T\right\Vert_F^2 + c\Vert \mathbf{a} \Vert_2^2$
where $\Vert \cdot \Vert_F$ denotes the Frobenius norm (the penalty is the squared 2-norm, consistent with the $2ca_i$ term below). Is the following derivation correct? (setting $\frac{dJ}{da_i}=0$)
\begin{align*} \frac{dJ}{da_i} &= 2 \left[U-(\sum_i a_iK_a^i)A(\sum_j b_jK_b^j)^T\right]\left[-(K_a^i)A(\sum_j b_jK_b^j)^T\right] + 2ca_i = 0 \\ &= \left[U-(\sum_i a_iK_a^i)A(\sum_j b_jK_b^j)^T\right] + ca_i\left[-(K_a^i)A(\sum_j b_jK_b^j)^T\right]^+ = 0 \\ & -ca_i\left[-(K_a^i)A(\sum_j b_jK_b^j)^T\right]^+ - (\sum_i a_iK_a^i)A(\sum_j b_jK_b^j)^T = -U\\ & a_ic\left[(K_a^i)A(\sum_j b_jK_b^j)^T\right]^+ + a_iK_a^iA(\sum_j b_jK_b^j)^T + (\sum_{s \neq i} a_sK_a^s)A(\sum_j b_jK_b^j)^T = U\\ & a_i\left[c\left[(K_a^i)A(\sum_j b_jK_b^j)^T\right]^+ + K_a^iA(\sum_j b_jK_b^j)^T\right] = U - (\sum_{s \neq i} a_sK_a^s)A(\sum_j b_jK_b^j)^T \\ a_i &= \left[U - (\sum_{s \neq i} a_sK_a^s)A(\sum_j b_jK_b^j)^T\right]\left[c\left[(K_a^i)A(\sum_j b_jK_b^j)^T\right]^+ + K_a^iA(\sum_j b_jK_b^j)^T\right]^+ \\ \end{align*} where $M^+$ is the pseudoinverse of the matrix $M$.
In case of being correct, is it possible to simplify it more?
Thanks in advance
The derivative expression needs a Frobenius product instead of a regular matrix product in the first term on the RHS,
$$\eqalign{ \frac{dJ}{da_i} &= 2 \bigg[U-\Big(\sum_n a_nK_a^n\Big)A\Big(\sum_j b_jK_b^j\Big)^T\bigg] : \bigg[-(K_a^i)A\Big(\sum_j b_jK_b^j\Big)^T\bigg] + 2ca_i \cr }$$ in order to make all of the terms scalar-valued. Otherwise you would be adding a matrix to a scalar, which doesn't make sense.
Also, you should not use the free index $i$ (on the LHS) as a summation index (on the RHS); that will only lead to confusion. You'll notice that I've changed the summation index to $n$.
The reason for the Frobenius product comes from the fact that the squared Frobenius norm of a matrix (and its differential) can be written in terms of the Frobenius product $$\eqalign{ \|M\|_F^2 &= M:M \cr\cr d\,\|M\|_F^2 &= 2\,M:dM \cr }$$ Also, since this question doesn't concern them, you might want to replace the sum over the $b$-coefficients with a single matrix to reduce some of the clutter, e.g. $$ B = \sum_j b_jK_b^j $$
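For a concrete sanity check, the Frobenius product is just the element-wise product summed over all entries, equivalently $\operatorname{tr}(M^TN)$; a short NumPy sketch (with arbitrary example shapes) confirms the identities above:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 3))
N = rng.standard_normal((4, 3))

# Frobenius product  M:N = sum_ij M_ij N_ij = trace(M^T N)
frob = np.sum(M * N)
assert np.isclose(frob, np.trace(M.T @ N))

# Squared Frobenius norm:  ||M||_F^2 = M:M
assert np.isclose(np.linalg.norm(M, 'fro')**2, np.sum(M * M))
```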
Update
Thinking about this some more: if you define a new set of matrices $$\eqalign{ M_i = K_a^i\,AB^T \cr }$$ (note the transpose, carried over from the original cost function) then you can write the derivative expression as $$\eqalign{ \frac{1}{2} \frac{dJ}{da_i} &= \bigg[\sum_k a_k M_k - U\bigg]:M_i + ca_i \cr &= \sum_k a_k M_k:M_i - U:M_i + ca_i \cr }$$ To simplify things, define a new matrix and vector whose elements are given by $$\eqalign{ X_{ik} &= M_k:M_i \cr v_{i} &= U:M_i \cr }$$ Since the Frobenius product is commutative, the matrix $X$ is symmetric.
Setting the derivative to zero, you can write a standard matrix-vector equation $$\eqalign{ (X+cI)\,a &= v \cr }$$ which you can easily solve for $a$.
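The whole recipe can be sketched numerically. Everything below is a hypothetical example (the sizes, the random data, and the single matrix `B` standing in for $\sum_j b_jK_b^j$ are all made up), assuming the definition $M_i = K_a^i A B^T$ with the transpose from the original cost function:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p, c = 4, 3, 5, 0.1          # hypothetical sizes and penalty weight

U = rng.standard_normal((m, n))
A = rng.standard_normal((m, n))
Ka = rng.standard_normal((p, m, m))  # the matrices K_a^i
B = rng.standard_normal((n, n))      # stands in for sum_j b_j K_b^j

# M_i = K_a^i A B^T
M = np.stack([Ka[i] @ A @ B.T for i in range(p)])

# X_ik = M_k : M_i  (Frobenius products), v_i = U : M_i
X = np.array([[np.sum(M[k] * M[i]) for k in range(p)] for i in range(p)])
v = np.array([np.sum(U * M[i]) for i in range(p)])

# Solve (X + cI) a = v
a = np.linalg.solve(X + c * np.eye(p), v)

# The gradient (1/2) dJ/da_i = sum_k a_k M_k:M_i - U:M_i + c a_i vanishes
grad = X @ a - v + c * a
```

Since $X$ is a Gram matrix it is symmetric positive semi-definite, so for $c>0$ the system $X+cI$ is symmetric positive definite and always solvable.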