I want to find the Gradient and Hessian of the following function, $F(\mathbf{S}) = \frac{1}{2}\Vert \mathbf{M} - \mathbf{K_2SK_1^T}\Vert _F^2+\frac{1}{2}\Vert\mathbf{S}\Vert_F^2$. My try: using the trace formula for the Frobenius norm, $F(\mathbf{S})$ can be written as
$Grad(\mathbf{S}) =\frac{\partial}{\partial{S}}f(\mathbf{S})=\mathbf{K_2^T}(\mathbf{K_2SK_1^T}-\mathbf{M})\mathbf{K_1}+\mathbf{S}$
(cookbook eqns [70] & [84], updated according to greg's comments)
$Hess(\mathbf{S}) =\frac{\partial}{\partial{S}}Grad(\mathbf{S})=\mathbf{K_1^TK_1K_2^TK_2}+\mathbf{I}$
I got these results according to my understanding of the Matrix Cookbook, but obviously something is wrong. E.g. with $K_2\,(5000,256)$, $K_1\,(32,256)$, $M\,(5000,1)$, $S\,(256,256)$, the dimensions in $Grad(\mathbf{S})$ do not match for multiplication.
I'm not particularly sure whether I can apply the chain rule the way I did in the gradient step. Please help me get out of this confusion.
Define the trace/Frobenius product as $$A:B={\rm Tr}(A^TB)={\rm Tr}(AB^T)$$ which makes the Frobenius norm into a product $$\|X\|_F^2 = X:X$$ Next, define the matrix $$A=K_2SK_1^T-M$$ Write the cost function using the new notation and calculate its gradient. $$\eqalign{ F &= \tfrac{1}{2}S:S + \tfrac{1}{2}A:A \\ dF &= S:dS + A:dA \\ &= S:dS + A:K_2\,dS\,K_1^T \\ &= S:dS + K_2^TAK_1:dS \\ &= (S + K_2^TAK_1):dS \\ \frac{\partial F}{\partial S} &= (S + K_2^TAK_1) \;=\; G\quad\big({\rm the\,gradient}\big) \\ }$$ Flatten the various matrices into vectors. $$\eqalign{ s &= {\rm vec}(S) \\ a &= {\rm vec}(A) = (K_1\otimes K_2)s \\ }$$ Then vectorize $G$ and calculate its gradient (the Hessian). $$\eqalign{ g &= {\rm vec}(S + K_2^TAK_1) \\ &= s + (K_1^T\otimes K_2^T) a&\big({\rm the\,vectorized\,gradient}\big) \\ &= \Big(I + (K_1\otimes K_2)^T(K_1\otimes K_2)\Big)s \\ dg &= \Big(I + (K_1\otimes K_2)^T(K_1\otimes K_2)\Big)ds \\ \frac{\partial g}{\partial s} &= \Big(I + (K_1\otimes K_2)^T(K_1\otimes K_2)\Big) \;=\; H\quad&\big({\rm the\,Hessian}\big) \\ H&\in {\mathbb R}^{256^2\times 256^2} \\ }$$
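As a quick sanity check, here is a NumPy sketch (with small hypothetical dimensions in place of $5000$, $256$, $32$, so the Kronecker Hessian stays tiny) that verifies the gradient $G = S + K_2^TAK_1$ against finite differences, and the flattened Hessian $H = I + (K_1\otimes K_2)^T(K_1\otimes K_2)$ against the matrix-form differential. Note that ${\rm vec}$ is column-stacking, hence `order='F'`.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p, q = 7, 5, 4, 3           # small stand-ins for 5000, 256, 32, 256
K2 = rng.standard_normal((m, n))  # K_2
K1 = rng.standard_normal((p, q))  # K_1
M = rng.standard_normal((m, p))   # M must match K_2 S K_1^T, i.e. m x p
S = rng.standard_normal((n, q))

def F(S):
    A = K2 @ S @ K1.T - M
    return 0.5 * np.sum(A**2) + 0.5 * np.sum(S**2)

# Gradient from the derivation: G = S + K_2^T A K_1
A = K2 @ S @ K1.T - M
G = S + K2.T @ A @ K1

# Central finite-difference check of one arbitrary entry of G
eps = 1e-6
E = np.zeros_like(S)
E[1, 2] = 1.0
fd = (F(S + eps * E) - F(S - eps * E)) / (2 * eps)
assert np.isclose(fd, G[1, 2], atol=1e-5)

# Flattened Hessian: H = I + (K_1 kron K_2)^T (K_1 kron K_2)
K = np.kron(K1, K2)
H = np.eye(n * q) + K.T @ K

# H maps vec(dS) to dG = dS + K_2^T K_2 dS K_1^T K_1 (column-major vec)
dS = rng.standard_normal((n, q))
dG = dS + K2.T @ K2 @ dS @ K1.T @ K1
assert np.allclose(H @ dS.ravel(order='F'), dG.ravel(order='F'))
```

With the small stand-in shapes, $M$ has to be $m\times p$ for $K_2SK_1^T-M$ to be well defined, which is the same dimension issue flagged in the question.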
Update
You can also calculate the Hessian in its un-flattened tensor form $$\eqalign{ dG &= dS + K_2^TdA\,K_1 \\ &= dS + K_2^TK_2\,dS\,K_1^TK_1 \\ &= \Big({\Omega} + K_2^TK_2{\Omega}K_1^TK_1\Big):dS \\ \frac{\partial G}{\partial S} &= \Big({\Omega} + K_2^TK_2{\Omega}K_1^TK_1\Big) \;=\; {\cal H} \\ {\cal H}&\in {\mathbb R}^{256\times 256\times 256\times 256} \\ }$$ where ${\Omega}$ is the fourth-order tensor which acts as the identity for the Frobenius product $$\eqalign{ {\Omega}:S = S:{\Omega} = S \\ }$$ and whose components can be expressed in terms of Kronecker delta symbols $$\eqalign{ {\Omega}_{ijk\ell} = \delta_{ik}\delta_{j\ell} }$$ The action of the tensor Hessian on a matrix $B$ is calculated as $$\eqalign{ {\cal H}:B &= ({\Omega} + K_2^TK_2{\Omega}K_1^TK_1):B \\ &= B + K_2^TK_2BK_1^TK_1 \\ \\ }$$ NB: $\;$ The elements of the tensor ${\cal H}$ are exactly the same as those of the matrix $H$; they're just arranged differently. Neither solves your memory issue, since they're both equally huge. The factored form in the comments does address the storage issue: each of the three Kronecker factors is a $256\times 256$ matrix.
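The action ${\cal H}:B = B + K_2^TK_2BK_1^TK_1$ never requires forming the huge $H$ explicitly. A small NumPy sketch (hypothetical small dimensions again) confirming that this matrix-free action agrees with the flattened $H$ applied to ${\rm vec}(B)$:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p, q = 6, 4, 3, 5
K2 = rng.standard_normal((m, n))
K1 = rng.standard_normal((p, q))
B = rng.standard_normal((n, q))

# Matrix-free action of the tensor Hessian: H:B = B + K_2^T K_2 B K_1^T K_1
HB = B + K2.T @ K2 @ B @ K1.T @ K1

# Flattened Hessian acting on vec(B) (column-major vec)
K = np.kron(K1, K2)
H = np.eye(n * q) + K.T @ K
assert np.allclose(H @ B.ravel(order='F'), HB.ravel(order='F'))
```

The matrix-free form only ever stores and multiplies the small Gram factors $K_2^TK_2$ and $K_1^TK_1$, which is exactly the storage advantage of the factored form mentioned above.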