Derivative of squared Frobenius norm of linear combination of rank-$1$ matrices with respect to their weights

231 Views Asked by At

Let

$${\bf A} := \sum_{i=1}^K w_i {\bf x}_i {\bf x}_i^\top$$

where $\{w_i\}_{i=1}^K$ are scalars, and ${\bf x}_i \in \mathbb{R}^d$. Let ${\bf w} := [w_1 \cdots w_K]^\top \in \mathbb{R}^K$.

What is the derivative of the squared Frobenius norm $\|{\bf A}\|_\mathsf{F}^2$ with respect to ${\bf w}$?

2

There are 2 best solutions below

3
On

According to the definition, $$ \left\|A\right\|_F^2=\text{tr}\left(A^{\top}A\right). $$ Thus since we have $$ A=\sum_iw_i\mathbf{x}_i\mathbf{x}_i^{\top}, $$ we obtain $$ A^{\top}A=\left(\sum_iw_i\mathbf{x}_i\mathbf{x}_i^{\top}\right)\left(\sum_jw_j\mathbf{x}_j\mathbf{x}_j^{\top}\right)=\sum_{i,j}w_iw_j\mathbf{x}_i\mathbf{x}_i^{\top}\mathbf{x}_j\mathbf{x}_j^{\top}. $$ Consequently, $$ \left\|A\right\|_F^2=\text{tr}\left(A^{\top}A\right)=\text{tr}\left(\sum_{i,j}w_iw_j\mathbf{x}_i\mathbf{x}_i^{\top}\mathbf{x}_j\mathbf{x}_j^{\top}\right)=\sum_{i,j}w_iw_j\text{tr}\left(\mathbf{x}_i\mathbf{x}_i^{\top}\mathbf{x}_j\mathbf{x}_j^{\top}\right). $$ It is then straightforward to see that $$ \frac{\partial}{\partial w_k}\left\|A\right\|_F^2=\sum_{i,j}\left(\delta_{ik}w_j+w_i\delta_{ij}\right)\text{tr}\left(\mathbf{x}_i\mathbf{x}_i^{\top}\mathbf{x}_j\mathbf{x}_j^{\top}\right)=2\sum_{j}w_j\text{tr}\left(\mathbf{x}_j\mathbf{x}_j^{\top}\mathbf{x}_k\mathbf{x}_k^{\top}\right). $$

Edit: Further simplification

Thanks to @RodrigodeAzevedo's suggestion, the last expression could be further simplified. Note that $$ \text{tr}\left(\mathbf{x}_j\mathbf{x}_j^{\top}\mathbf{x}_k\mathbf{x}_k^{\top}\right)=\text{tr}\left(\mathbf{x}_k^{\top}\mathbf{x}_j\mathbf{x}_j^{\top}\mathbf{x}_k\right)=\text{tr}\left(\left|\mathbf{x}_j\cdot\mathbf{x}_k\right|^2\right)=\left|\mathbf{x}_j\cdot\mathbf{x}_k\right|^2. $$ Thus we have $$ \frac{\partial}{\partial w_k}\left\|A\right\|_F^2=2\sum_{j}w_j\left|\mathbf{x}_j\cdot\mathbf{x}_k\right|^2. $$

0
On

Define the matrices $$\eqalign{ X &= [\,x_1\,x_2\,\ldots\,x_K] &\implies x_i = Xe_i \cr W &= {\rm Diag}(w) &\implies w = {\rm diag}(W) \cr }$$ Then we can solve the problem in matrix form $$\eqalign{ A &= XWX^T \cr \phi &= \|A\|_F^2 = A:A \cr d\phi &= 2A:dA = 2A:(X\,dW\,X^T) = 2X^TAX:dW = 2\,{\rm diag}(X^TAX)^Tdw \cr \frac{\partial\phi}{\partial w} &= 2\,{\rm diag}(X^TAX) = 2\,{\rm diag}(X^TXWX^TX) = 2\,(X^TX\odot X^TX)\,w \cr\cr }$$ In some of the steps above, the elementwise/Hadamard product was denoted by $A\odot B$, and the trace/Frobenius product by $A:B = {\rm tr}(A^TB)$.