Gradient of Square of Quadratic Inner-Product


\begin{equation} \begin{aligned} f(x) :=& {\langle}x,Ax{\rangle}^2\\ =& x^{T}Axx^{T} Ax\\ & x \in \mathbb{R}^n,\\ & A \in \mathbb{R}^{n \times n}, \;A = A^T. \end{aligned} \end{equation}

The second answer gives the generalization, plus example code to validate it.

There are 3 answers below.

On BEST ANSWER

You are right, the result is $\nabla f(x) = 4\langle x, Ax\rangle Ax$.

When in doubt, you can just write out the sums explicitly. We have $$\langle x,Ax\rangle =\sum_{i=1}^n x_i(Ax)_i = \sum_{i=1}^n x_i\sum_{j=1}^n A_{ij}x_j = \sum_{i,j=1}^nA_{ij}x_ix_j$$

Therefore \begin{align} \partial_kf(x) &= \partial_k \left(\sum_{i,j=1}^nA_{ij}x_ix_j\right)^2 \\ &= 2\left(\sum_{i,j=1}^nA_{ij}x_ix_j\right)\left(\sum_{i=1}^nA_{ik}x_i + \sum_{j=1}^nA_{kj}x_j\right)\\ &= 2\langle x,Ax\rangle\left(\sum_{i=1}^nA_{ki}x_i + \sum_{j=1}^nA_{kj}x_j\right)\\ &= 4\langle x,Ax\rangle\left(\sum_{j=1}^nA_{kj}x_j\right)\\ &= 4\langle x,Ax\rangle(Ax)_k \end{align}

so $$\nabla f(x) = \begin{bmatrix} \partial_1 f(x) \\ \vdots \\ \partial_n f(x)\end{bmatrix} = 4\langle x,Ax\rangle\begin{bmatrix} (Ax)_1 \\ \vdots \\ (Ax)_n\end{bmatrix} = 4\langle x,Ax\rangle Ax$$
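As a quick sanity check (not part of the original answer), the gradient formula $\nabla f(x) = 4\langle x, Ax\rangle Ax$ can be compared against a central finite-difference approximation. The NumPy sketch below is illustrative; the dimension, seed, and step size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = (M + M.T) / 2          # symmetric A, as assumed in the question
x = rng.standard_normal(n)

f = lambda x: (x @ A @ x) ** 2

# Closed-form gradient from the answer: 4 <x, Ax> Ax
grad = 4 * (x @ A @ x) * (A @ x)

# Central finite-difference approximation, one coordinate at a time
h = 1e-6
fd = np.array([
    (f(x + h * e) - f(x - h * e)) / (2 * h)
    for e in np.eye(n)
])

print(np.max(np.abs(grad - fd)))  # should be tiny
```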


For the Hessian, notice that $$\partial_l\big[\langle x,Ax\rangle\big] = 2(Ax)_l$$ (by the same computation as above, without the square, using $A = A^T$) and $$\partial_l \big[(Ax)_k\big] = \partial_l\left(\sum_{j=1}^n A_{kj}x_j\right) = A_{kl}$$ We have \begin{align} \partial_l\partial_k f(x) &= \partial_l\big[4\langle x,Ax\rangle (Ax)_k\big]\\ &= 4\Big[\partial_l\big[\langle x,Ax\rangle\big](Ax)_k + \langle x,Ax\rangle \partial_l \big[(Ax)_k\big]\Big]\\ &= 4\Big[2(Ax)_l(Ax)_k + \langle x,Ax\rangle A_{kl}\Big]\\ &= 8(Ax)_l(Ax)_k + 4\langle x,Ax\rangle A_{kl} \end{align}

so the Hessian is given by $$H(x) = \Big[\partial_l\partial_k f(x)\Big]_{1\le l,k\le n} = 8\Big[(Ax)_l(Ax)_k\Big]_{1\le l,k\le n} + 4\langle x,Ax\rangle\Big[ A_{kl}\Big]_{1\le l,k\le n} = 8\,(Ax)(Ax)^T + 4\langle x,Ax\rangle A$$ using $A = A^T$ for the last term.
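The Hessian $H(x) = 8\,(Ax)(Ax)^T + 4\langle x,Ax\rangle A$ (the outer-product term carries a factor $8$, since $\partial_l\langle x,Ax\rangle = 2(Ax)_l$) can likewise be checked by finite-differencing the gradient column by column. This NumPy sketch is illustrative and not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                      # symmetric A
x = rng.standard_normal(n)

grad = lambda x: 4 * (x @ A @ x) * (A @ x)

# Closed-form Hessian: 8 (Ax)(Ax)^T + 4 <x, Ax> A
Ax = A @ x
H = 8 * np.outer(Ax, Ax) + 4 * (x @ A @ x) * A

# Finite-difference Hessian: differentiate the gradient columnwise
h = 1e-6
H_fd = np.column_stack([
    (grad(x + h * e) - grad(x - h * e)) / (2 * h)
    for e in np.eye(n)
])

print(np.max(np.abs(H - H_fd)))  # should be tiny
```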

On SECOND ANSWER

Here is the generalization, together with a MATLAB script you can run to see that it is correct. Interestingly, it is nothing but the chain rule: take the outer derivative, then the inner derivative, and you are done.

\begin{equation} \begin{aligned} F(X) :=& {\langle}X,AX{\rangle}_F^2\\ =& \operatorname{tr}(X^{T}AX)^2,\\ & X \in \mathbb{R}^{n \times r},\\ & A \in \mathbb{R}^{n \times n}, \;\text{not necessarily symmetric}.\\\\ \end{aligned} \end{equation} \begin{equation} \begin{aligned} \text{Define}, \;\; f(X) &:= {\langle}X,AX{\rangle}_F =: g(X),\\ F(X) &\;= {\langle}X,AX{\rangle}_F^2 = f(X)g(X).\\\\ \partial_{X_{ij}}F(X)&=\partial_{X_{ij}}\{f(X)g(X)\}\\ &=\partial_{X_{ij}}\{f(X)\}g(X)+\partial_{X_{ij}}\{g(X)\}f(X)\\ &=2\partial_{X_{ij}}\{f(X)\}f(X)\;\; \because f(X)=g(X).\\\\ & \nabla_{X}f(X)=\nabla_{X}\langle X,AX\rangle_F=(A+A^T)X\\ \Rightarrow& \partial_{X_{ij}}\{f(X)\}=\Big[\;(A+A^T)X\;\Big]_{ij}\\\\ \therefore \nabla_{X}\{F(X)\} &= 2\langle X,AX \rangle_F \; (A+A^T)X \end{aligned} \end{equation}
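The claimed gradient $\nabla_X F(X) = 2\langle X,AX\rangle_F\,(A+A^T)X$ can also be checked numerically without Manopt, by entrywise central finite differences. The NumPy sketch below is illustrative; dimensions and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 6, 3
A = rng.standard_normal((n, n))        # not necessarily symmetric
X = rng.standard_normal((n, r))

F = lambda X: np.trace(X.T @ A @ X) ** 2

# Claimed gradient: 2 <X, AX>_F (A + A^T) X
G = 2 * np.trace(X.T @ A @ X) * (A + A.T) @ X

# Entrywise central finite differences
h = 1e-6
G_fd = np.zeros_like(X)
for i in range(n):
    for j in range(r):
        E = np.zeros_like(X)
        E[i, j] = h
        G_fd[i, j] = (F(X + E) - F(X - E)) / (2 * h)

print(np.max(np.abs(G - G_fd)))  # should be tiny
```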

==================================================================

Matlab Script: Using Manifold Optimization Toolbox, manopt

result must be...

The slope should be 2. It appears to be: 2.00001.
If it is far from 2, then directional derivatives might be erroneous.
The residual should be 0, or very close. Residual: 0.
If it is far from 0, then the gradient is not in the tangent space.

==================================================================

clc; close all; clear;

addpath(genpath('C:\MatlabToolkits\manopt')); % Manopt provides euclideanfactory and checkgradient

dim_ = randi([1e1,1e2]);

n = dim_; r = floor(dim_./2); A = random('normal', 0, 1, n, n); % A is not symmetric

problem1 = []; problem1.M = euclideanfactory(n, r); problem1.cost = @(X) power(trace(X'*A*X), 2); problem1.grad = @(X) 2*trace(X'*A*X)*(A+A')*X;

pltcfg.decay = 0.75; fprintf('========================================================== @k\n'); figure('Position', [0, 0, 1920*pltcfg.decay, 1080*pltcfg.decay]); checkgradient(problem1); fprintf('\n\n');

On THIRD ANSWER

NB: If $A$ is not symmetric, then replace it with $\Big(\tfrac{A+A^T}{2}\Big)$ in the following.

Define the scalar $$\eqalign{ \phi &= x^TAx \cr d\phi &= (2Ax)^Tdx \cr }$$ Write the function in terms of this new variable. Then find its differential and gradient. $$\eqalign{ f &= \phi^2 \cr df &= 2\phi\,d\phi = (4\phi Ax)^Tdx \cr g = \frac{\partial f}{\partial x} &= 4\phi Ax \cr\cr }$$ Now find the differential of the gradient, and thence the hessian. $$\eqalign{ dg &= 4\phi A\,dx + 4Ax\,d\phi \cr &= 4\phi A\,dx + (4Ax)(2Ax)^Tdx \cr H=\frac{\partial g}{\partial x} &= 4\phi A + 8Axx^TA \cr }$$
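The differential identities above also yield a matrix-free Hessian-vector product, $Hv = 4\phi Av + 8\,Ax\,(x^TAv)$, which avoids forming $H$ explicitly. The NumPy sketch below is an illustrative check (assuming symmetric $A$, per the NB), not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
M = rng.standard_normal((n, n))
A = (M + M.T) / 2          # replace A by (A + A^T)/2 if A is not symmetric
x = rng.standard_normal(n)
v = rng.standard_normal(n)

phi = x @ A @ x
Ax = A @ x

# Full Hessian from the answer: H = 4*phi*A + 8*A x x^T A
H = 4 * phi * A + 8 * np.outer(Ax, Ax)

# Matrix-free Hessian-vector product from the same identities
Hv = 4 * phi * (A @ v) + 8 * Ax * (Ax @ v)

print(np.max(np.abs(H @ v - Hv)))  # agrees up to roundoff
```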