Derivative of a composite function with respect to a matrix


I would like some hints on how to differentiate the following function with respect to the matrix $X$:

$$ f(X)=\left\| y - \sum_k (h_k X^k)x \right\|_2^2$$

where $X$ is a square matrix, $x$ and $y$ are two given vectors, and each $h_k$ is a given scalar coefficient that weights the matrix power $X^k$. This is the composition of a convex function (the squared norm) with a non-affine function. I want to linearize the function with respect to $X$, so I would like to compute its derivative. Any hints?

Thank you.


There are 2 best solutions below


Let's use a convention where a Greek letter denotes a scalar, a lowercase Latin letter a vector, and an uppercase Latin letter a matrix. So rename your original variables as $$\eqalign{ \alpha_k = h_k ,\qquad \phi(X) = f(X) ,\qquad z = x }$$ and define two new matrices which will be used later $$\eqalign{ P &= \sum_{k=1}^n \alpha_kX^k \\ M &= 2z(Pz-y)^T\\ }$$ Let's also use a colon to denote the matrix inner product, i.e. $$\eqalign{ A:B &= {\rm Tr}(A^TB) \\ A:A &= \big\|A\big\|_F^2 \\ }$$

Write the objective function in terms of the above definitions, then calculate its differential and gradient. Two facts are used along the way: the product rule gives $d(X^k) = \sum_{j=0}^{k-1} X^j\,dX\,X^{k-j-1}$, and the cyclic property of the trace lets factors move across the colon, $A:BCD = B^TAD^T:C$. $$\eqalign{ \phi &= (Pz-y):(Pz-y) \\ d\phi &= 2(Pz-y):dP\,z \\ &= 2(Pz-y)z^T:dP \\ &= M^T:\sum_{k=1}^n \alpha_k\sum_{j=0}^{k-1} X^jdX\,X^{k-j-1} \\ &= \sum_{k=1}^n\sum_{j=0}^{k-1}\left(\alpha_kX^jMX^{k-j-1}\right)^T:dX \\ \frac{\partial \phi}{\partial X} &= \sum_{k=1}^n\sum_{j=0}^{k-1}\left(\alpha_kX^jMX^{k-j-1}\right)^T \\ }$$ (Moving the factors across the colon produces $X^{k-j-1}MX^j$ term by term; the inner sum is unchanged under $j \to k-j-1$, so the powers can be written in either order.)
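As a sanity check on the gradient formula above, here is a minimal NumPy sketch that compares it against central finite differences. The function names (`phi`, `grad_phi`, `fd_grad`), the test sizes, the coefficients, and the random seed are all illustrative assumptions and not part of the original answer.

```python
import numpy as np

def phi(X, alphas, z, y):
    """phi(X) = ||P z - y||^2 with P = sum_{k=1}^n alpha_k X^k (alphas[k-1] = alpha_k)."""
    Xk = np.eye(X.shape[0])
    P = np.zeros_like(X)
    for a in alphas:
        Xk = Xk @ X          # Xk now holds X^k
        P += a * Xk
    r = P @ z - y
    return float(r @ r)

def grad_phi(X, alphas, z, y):
    """Closed-form gradient: sum_k sum_j (alpha_k X^j M X^(k-j-1))^T."""
    n_terms = len(alphas)
    powers = [np.eye(X.shape[0])]            # powers[j] = X^j
    for _ in range(n_terms):
        powers.append(powers[-1] @ X)
    P = sum(a * powers[k] for k, a in enumerate(alphas, start=1))
    M = 2.0 * np.outer(z, P @ z - y)         # M = 2 z (P z - y)^T
    G = np.zeros_like(X)
    for k, a in enumerate(alphas, start=1):
        for j in range(k):
            G += a * (powers[j] @ M @ powers[k - j - 1]).T
    return G

def fd_grad(X, alphas, z, y, eps=1e-6):
    """Entry-wise central finite differences, used only as a reference."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (phi(X + E, alphas, z, y) - phi(X - E, alphas, z, y)) / (2 * eps)
    return G

# Hypothetical test data: a 4x4 matrix and a cubic polynomial in X.
rng = np.random.default_rng(0)
X = 0.5 * rng.standard_normal((4, 4))
z = rng.standard_normal(4)
y = rng.standard_normal(4)
alphas = [1.0, -0.5, 0.25]

G_exact = grad_phi(X, alphas, z, y)
G_fd = fd_grad(X, alphas, z, y)
print("max abs difference:", np.max(np.abs(G_exact - G_fd)))
```

Once the gradient $G = \partial\phi/\partial X$ is available at a point $X_0$, the linearization asked about in the question is $\phi(X) \approx \phi(X_0) + G : (X - X_0)$.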


If you are just trying to linearize $f(X)$ around $X = 0$, this is easily accomplished by expanding the squared norm:

$$\eqalign{ f(X) &= \left(y - \sum_k h_k X^kx\right)^T\left(y - \sum_k h_k X^kx\right)\\ &= \left(y^T - \sum_k h_kx^T(X^T)^k\right)\left(y - \sum_k h_k X^kx\right)\\ &= y^Ty - \sum_k h_ky^TX^kx - \sum_k h_kx^T(X^T)^ky + \sum_j\sum_k h_jh_k x^T(X^T)^jX^kx\\ &= y^Ty + h_0^2x^Tx - h_0(y^Tx + x^Ty) - h_1(y^TXx + x^TX^Ty) + 2h_0h_1x^TXx + O(\|X\|^2) }$$

Linearization just means ignoring higher-order terms, so the linearization for $X$ near $0$ is $$y^Ty + h_0^2x^Tx - h_0(y^Tx + x^Ty) - h_1(y^TXx + x^TX^Ty) + 2h_0h_1x^TXx$$ If there are no terms for $k = 0$ (that is, $h_0 = 0$), it simplifies to $$y^Ty - h_1(y^TXx + x^TX^Ty)$$
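The first-order expansion above can be checked numerically. The sketch below is an illustration under assumed data: the names `f` and `f_lin`, the coefficients `h`, the direction `D`, and the seed are all hypothetical. It evaluates the full function and its linearization at $X = tD$ for two small values of $t$. If the expansion is right, the gap should shrink roughly like $t^2$.

```python
import numpy as np

def f(X, h, x, y):
    """f(X) = || y - sum_k h[k] X^k x ||^2, with the sum starting at k = 0."""
    Xk = np.eye(X.shape[0])
    s = np.zeros(X.shape[0])
    for hk in h:
        s += hk * (Xk @ x)   # adds h_k X^k x
        Xk = Xk @ X
    r = y - s
    return float(r @ r)

def f_lin(X, h, x, y):
    """First-order expansion of f around X = 0 (uses only h[0] and h[1])."""
    h0, h1 = h[0], h[1]
    return float(
        y @ y + h0**2 * (x @ x) - 2 * h0 * (y @ x)
        - 2 * h1 * (y @ X @ x) + 2 * h0 * h1 * (x @ X @ x)
    )

# Hypothetical test data.
rng = np.random.default_rng(1)
n = 4
x = rng.standard_normal(n)
y = rng.standard_normal(n)
D = rng.standard_normal((n, n))   # fixed direction; X = t * D
h = [0.5, -1.0, 0.25]

err_coarse = abs(f(1e-2 * D, h, x, y) - f_lin(1e-2 * D, h, x, y))
err_fine = abs(f(1e-3 * D, h, x, y) - f_lin(1e-3 * D, h, x, y))
ratio = err_coarse / err_fine
print("error at t=1e-2:", err_coarse)
print("error at t=1e-3:", err_fine)
print("ratio (about 100 if the error is second order):", ratio)
```

Since $y^TXx$ and $x^TX^Ty$ are the same scalar, the code folds them into a single term $2h_1\,y^TXx$, and likewise for $y^Tx + x^Ty$.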