I would like some hints on how to differentiate the following function with respect to the matrix $X$:
$$ f(X)=\left\| y - \sum_k (h_k X^k)x \right\|_2^2$$
where $X$ is a square matrix, $x$ and $y$ are two given vectors, and each $h_k$ is a given scalar coefficient that weights the corresponding matrix power $X^k$. This is the composition of a convex function (the squared norm) with a non-affine function. I want to linearize the function with respect to $X$, so I would like to compute the derivative. Any hints?
Thank you.
Let's use a convention where a Greek letter denotes a scalar, a lowercase Latin letter a vector, and an uppercase Latin letter a matrix. So rename your original variables as $$\eqalign{ \alpha_k = h_k ,\qquad \phi(X) = f(X) ,\qquad z = x }$$ and define two new matrices which will be used later (taking the sum to run over $k=1,\dots,n$; a constant $k=0$ term would only add $\alpha_0 I$ to $P$ and contributes nothing to $dP$) $$\eqalign{ P &= \sum_{k=1}^n \alpha_kX^k \\ M &= 2z(Pz-y)^T\\ }$$ Let's also use a colon to denote the matrix inner product, i.e. $$\eqalign{ A:B &= {\rm Tr}(A^TB) \\ A:A &= \big\|A\big\|_F^2 \\ }$$ Write the objective function in terms of the above definitions.
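Then calculate its differential and gradient, using the product rule $d(X^k)=\sum_{j=0}^{k-1}X^j\,dX\,X^{k-j-1}$ and the rearrangement rule $A:BCD = B^TAD^T:C$. $$\eqalign{ \phi &= (Pz-y):(Pz-y) \\ d\phi &= 2(Pz-y):dP\,z \\ &= 2(Pz-y)z^T:dP \\ &= M^T:\sum_{k=1}^n \alpha_k\sum_{j=0}^{k-1} X^jdX\,X^{k-j-1} \\ &= \sum_{k=1}^n\sum_{j=0}^{k-1}\left(\alpha_kX^{k-j-1}MX^{j}\right)^T:dX \\ &= \sum_{k=1}^n\sum_{j=0}^{k-1}\left(\alpha_kX^jMX^{k-j-1}\right)^T:dX \\ \frac{\partial \phi}{\partial X} &= \sum_{k=1}^n\sum_{j=0}^{k-1}\left(\alpha_kX^jMX^{k-j-1}\right)^T \\ }$$ The second-to-last form of $d\phi$ comes from reindexing $j \to k-1-j$, which leaves each inner sum unchanged.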
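As a sanity check, the gradient formula above can be compared against central finite differences. The following is only a minimal NumPy sketch under stated assumptions: $X$ is square, the sum runs over $k=1,\dots,n$ with `alpha[k-1]` holding $\alpha_k$, and the function names (`poly`, `phi`, `grad_phi`, `numeric_grad`) are hypothetical helpers introduced here for illustration.

```python
import numpy as np

def poly(X, alpha):
    """P = sum_{k=1}^n alpha[k-1] * X^k (assumes X is square)."""
    P = np.zeros_like(X, dtype=float)
    Xk = np.eye(X.shape[0])
    for a in alpha:
        Xk = Xk @ X          # Xk now holds X^k
        P = P + a * Xk
    return P

def phi(X, alpha, z, y):
    """Objective phi(X) = || P z - y ||_2^2."""
    r = poly(X, alpha) @ z - y
    return float(r @ r)

def grad_phi(X, alpha, z, y):
    """Closed-form gradient: sum_k sum_j alpha_k (X^j M X^(k-j-1))^T."""
    n = X.shape[0]
    M = 2.0 * np.outer(z, poly(X, alpha) @ z - y)   # M = 2 z (P z - y)^T
    powers = [np.eye(n)]                            # X^0, X^1, ..., X^(len(alpha)-1)
    for _ in range(len(alpha) - 1):
        powers.append(powers[-1] @ X)
    G = np.zeros((n, n))
    for k, a in enumerate(alpha, start=1):
        for j in range(k):
            G += a * (powers[j] @ M @ powers[k - j - 1]).T
    return G

def numeric_grad(X, alpha, z, y, eps=1e-6):
    """Entry-wise central finite differences of phi."""
    G = np.zeros_like(X, dtype=float)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X, dtype=float)
            E[i, j] = eps
            G[i, j] = (phi(X + E, alpha, z, y) - phi(X - E, alpha, z, y)) / (2 * eps)
    return G

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = 0.5 * rng.standard_normal((4, 4))
    z, y = rng.standard_normal(4), rng.standard_normal(4)
    alpha = [0.7, -0.3, 0.2]
    diff = np.max(np.abs(grad_phi(X, alpha, z, y) - numeric_grad(X, alpha, z, y)))
    print("max abs difference:", diff)
```

With the gradient $G=\partial\phi/\partial X$ in hand, the linearization asked for in the question is $\phi(X+\Delta)\approx\phi(X)+G:\Delta$ for a small perturbation $\Delta$, which is the relation the finite-difference check probes one entry at a time.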