Gradient of a vector-valued function with respect to a matrix domain


I need to compute the gradient of the function $f:\mathbb{R}^{k\times m}\to\mathbb{R}^k$ given by $$f(A)=Ax,$$ for some fixed $x\in\mathbb{R}^m$. Using coordinates $A=A^i_jE_i^j$, $x=x^je_j$, we have $$f^\lambda(A^i_j)=A^\lambda_jx^j,$$ and trivially $$\frac{\partial f^\lambda}{\partial A^\mu_\nu}=\frac{\partial}{\partial A^\mu_\nu}(A^\lambda_jx^j)=\delta^\lambda_\mu\delta_j^\nu x^j=\delta^\lambda_\mu x^\nu.$$ However, I'm not sure how to interpret the differential correctly. Looking at this slightly more abstractly, with the differential $df:T\mathbb{R}^{k\times m}\to T\mathbb{R}^k$, we see for $v\in T\mathbb{R}^{k\times m}$ and $g\in C^\infty(\mathbb{R}^k)$ that \begin{align} df(v)[g]&=v[g\circ f]\\ &=v^i_j\frac{\partial g}{\partial y^\lambda}\frac{\partial f^\lambda}{\partial A^i_j}, \end{align} and so \begin{align} df(v)&=v^i_j\frac{\partial f^\lambda}{\partial A^i_j}\frac{\partial}{\partial y^\lambda}\\ &=v^i_j\delta^\lambda_ix^j\frac{\partial}{\partial y^\lambda}\\ &=v^\lambda_jx^j\frac{\partial}{\partial y^\lambda}. \end{align}
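The coordinate formula $\frac{\partial f^\lambda}{\partial A^\mu_\nu}=\delta^\lambda_\mu x^\nu$ can be checked numerically. Below is a minimal sketch (the shapes $k=3$, $m=4$ and the random seed are arbitrary choices, not from the question) comparing a finite-difference Jacobian of $f(A)=Ax$ against the predicted tensor $\delta^\lambda_\mu x^\nu$:

```python
import numpy as np

# Arbitrary small shapes for the check.
rng = np.random.default_rng(0)
k, m = 3, 4
x = rng.standard_normal(m)
A = rng.standard_normal((k, m))

f = lambda A: A @ x  # f(A) = A x, a map R^{k x m} -> R^k

# Finite-difference Jacobian: jac[lam, mu, nu] ~ d f^lam / d A^mu_nu.
eps = 1e-6
jac = np.zeros((k, k, m))
for mu in range(k):
    for nu in range(m):
        E = np.zeros((k, m))
        E[mu, nu] = eps
        jac[:, mu, nu] = (f(A + E) - f(A - E)) / (2 * eps)

# Predicted Jacobian from the derivation: delta^lam_mu * x^nu.
pred = np.einsum('lm,n->lmn', np.eye(k), x)
assert np.allclose(jac, pred, atol=1e-6)
```

Note that the Jacobian entries do not depend on $A$ at all, consistent with $f$ being linear in $A$.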

In more ideal settings, as in the case $k=1$, we would define $\text{grad}(f)$ to be the unique vector field such that $$g(\text{grad}(f),v)=df(v)$$ for any $v\in T\mathbb{R}^{k\times m}$, where $g$ is the Frobenius inner product on $k\times m$ matrices (i.e., $g(A,B)=\text{tr}(A^TB)$), and we would conclude from the above calculations that $$\text{grad}(f)=x.$$ However, this doesn't seem to generalize in any obvious way (to me at least). Am I missing something (or making an error) in the above computation? If not, what would we conclude the gradient of $f$ is?
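The $k=1$ claim above can also be verified directly: with $A$ a $1\times m$ row matrix, the differential acts as $df(v)=v^1_jx^j$, and the Frobenius pairing against the candidate gradient $G=x$ (viewed as a $1\times m$ matrix) reproduces it. A sketch, with $m=5$ and the random tangent vectors being arbitrary test choices:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 5
x = rng.standard_normal(m)

# Candidate gradient for k = 1: x, reshaped as a 1 x m matrix so that
# the Frobenius pairing g(G, v) = tr(G^T v) is defined.
G = x.reshape(1, m)

for _ in range(10):
    v = rng.standard_normal((1, m))   # tangent vector at A (k = 1 case)
    df_v = float(v @ x)               # df(v) = v^1_j x^j
    pairing = np.trace(G.T @ v)       # g(G, v) = tr(G^T v)
    assert np.isclose(df_v, pairing)
```

The same pairing fails to produce a single matrix for $k>1$ precisely because $df(v)$ is then vector-valued rather than scalar-valued, which is the obstruction the question is pointing at.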


One possible solution is to consider the $k$th component of $f$, denoted $f_k=\mathbf{e}_k^T\mathbf{A}\mathbf{x}$.

It follows that the gradient of this component is the matrix $$ \frac{\partial f_k}{\partial \mathbf{A}} = \mathbf{e}_k \mathbf{x}^T. $$ Indeed, $f_k$ is only sensitive to the elements in the $k$th row of $\mathbf{A}$.
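The formula $\partial f_k/\partial\mathbf{A}=\mathbf{e}_k\mathbf{x}^T$ can be checked component by component with finite differences. A sketch (shapes $k=3$, $m=4$ are arbitrary test choices):

```python
import numpy as np

rng = np.random.default_rng(2)
k, m = 3, 4
x = rng.standard_normal(m)
A = rng.standard_normal((k, m))

eps = 1e-6
for comp in range(k):
    e = np.zeros(k)
    e[comp] = 1.0
    # Analytic gradient of f_comp = e^T A x: the matrix e x^T,
    # which vanishes outside row `comp`.
    grad = np.outer(e, x)
    # Finite-difference gradient of the scalar map A -> e^T A x.
    fd = np.zeros((k, m))
    for i in range(k):
        for j in range(m):
            E = np.zeros((k, m))
            E[i, j] = eps
            fd[i, j] = (e @ ((A + E) @ x) - e @ ((A - E) @ x)) / (2 * eps)
    assert np.allclose(fd, grad, atol=1e-6)
```

Stacking these $k$ component gradients recovers the full derivative tensor $\delta^\lambda_\mu x^\nu$ from the question, one matrix slice per component.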