Differential of a matrix function: $f:\mathbb{R}^{n \times m} \rightarrow \mathbb{R}^{m \times 1}$ $f(A) = A^T\cdot \vec{v}$


I wish to calculate the differential of the function $f(A) = A^T\cdot \vec{v}$, where $A\in \mathbb{R}^{n \times m}$, with respect to $A$.

Since this is a linear function, if we think of $D\in \mathbb{R}^{n\times m}$ as a direction, we should get $f(A) + \nabla_A f \cdot D = f(A+D)$ exactly, since there are no non-linear terms. If we regard $f(A+D), f(A)\in\mathbb{R}^{m\times 1}$ and $D\in \mathbb{R}^{(n\times m) \times 1}$, we learn that $\nabla_A f \in \mathbb{R}^{m\times (n\times m)}$, so that $\nabla_A f\cdot D\in \mathbb{R}^{m\times 1}$.

My question, though, is how we should define the multiplication $\mathbb{R}^{m \times (n \times m)} \cdot \mathbb{R}^{(n\times m)\times 1}$. For each of the $m$ output components we need a kind of matrix multiplication that combines two $n\times m$ arrays into a scalar, i.e. $(n\times m) \oplus (n\times m) \in \mathbb{R}$. But what should this operation be, and how does it represent the idea of the differential?

Here is a related question, asked 5 years ago, which was not answered: Differentiating matrix functions $f : \mathbb R^{n\times m} \to \mathbb R^{p\times q}$


There are 2 best solutions below

On BEST ANSWER

Start with the transpose of your function and, with the help of the Kronecker product, vectorize it to obtain a linear equation whose gradient is trivial to calculate. Here $a={\rm vec}(A)$, and ${\rm vec}(f^T)=f$ because $f^T$ is a single row.
$$\eqalign{ {\rm vec}(f^T) &= {\rm vec}(v^TA) \\ &= \left(I_m\otimes v^T\right){\rm vec}(A) \\ f &= \left(I_m\otimes v^T\right)a \\ df &= \left(I_m\otimes v^T\right)da \\ \frac{\partial f}{\partial a} &= \left(I_m\otimes v^T\right) \;=\; G \quad&({\rm the\,gradient\,matrix}) \\ }$$
The index mapping between the components of $a$ and $A$ is tedious but straightforward
$$\eqalign{ A &\in {\mathbb R}^{n\times m} \implies a \in {\mathbb R}^{mn\times 1} \\ A_{ij} &= a_k \\ k &= i+(j-1)\,n \\ i &= 1+(k-1)\,{\rm mod}\,n \\ j &= 1+(k-1)\,{\rm div}\,n \\ }$$
and can be used to calculate the components of the 3rd-order gradient tensor
$$ \Gamma_{pij} = \frac{\partial f_p}{\partial A_{ij}} = \frac{\partial f_p}{\partial a_k} = G_{pk} $$
The derivative formula $\big($in the direction of $D\,\big)$ that you are seeking is
$$\eqalign{ df &= f(A+D)-f(A) \\ &= \Gamma:D &({\rm in\,product\,form}) \\ df_{p} &= \Gamma_{pij}\,D_{ij} &({\rm in\,component\,form}) \\ }$$
where the colon denotes the double-dot product and the repeated indices $i,j$ are summed over. The usual assumption $\|D\|_F^2\ll 1$ is not actually needed here: $f$ is linear, so the formula is exact for any $D$.
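
As a numerical sanity check of the vectorized form and the double-dot product, here is a minimal NumPy sketch. The sizes `n`, `m`, the random test data, and all variable names are assumptions made for illustration; the only conventions taken from the answer are the column-stacking ${\rm vec}$ and the index map $k = i+(j-1)\,n$ (written 0-based in the code).

```python
import numpy as np

n, m = 3, 2                       # A is n x m, v has length n (arbitrary test sizes)
rng = np.random.default_rng(0)
A = rng.standard_normal((n, m))
D = rng.standard_normal((n, m))   # direction; need not be small because f is linear
v = rng.standard_normal(n)

def f(A):
    """f(A) = A^T v, a vector of length m."""
    return A.T @ v

# Gradient matrix G = I_m (Kronecker product) v^T, shape (m, m*n)
G = np.kron(np.eye(m), v.reshape(1, n))

# vec(A) stacks the columns of A, i.e. column-major ("F") order
a = A.flatten(order="F")
vec_form_ok = np.allclose(G @ a, f(A))

# Gamma[p, i, j] = G[p, k] with 0-based k = i + j*n
Gamma = G.reshape(m, m, n).transpose(0, 2, 1)   # axes (p, j, i) -> (p, i, j)

# Double-dot product: df_p = sum over i, j of Gamma[p, i, j] * D[i, j]
df = np.einsum("pij,ij->p", Gamma, D)
double_dot_ok = np.allclose(df, f(A + D) - f(A)) and np.allclose(df, D.T @ v)

print(vec_form_ok, double_dot_ok)
```

In this sketch the double-dot product is just an `einsum` that contracts the last two indices of `Gamma` against both indices of `D`, which is one way to read the "multiplication" asked about in the question.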

On

I'll use the terminology that I applied to your old question here.

Let $f_i$ denote the function whose output is the $i$th entry of $f(A)$. Let $e_i$ denote the $i$th standard basis vector. We have $$ f_i(A) = e_i^TA^Tv = \operatorname{tr}([ve_i^T]^TA) \implies\\ df_i(A)(H) = \operatorname{tr}([ve_i^T]^TH), \quad \frac{\partial f_i}{\partial A} = v e_i^T. $$ So, the directional derivative of $f_i$ along $D$ will be given by $$ \operatorname{tr}([v e_i^T]^TD) = v^TD e_i = e_i^TD^Tv = \sum_{k=1}^n v_k d_{ki}. $$
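
The component-wise formula above can be checked numerically. The following is a small NumPy sketch under assumed test sizes and random data; the helper names `grad_f_i` and `directional_derivative_f_i` are hypothetical and exist only for this illustration.

```python
import numpy as np

n, m = 3, 2                       # arbitrary test sizes
rng = np.random.default_rng(1)
v = rng.standard_normal(n)
D = rng.standard_normal((n, m))

def grad_f_i(i):
    """Gradient of f_i(A) = e_i^T A^T v with respect to A: the n x m matrix v e_i^T."""
    e_i = np.zeros(m)
    e_i[i] = 1.0
    return np.outer(v, e_i)

def directional_derivative_f_i(i, D):
    """tr([v e_i^T]^T D), the derivative of f_i along D."""
    return np.trace(grad_f_i(i).T @ D)

# Each component should equal sum_k v_k d_{ki}, i.e. the i-th entry of D^T v
components = np.array([directional_derivative_f_i(i, D) for i in range(m)])
print(np.allclose(components, D.T @ v))
```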

In other words, we should find in the end that $$ \nabla_A f \cdot D = \pmatrix{\sum_{k=1}^n v_k d_{k1}\\ \vdots \\ \sum_{k=1}^n v_k d_{km}} = D^T v. $$ That said, this derivation doesn't directly tell us how to multiply arrays of these shapes in general.


Extending the above logic, if the "frontal slice" $\left[\left[\frac{\partial f}{\partial A}\right]_{jki}\right]_{j,k=1}^{n,m}$ is the denominator-form derivative of $f_i:\Bbb R^{n \times m} \to \Bbb R$, then the "directional derivative" along $D$ is given by $$ \left[\frac{\partial f}{\partial A} \cdot D\right]_{i} = \sum_{j=1}^n \sum_{k=1}^m \left[\frac{\partial f}{\partial A}\right]_{jki} d_{jk}, \quad i = 1,\dots,m. $$
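
A minimal NumPy sketch of this contraction, assuming the index order $(j,k,i)$ used above and the slices $v e_i^T$ found earlier for $f(A) = A^Tv$. The sizes, random data, and variable names are illustrative assumptions.

```python
import numpy as np

n, m = 3, 2                       # arbitrary test sizes
rng = np.random.default_rng(2)
A = rng.standard_normal((n, m))
D = rng.standard_normal((n, m))
v = rng.standard_normal(n)

# T[j, k, i] = d f_i / d A_{jk} = v_j * delta_{ki};
# the frontal slice T[:, :, i] is the n x m matrix v e_i^T
T = np.einsum("j,ki->jki", v, np.eye(m))

# Contract the first two indices of T against D: sum over j, k of T[j, k, i] * D[j, k]
dfD = np.einsum("jki,jk->i", T, D)

print(np.allclose(dfD, D.T @ v))
print(np.allclose(dfD, (A + D).T @ v - A.T @ v))
```

For a general $f:\Bbb R^{n\times m}\to\Bbb R^{p}$ the same `einsum` pattern would apply with `T` of shape `(n, m, p)`; only the construction of `T` is specific to this example.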