I wish to calculate the differential of the function $f(A) = A^T \vec{v}$, where $A\in \mathbb{R}^{n \times m}$ and $\vec v\in\mathbb{R}^{n}$, with respect to $A$.
Since this is a linear function, if we think of $D\in \mathbb{R}^{n\times m}$ as a direction, we expect $f(A) + \nabla_A f \cdot D = f(A+D)$, because there are no nonlinear terms. If we regard $f(A+D), f(A)\in\mathbb{R}^{m\times 1}$ and flatten $D$ into $D\in \mathbb{R}^{(n\times m) \times 1}$, we learn that $\nabla_A f \in \mathbb{R}^{m\times (n\times m)}$, so that $\nabla_A f\cdot D\in \mathbb{R}^{m\times 1}$.
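The dimension count above can be checked numerically. The sketch below is only an illustration, assuming NumPy and column-major flattening of $A$ and $D$; the helper `jacobian_fd` and the sizes $n=3$, $m=2$ are hypothetical choices, not part of the post. It builds the $m\times(nm)$ Jacobian of $f(A)=A^T\vec v$ by forward differences and confirms that multiplying it by the flattened $D$ reproduces $f(A+D)$.

```python
import numpy as np

def f(A, v):
    """f(A) = A^T v: maps an n-by-m matrix to an m-vector."""
    return A.T @ v

def jacobian_fd(A, v, h=1e-4):
    """Forward-difference Jacobian of f with respect to the flattened A.

    A is flattened column by column (column-major order), so the result
    has shape (m, n*m): one row per output f_p, one column per entry A_ij.
    """
    n, m = A.shape
    a = A.flatten(order="F")
    f0 = f(A, v)
    J = np.zeros((m, n * m))
    for k in range(n * m):
        a_step = a.copy()
        a_step[k] += h                          # perturb one entry of A
        A_step = a_step.reshape((n, m), order="F")
        J[:, k] = (f(A_step, v) - f0) / h
    return J

rng = np.random.default_rng(0)
n, m = 3, 2                                     # arbitrary example sizes
A = rng.standard_normal((n, m))
v = rng.standard_normal(n)
D = rng.standard_normal((n, m))                 # the direction

J = jacobian_fd(A, v)
print(J.shape)                                  # (2, 6), i.e. m x (n*m)

# f is linear, so f(A) + J vec(D) matches f(A + D) up to rounding.
print(np.allclose(f(A, v) + J @ D.flatten(order="F"), f(A + D, v)))  # True
```

For a nonlinear $f$ the same check would only hold approximately, with the error shrinking as $D$ gets smaller.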
My question, though, is how we should define the multiplication $\mathbb{R}^{m \times (n \times m)} \cdot \mathbb{R}^{(n\times m)\times 1}$. We need some kind of matrix multiplication that pairs two $n\times m$ arrays into a scalar, i.e. $(n\times m) \oplus (n\times m) \in \mathbb{R}$. What should this operation be, and how does it represent the idea of a differential?
Here is a related question, asked 5 years ago, which was not answered: Differentiating matrix functions $f : \mathbb R^{n\times m} \to \mathbb R^{p\times q}$
Start with the transpose of your function, $f^T = v^TA$, and use the Kronecker product to vectorize it. Via the identity ${\rm vec}(BXC) = \left(C^T\otimes B\right){\rm vec}(X)$ with $B=v^T$, $X=A$, $C=I_m$, this gives a linear equation whose gradient is trivial to calculate:
$$\eqalign{
{\rm vec}(f^T) &= {\rm vec}(v^TA) \\
&= \left(I_m\otimes v^T\right){\rm vec}(A) \\
f &= \left(I_m\otimes v^T\right)a \\
df &= \left(I_m\otimes v^T\right)da \\
\frac{\partial f}{\partial a} &= \left(I_m\otimes v^T\right) \;=\; G \qquad({\rm the\,gradient\,matrix}) \\
}$$
where $a={\rm vec}(A)$ stacks the columns of $A$, and ${\rm vec}(f^T)=f$ because $f$ is a column vector. The index mapping between the components of $a$ and $A$ is tedious but straightforward:
$$\eqalign{
A &\in {\mathbb R}^{n\times m} \implies a \in {\mathbb R}^{mn\times 1} \\
A_{ij} &= a_k \\
k &= i+(j-1)\,n \\
i &= 1+(k-1)\,{\rm mod}\,n \\
j &= 1+(k-1)\,{\rm div}\,n \\
}$$
This mapping can be used to calculate the components of the 3rd-order gradient tensor
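$$\Gamma_{pij} = \frac{\partial f_p}{\partial A_{ij}} = \frac{\partial f_p}{\partial a_k} = G_{pk}$$
The derivative formula $\big($in the direction of $D\,\big)$ that you are seeking is
$$\eqalign{
df &= f(A+D)-f(A) \\
&= \Gamma:D \qquad({\rm in\,product\,form}) \\
df_{p} &= \Gamma_{pij}\,D_{ij} \qquad({\rm in\,component\,form}) \\
}$$
where the colon denotes the double-dot product and the repeated indices $i,j$ are summed over. For each output component $p$, this is the Frobenius inner product of the $n\times m$ slice $\Gamma_{p\cdot\cdot}$ with $D$, which is exactly the matrix-to-scalar pairing you were asking about. In general the formula requires $\|D\|_F^2\ll 1$; since this $f$ is linear, it holds exactly for any $D$. Explicitly, $\Gamma_{pij} = v_i\,\delta_{jp}$, so $df = D^Tv$.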
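As a sanity check of the formulas above, here is a small NumPy sketch; the sizes $n=3$, $m=2$ and the random test data are arbitrary assumptions. It builds $G=I_m\otimes v^T$ with `np.kron`, uses column-major ordering for ${\rm vec}(A)$, verifies the 1-based index map, fills $\Gamma$ through $k=i+(j-1)\,n$ (shifted to 0-based indices), and evaluates the double-dot product $\Gamma:D$ with `np.einsum`.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2                      # arbitrary example sizes
A = rng.standard_normal((n, m))
v = rng.standard_normal(n)
D = rng.standard_normal((n, m))  # direction of differentiation

def f(A, v):
    """f(A) = A^T v, an m-vector."""
    return A.T @ v

# Gradient matrix G = kron(I_m, v^T), shape (m, n*m).
G = np.kron(np.eye(m), v.reshape(1, n))

# a = vec(A): stack the columns of A (column-major / Fortran order).
a = A.flatten(order="F")
assert np.allclose(f(A, v), G @ a)   # f = kron(I_m, v^T) a

# The 1-based index map k <-> (i, j) round-trips for every k.
for k in range(1, n * m + 1):
    i = 1 + (k - 1) % n
    j = 1 + (k - 1) // n
    assert k == i + (j - 1) * n

# Third-order tensor Gamma[p, i, j] = G[p, k]; in 0-based indices k = i + j*n.
Gamma = np.empty((m, n, m))
for p in range(m):
    for i in range(n):
        for j in range(m):
            Gamma[p, i, j] = G[p, i + j * n]

# Double-dot product Gamma : D, i.e. df_p = sum_ij Gamma[p, i, j] * D[i, j].
df = np.einsum("pij,ij->p", Gamma, D)

print(np.allclose(df, f(A + D, v) - f(A, v)))  # True: exact since f is linear
print(np.allclose(df, D.T @ v))                # True: df = D^T v
```

Column-major (`order="F"`) flattening is what makes $k=i+(j-1)\,n$ line up with the array indices; with NumPy's default row-major flattening, which vectorizes $A^T$ instead, the Kronecker factors would swap to $v^T\otimes I_m$.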