Let $\pmb{A}$ be a real matrix and $\pmb{x}$ a vector such that $\pmb{A}^T\pmb{x}$ exists. Then how do I calculate the following result? $$\frac{\partial}{\partial \pmb{A} } \pmb{A}^T\pmb{x} = ?$$ Any help would be appreciated as I haven't been able to find any information on this.
Derivative of $\pmb{A}^T\pmb{x}$ with respect to $\pmb{A}$
244 views. Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail). There are 4 best solutions below.
Let $\mathbf{f}(A)=A^T\mathbf{x}$, with $A$ an $n\times n$ matrix and $\mathbf{x} \in \mathbb{R}^n$, be a vector-valued function; then it has $n$ components. Each component has the formula:
$$f_i(A)=\langle \mathbf{a}_i,\mathbf{x}\rangle$$
where $\mathbf{a}_i$ is the $i$th column vector of $A$. The derivative of $f_i(A)$ with respect to $A$ is the matrix given by:
$$\frac{d}{dA}f_i(A)=\left(\frac{\partial f_i}{\partial a_{kl}}\right)_{k,l \in \{1,\dots,n\}}$$
Therefore, you will be calculating $n$ matrices, one for each component $f_i$. Since $f_i$ depends only on the $i$th column of $A$, only that column of $\frac{d}{dA}f_i(A)$ contains non-zero entries.
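Writing out the entries makes this concrete. Since $f_i(A)=\sum_{p=1}^{n} a_{pi}x_p$, differentiating term by term gives (with $\delta_{li}$ the Kronecker delta and $\mathbf{e}_i$ the $i$th standard basis vector):
$$\frac{\partial f_i}{\partial a_{kl}} = \frac{\partial}{\partial a_{kl}} \sum_{p=1}^{n} a_{pi}x_p = x_k\,\delta_{li}, \qquad\text{so}\qquad \frac{d}{dA}f_i(A) = \mathbf{x}\,\mathbf{e}_i^T.$$
For example, with $n=3$ and $i=2$:
$$\frac{d}{dA}f_2(A)=\begin{pmatrix} 0 & x_1 & 0\\ 0 & x_2 & 0\\ 0 & x_3 & 0\end{pmatrix}.$$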
Define the function $F(A,x)=A^{T}x$. This is a function of $A\in M^{n\times n}$ and $x \in \mathbb{R}^{n}$. Specifically, $$ F : M^{n\times n} \times \mathbb{R}^{n} \rightarrow \mathbb{R}^{n}. $$
To take the derivative of $F$ with respect to $A$, you let $A$ vary by adding a (small) $h \in M^{n\times n}$ and find a linear function $L : M^{n\times n}\rightarrow \mathbb{R}^{n}$ such that $$ F(A+h,x)-F(A,x) = Lh+o(h), $$ where $L$ may depend on $A$ and $x$, and $o(h)$ denotes a remainder with $o(h)/|h|\rightarrow 0$ as $|h|\rightarrow 0$. In this case, $$ F(A+h,x)-F(A,x)=(A+h)^{T}x-A^{T}x=h^{T} x+0, $$ so the remainder is exactly zero.
The linear function $L$ is $Lh = h^{T}x$. This may look odd because we are used to writing a linear function as a matrix. You can do that by choosing a basis of $M^{n\times n}$, or you can recognize that the following defines a linear function of $h$ for fixed $A$, $x$: $$ L(A,x)h = h^{T} x. $$
So the linearization of $F$, which is the derivative, is the linear function $$ L(A,x) : M^{n\times n}\rightarrow \mathbb{R}^{n} $$ defined by $L(A,x)h = h^{T}x$. The derivative is independent of $A$, but does depend on $x$.
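A quick numerical sanity check of this argument, written as a sketch in plain Python (matrices as lists of rows; the helper names are hypothetical, not from the answer). It checks that $F(A+h,x)-F(A,x)$ equals $h^Tx$ up to rounding at two different base points $A$, which illustrates that the derivative does not depend on $A$.

```python
import random

# Hypothetical helpers (not part of the answer): matrices are lists of rows.
def transpose(M):
    return [list(row) for row in zip(*M)]

def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def mat_add(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def F(A, x):
    """F(A, x) = A^T x."""
    return matvec(transpose(A), x)

def L(h, x):
    """The derivative of F with respect to A, applied to a direction h: h^T x."""
    return matvec(transpose(h), x)

def random_matrix(rng, n):
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]

rng = random.Random(0)
n = 4
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
h = random_matrix(rng, n)

# Check F(A + h, x) - F(A, x) == L h at two different base points A.
residuals = []
for _ in range(2):
    A = random_matrix(rng, n)
    diff = [p - q for p, q in zip(F(mat_add(A, h), x), F(A, x))]
    residuals.append(max(abs(d - l) for d, l in zip(diff, L(h, x))))

# Only floating-point rounding remains: the o(h) term is identically zero.
print(residuals)
```

Note that $h$ does not even need to be small here, because $F$ is linear in $A$.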
Is a matrix form acceptable as an answer?
As I said in my comment, I know nothing about tensor notation. What I usually do to avoid it is to vectorize something. To stay in the linear algebra world, you can take the derivative with respect to a vectorized version of your matrix:
$$\frac{\partial}{\partial \operatorname{vec} (A) } A^T x$$
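Often when you do this kind of thing, the Kronecker product will be your friend. For an $n \times m$ matrix $A$ (so $x \in \mathbb{R}^n$), it happens that $$A^Tx = (I_m \otimes x)^T \operatorname{vec} (A),$$ where $\operatorname{vec}(A)$ stacks the columns of $A$ into a single vector of length $nm$.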
Sorry, no proof; check it with Matlab or Octave using the kron function. Now that you have this, the derivative reduces to $$\frac{\partial}{\partial a } X a$$ where $a$ is $\operatorname{vec} (A)$ and $X=(I_m \otimes x)^T$. This is equal to $X$ in numerator layout (the Jacobian convention), or to its transpose $X^T$ in denominator layout.
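In the spirit of checking the identity numerically (the answer suggests Matlab or Octave; this sketch uses plain Python instead, and the helpers `kron`, `vec` and `unvec` are hypothetical, written only for illustration), the following verifies $A^Tx = (I_m\otimes x)^T\operatorname{vec}(A)$ for a random $3\times 2$ matrix. It also builds the Jacobian of $\operatorname{vec}(A)\mapsto A^Tx$ column by column, with rows indexed by outputs (numerator layout), and compares it with $X$.

```python
import random

def transpose(M):
    return [list(row) for row in zip(*M)]

def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def kron(P, Q):
    """Kronecker product of two matrices given as lists of rows (hypothetical helper)."""
    return [[p * q for p in prow for q in qrow] for prow in P for qrow in Q]

def identity(k):
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]

def vec(A):
    """Stack the columns of an n x m matrix into one list of length n*m."""
    n, m = len(A), len(A[0])
    return [A[i][j] for j in range(m) for i in range(n)]

def unvec(a, n, m):
    """Inverse of vec: rebuild the n x m matrix from its stacked columns."""
    return [[a[j * n + i] for j in range(m)] for i in range(n)]

rng = random.Random(1)
n, m = 3, 2
A = [[rng.uniform(-1, 1) for _ in range(m)] for _ in range(n)]
x = [rng.uniform(-1, 1) for _ in range(n)]

# X = (I_m kron x)^T, with x treated as an n x 1 column; X is m x (n*m).
X = transpose(kron(identity(m), [[xi] for xi in x]))

identity_error = max(
    abs(p - q) for p, q in zip(matvec(transpose(A), x), matvec(X, vec(A)))
)

def f(a):
    """The map vec(A) -> A^T x."""
    return matvec(transpose(unvec(a, n, m)), x)

# The map is linear, so a unit step in entry c gives column c of the Jacobian
# exactly (up to rounding).
a = vec(A)
base = f(a)
J = [[0.0] * (n * m) for _ in range(m)]
for c in range(n * m):
    stepped = a[:]
    stepped[c] += 1.0
    for r, (p, q) in enumerate(zip(f(stepped), base)):
        J[r][c] = p - q

jacobian_error = max(abs(J[r][c] - X[r][c]) for r in range(m) for c in range(n * m))
print(identity_error, jacobian_error)
```

Both errors come out at the level of floating-point rounding.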
Long story short:
$$\frac{\partial}{\partial \operatorname{vec} (A) } A^T x = (I_m \otimes x)^T$$
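For completeness, here is a short derivation of the Kronecker identity. This is a sketch rather than the answerer's own argument, and it relies on the standard vectorization rule $\operatorname{vec}(PQR) = (R^T\otimes P)\operatorname{vec}(Q)$. Take $P = x^T$ ($1\times n$), $Q = A$ ($n\times m$) and $R = I_m$. Since $x^TA$ is a row vector, its $\operatorname{vec}$ is its transpose $A^Tx$, so
$$A^Tx = \operatorname{vec}(x^T A\, I_m) = (I_m \otimes x^T)\operatorname{vec}(A) = (I_m \otimes x)^T\operatorname{vec}(A),$$
where the last step uses $(P\otimes Q)^T = P^T\otimes Q^T$ and $I_m^T = I_m$.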