How to take the derivative of $ y = xA^T + b$ w.r.t. $A$ and $b$, where $x,b$ are vectors and $A$ is a matrix

How exactly could I take the derivative of the following expression?

$$ y = xA^T + b$$

Let's say that I have $x \in \mathbb{R}^{n}$, $A \in \mathbb{R}^{m \times n}$, $y \in \mathbb{R}^{m}$, and $b \in \mathbb{R}^{m}$, and I wish to take the derivative of $y$ with respect to $A$ and $b$, i.e. $\frac{\partial y}{\partial A}$ and $\frac{\partial y}{\partial b}$. I understand that $\frac{\partial y}{\partial A}$ would be a rank-3 tensor containing $\frac{\partial y_i}{\partial A_{jk}}$, although I'm not entirely sure how to get to the solution. I've tried looking through the Matrix Cookbook, but the only related result that I can find is for $\frac{\partial x^Ta}{\partial x}$, where $x$ and $a$ are both vectors. So, I'm a little confused!

With regard to the second term, $\frac{\partial y}{\partial b}$, I would assume that this is just the identity matrix ($\mathbb{I} \in \mathbb{R}^{m \times m}$), since $b$ enters only through element-wise addition?

Thank you in advance!

Best answer

$\def\E{{\cal E}}\def\p#1#2{\frac{\partial #1}{\partial #2}}\def\v#1{\operatorname{vec}(#1)}$Doing the calculation using index notation (i.e. element-wise, with summation implied over the repeated index $j$) is your best option $$\eqalign{ y_i &= A_{ij}x_j + b_i \\ \p{y_i}{b_k} &= \p{b_i}{b_k} = \delta_{ik} \quad\implies \p{y}{b} = I \\ \p{y_i}{A_{k\ell}} &= \p{A_{ij}}{A_{k\ell}}x_j = \delta_{ik}\delta_{j\ell}\;x_j = \delta_{ik}\,x_\ell \\ }$$

You could also vectorize the equation using Kronecker products, treating $x$ as a column vector and letting $\v{A}$ stack the columns of $A$ $$\eqalign{ a &\doteq \v{A} \\ y &= (x^T\otimes I) a + b \\ \p{y}{a} &= (x^T\otimes I) \p{a}{a} = (x^T\otimes I) I = (x^T\otimes I) \\ }$$

Or you could use indexed matrices (sort of a half-index notation) $$\eqalign{ y &= Ax + b \\ \p{y}{A_{jk}} &= \left(\p{A}{A_{jk}}\right)x = E_{jk}\,x \\ }$$ where $E_{jk}$ is a matrix containing all zeros except for a single ${\tt1}$ at the $(j,k)$ element.

Or you can go into full-tensor mode by giving a name $(\E)$ to the fourth-order tensor that we encountered using index notation, i.e. $$\eqalign{ \E_{ijk\ell} &\doteq \delta_{ik}\delta_{j\ell} \\ \p{y}{A} &= \E x \\ }$$ where the product contracts $x$ with the second index of $\E$, i.e. $(\E x)_{ik\ell} = \E_{ijk\ell}\,x_j = \delta_{ik}\,x_\ell$.
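If you want to convince yourself numerically, here is a minimal sketch in plain Python (standard library only) that compares a finite-difference estimate of $\frac{\partial y_i}{\partial A_{k\ell}}$ against the closed form $\delta_{ik}\,x_\ell$, and also checks the vectorized form with a column-stacking $\operatorname{vec}$. The function names (`affine`, `numeric_dy_dA`, `analytic_dy_dA`, `kron_form`), the sizes $m=3$, $n=4$, and the step size `eps` are all assumptions made for illustration; none of them come from the question or the answer.

```python
import random

def affine(A, x, b):
    """Return y with y_i = sum_j A[i][j] * x[j] + b[i], i.e. y = A x + b."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]

def numeric_dy_dA(A, x, b, eps=1e-6):
    """Forward-difference estimate J[i][k][l] of d y_i / d A_kl."""
    m, n = len(A), len(x)
    y0 = affine(A, x, b)
    J = [[[0.0] * n for _ in range(m)] for _ in range(m)]
    for k in range(m):
        for l in range(n):
            A_step = [row[:] for row in A]   # copy A, then nudge one entry
            A_step[k][l] += eps
            y1 = affine(A_step, x, b)
            for i in range(m):
                J[i][k][l] = (y1[i] - y0[i]) / eps
    return J

def analytic_dy_dA(x, m):
    """Closed form from the index calculation: delta_ik * x_l."""
    n = len(x)
    return [[[x[l] if i == k else 0.0 for l in range(n)]
             for k in range(m)] for i in range(m)]

def kron_form(A, x, b):
    """Evaluate y = (x^T kron I) vec(A) + b, with vec stacking columns."""
    m, n = len(A), len(x)
    # vec(A): entry l*m + k holds A[k][l]
    a = [A[k][l] for l in range(n) for k in range(m)]
    # (x^T kron I_m) is m x (m*n): entry [i][l*m + k] = x[l] * delta_ik
    K = [[x[l] if i == k else 0.0 for l in range(n) for k in range(m)]
         for i in range(m)]
    return [sum(K_ij * a_j for K_ij, a_j in zip(K[i], a)) + b[i]
            for i in range(m)]

# Arbitrary test data (sizes and seed are illustrative assumptions).
random.seed(0)
m, n = 3, 4
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
x = [random.uniform(-1, 1) for _ in range(n)]
b = [random.uniform(-1, 1) for _ in range(m)]

J_num = numeric_dy_dA(A, x, b)
J_ana = analytic_dy_dA(x, m)
err_A = max(abs(J_num[i][k][l] - J_ana[i][k][l])
            for i in range(m) for k in range(m) for l in range(n))
err_kron = max(abs(u - v)
               for u, v in zip(kron_form(A, x, b), affine(A, x, b)))

print(f"max |numeric - analytic| for dy/dA: {err_A:.2e}")
print(f"max |kron form - direct form| for y: {err_kron:.2e}")
```

Because $y$ is linear in $A$, the forward difference carries no truncation error, so any discrepancy should come from floating-point rounding only. Both printed values should therefore be tiny.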