How can I calculate the partial derivative $\frac{\partial}{\partial \vec{x}} f\left(A\vec{x} + \vec{b}\right)$ using matrix calculus?


I want to calculate $\frac{\partial}{\partial \vec{x}} f\left(A\vec{x} + \vec{b}\right)$, where $\vec{x}, \vec{b} \in \mathbb{R}^n$ and $f: \mathbb{R}^n \rightarrow \mathbb{R}^n$ is applied element-wise. Is it correct that, since both $\vec{x}$ and $f\left(A\vec{x} + \vec{b}\right)$ are vectors, the partial derivative must be a Jacobian matrix? I tried applying the chain rule and various identities I found on Wikipedia, but I am very unsure about the result: $$\frac{\partial}{\partial \vec{x}} f\left(A\vec{x} + \vec{b}\right) = \frac{\partial}{\partial \vec{x}} \left(A\vec{x} + \vec{b}\right)\operatorname{diag}\left(f'\left(A\vec{x} + \vec{b}\right)\right) = A\operatorname{diag}\left(f'\left(A\vec{x} + \vec{b}\right)\right)$$


There are 4 answers below.

Best answer:

It will be in Jacobian form:

$\left(\frac{\partial f_i}{\partial x_j}\right)(Ax+b) * A = J_f(Ax+b)*A$

where $f(x_1,\dots,x_n) = \big(f_1(x_1,\dots,x_n),\dots,f_n(x_1,\dots,x_n)\big)$.
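A quick numerical sanity check of $J_f(Ax+b)*A$, offered as a sketch: it assumes a hypothetical non-element-wise $f(u) = (u_1u_2,\ \sin u_1 + u_2^2)$ on $\mathbb R^2$ and arbitrary example values for $A$, $b$, $x$, and compares the chain-rule product with a central-difference approximation of the Jacobian of $x \mapsto f(Ax+b)$. Plain Python lists are used so the snippet has no dependencies.

```python
import math

def matvec(A, x):
    """Matrix-vector product A x, with A stored as a list of rows."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]

def matmul(P, Q):
    """Matrix product P Q for list-of-rows matrices."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]

def f(u):
    """Hypothetical f: R^2 -> R^2 that is NOT element-wise."""
    return [u[0] * u[1], math.sin(u[0]) + u[1] ** 2]

def jac_f(u):
    """Analytic Jacobian of f; row i is the gradient of f_i."""
    return [[u[1], u[0]],
            [math.cos(u[0]), 2.0 * u[1]]]

def numeric_jacobian(F, x, h=1e-6):
    """Central-difference Jacobian: entry (i, j) approximates dF_i/dx_j."""
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        cols.append([(p - m) / (2 * h) for p, m in zip(F(xp), F(xm))])
    return [list(row) for row in zip(*cols)]  # transpose columns into rows

# hypothetical example data
A = [[1.0, 2.0], [-0.5, 3.0]]
b = [0.3, -1.0]
x = [0.7, 0.2]

def inner(x):
    return [ax + bk for ax, bk in zip(matvec(A, x), b)]  # Ax + b

def composite(x):
    return f(inner(x))  # x -> f(Ax + b)

chain_rule = matmul(jac_f(inner(x)), A)  # J_f(Ax + b) * A
finite_diff = numeric_jacobian(composite, x)
print(chain_rule)
print(finite_diff)
```

If `f` is replaced by an element-wise function, `jac_f` becomes diagonal and the same product reads $\operatorname{diag}(f'(Ax+b))*A$, with the diagonal factor on the left.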

Answer:

What is your definition of $\frac{\partial}{\partial \vec x}$?

If we use the definition $$ \frac{\partial g }{\partial \vec x}(\vec a) = \lim_{h \to 0} \frac{g(\vec a+h \vec x)-g(\vec a)}{h}, $$ i.e. the directional derivative of $g$ at $\vec a$ in the direction $\vec x$ (for a function $g:\mathbb R^n \to \mathbb R$), then the usual rules of limits should give the answer $A$.
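The limit in this definition can be probed numerically by shrinking $h$. The sketch below applies the difference quotient component by component to the affine inner map $\vec x \mapsto A\vec x + \vec b$, writing the direction as $\vec v$ (the answer calls it $\vec x$); the matrix, offset, base point and direction are hypothetical example values. For this affine map the quotient equals $A\vec v$ for every $h$ (up to rounding), which is where the $A$ comes from; the sketch covers only the inner map and does not include $f$.

```python
def matvec(A, x):
    """Matrix-vector product A x, with A stored as a list of rows."""
    return [sum(a_kl * x_l for a_kl, x_l in zip(row, x)) for row in A]

def affine(x, A, b):
    """Inner map x -> A x + b."""
    return [ax_k + b_k for ax_k, b_k in zip(matvec(A, x), b)]

def difference_quotient(g, a, v, h):
    """(g(a + h v) - g(a)) / h, the expression inside the limit, per component."""
    shifted = [a_k + h * v_k for a_k, v_k in zip(a, v)]
    return [(p - q) / h for p, q in zip(g(shifted), g(a))]

# hypothetical example data
A = [[1.0, 2.0], [-0.5, 3.0]]
b = [0.3, -1.0]
a = [0.7, 0.2]   # base point
v = [1.0, -2.0]  # direction (the answer's x)

def g(x):
    return affine(x, A, b)

quotients = {h: difference_quotient(g, a, v, h) for h in (1e-1, 1e-3, 1e-5)}
Av = matvec(A, v)
for h, q in quotients.items():
    print(f"h={h:g}: {q}")
print("A v =", Av)  # [-3.0, -6.5]
```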

Answer:

Yes, it is a Jacobian matrix. When I get confused with vectors/tensors ("was there a transpose?"), I always start by writing down a couple of components and then try to infer what the general expression is.

If you tried but still can't figure it out, read on. Let's compute the derivative of the $i$-th component of $f(Ax+b)$ with respect to the $j$-th component of $x$. To do this, remember that every component of $Ax+b$ depends on $x_j$, so the multivariable chain rule gives a sum of $n$ terms, one for each component of $Ax+b$.

$$
\begin{aligned}
\partial_{x_j}\big(f_i(Ax+b)\big) &= (\partial_{x_1}f_i)(Ax+b)\cdot \partial_{x_j}(Ax+b)_1+\cdots+(\partial_{x_n}f_i)(Ax+b)\cdot \partial_{x_j}(Ax+b)_n\\
&=(\partial_{x_1}f_i)(Ax+b)\cdot a_{1j}+\cdots+(\partial_{x_n}f_i)(Ax+b)\cdot a_{nj}\\
&=(\nabla f_i)(Ax+b)\cdot a_j,
\end{aligned}
$$

where $a_j$ is the $j$-th column of $A$ (since $(Ax+b)_k = \sum_l a_{kl}x_l + b_k$, we have $\partial_{x_j}(Ax+b)_k = a_{kj}$).

So how does this read when we put all the components together? The entry $J_{ij}$ of the Jacobian of $x \mapsto f(Ax+b)$ is exactly what we just computed: the product of the $i$-th row of $J_f(Ax+b)$ with the $j$-th column of $A$. Therefore, denoting products with $*$ so as not to confuse them with function evaluation, $$ J_{f(Ax+b)} = J_f(Ax+b)*A. $$
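As a small worked instance of the component formula, take $n=2$ and a hypothetical $f(u_1,u_2) = (u_1u_2,\ u_1+u_2)$, and write $u = Ax+b$. Then

$$
J_f(u) = \begin{pmatrix} u_2 & u_1 \\ 1 & 1 \end{pmatrix}, \qquad
J_f(u)*A = \begin{pmatrix} u_2a_{11}+u_1a_{21} & u_2a_{12}+u_1a_{22} \\ a_{11}+a_{21} & a_{12}+a_{22} \end{pmatrix}.
$$

Differentiating $f_1(Ax+b) = (a_{11}x_1+a_{12}x_2+b_1)(a_{21}x_1+a_{22}x_2+b_2)$ directly with respect to $x_1$ by the product rule gives $a_{11}u_2 + a_{21}u_1$, which matches the $(1,1)$ entry, i.e. row 1 of $J_f(u)$ times column 1 of $A$.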

Answer:

I'll assume that you have a specific function in mind, for which you know the derivative, $$\eqalign{ \frac{df(z)}{dz} &= g(z) \cr df &= g\,dz \cr }$$

When this function is applied element-wise to a vector argument, the Hadamard product ($\circ$) must be used in the differential: $$\eqalign{ df &= g\circ dz \cr }$$

For this problem, define $$\eqalign{ z &= Ax + b \cr dz &= A\,dx \cr }$$

Then the differential of your function is $$\eqalign{ df &= g\circ dz \cr &= G\,dz \cr &= GA\,dx \cr \frac{\partial f}{\partial x} &= GA \cr }$$ where the matrix $G = {\rm Diag}(g)$, with $g$ evaluated element-wise at $z = Ax+b$ (the second step uses $g\circ v = {\rm Diag}(g)\,v$ for any vector $v$).
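The result $\partial f/\partial x = GA$ can be checked numerically. This sketch assumes the element-wise function $f = \tanh$ (so $g = 1 - \tanh^2$) and hypothetical values for $A$, $b$, $x$; it compares both ${\rm Diag}(g)A$ and the order $A\,{\rm Diag}(g)$ proposed in the question against a central-difference Jacobian of $x \mapsto f(Ax+b)$.

```python
import math

def matvec(A, x):
    """Matrix-vector product A x, with A stored as a list of rows."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]

def matmul(P, Q):
    """Matrix product P Q for list-of-rows matrices."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]

def diag(v):
    """Diagonal matrix with v on the diagonal (Diag(g) in the answer)."""
    return [[v[i] if i == j else 0.0 for j in range(len(v))] for i in range(len(v))]

def f(z):
    return [math.tanh(zk) for zk in z]             # assumed element-wise f = tanh

def g(z):
    return [1.0 - math.tanh(zk) ** 2 for zk in z]  # its derivative, also element-wise

def max_abs_diff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

# hypothetical example data
A = [[1.0, 2.0, 0.0], [-0.5, 3.0, 1.0], [0.2, 0.0, -1.5]]
b = [0.3, -1.0, 0.5]
x = [0.7, 0.2, -0.4]

def composite(x):
    return f([ax + bk for ax, bk in zip(matvec(A, x), b)])  # x -> f(Ax + b)

z = [ax + bk for ax, bk in zip(matvec(A, x), b)]
G = diag(g(z))
GA = matmul(G, A)  # Diag(g) A, the order derived in this answer
AG = matmul(A, G)  # A Diag(g), the order proposed in the question

# central-difference Jacobian of composite, assembled column by column
h = 1e-6
cols = []
for j in range(len(x)):
    xp, xm = list(x), list(x)
    xp[j] += h
    xm[j] -= h
    cols.append([(p - m) / (2 * h) for p, m in zip(composite(xp), composite(xm))])
numeric = [list(row) for row in zip(*cols)]  # transpose columns into rows

print("max |GA - numeric| =", max_abs_diff(GA, numeric))
print("max |AG - numeric| =", max_abs_diff(AG, numeric))
```

The two products differ in general because $({\rm Diag}(g)A)_{ij} = g_i a_{ij}$ scales the rows of $A$, whereas $(A\,{\rm Diag}(g))_{ij} = a_{ij} g_j$ scales its columns.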