Partial derivative of a function with respect to a Matrix

I'm currently trying to solve the following problem, and I'm stuck:

Suppose we have the function $J = \left(E\left[\|Ax + b - f(x)\|_2^2\right]\right)^{\frac{1}{2}}$, where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $f: \mathbb{R}^n \to \mathbb{R}^m$, and $x \in \mathbb{R}^n$ is a random variable distributed according to a distribution $P$ which is not further specified. $E$ denotes the expectation operator.

I'm trying to calculate the derivative of $J$ with respect to $A$, that is $\frac{\partial J}{\partial A}$.

I know that the result has to be $\frac{\partial J}{\partial A} = 2E\left [(Ax + b - f(x))x^T \right ]$, but I don't know how to get there. What I tried was applying the chain rule:

$u = Ax + b - f(x) \Rightarrow \frac{\partial u}{\partial A} = x^T$

$v = \|u \|_2^2 \Rightarrow \frac{\partial v}{\partial u} = 2 u$

$w = E\left [v\right ] \Rightarrow \frac{\partial w}{\partial v} = E\left [\frac{\partial}{\partial v}v \right ] = E \left [I \right ] = I$ (identity)

$k = w^{\frac{1}{2}} \Rightarrow \frac{\partial k}{\partial w} = \frac{1}{2} w^{-\frac{1}{2}}$

Putting everything together: $\frac{\partial k}{\partial w}\frac{\partial w}{\partial v}\frac{\partial v}{\partial u}\frac{\partial u}{\partial A} = \frac{1}{2} \left(E\left[\|Ax + b - f(x)\|_2^2\right]\right)^{-\frac{1}{2}}\, 2(Ax + b - f(x))x^T$

which is obviously not the same as the desired result. If someone could point me to my mistake, I would greatly appreciate it.

Thanks!

On BEST ANSWER

Define a new vector variable $$\eqalign{ v &= Ax+b-f(x) \cr }$$ Then $$\eqalign{ J^2 &= E\,[v^Tv] \cr }$$ Taking differentials (only $A$ varies, so $dv = dA\,x$) $$\eqalign{ d(J^2) &= E\,[d(v^Tv)] \cr &= E\,[2v^Tdv] \cr\cr 2J\,dJ &= 2E\,[v^Tdv] \cr &= 2E\,[v^TdA\,x] \cr &= 2E\,[vx^T:dA] \cr &= 2E\,[vx^T]:dA \cr\cr \frac{\partial J}{\partial A} &= \frac{E\,[vx^T]}{J} = \frac{E\,[(Ax+b-f(x))x^T]}{J} \cr\cr }$$ where the colon denotes the Frobenius inner product, $M:N = \operatorname{tr}(M^TN)$, so that $v^T\,dA\,x = \operatorname{tr}(x\,v^T\,dA) = vx^T:dA$.
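
If you want to convince yourself numerically, here is a minimal sketch that compares the formula $E[vx^T]/J$ against central finite differences. Everything specific in it is my own assumption for illustration: the dimensions, the choice $f(x) = \tanh(x_{1:m})$, and the use of a finite set of Gaussian samples as a stand-in for the unspecified distribution $P$ (so $E$ becomes a sample mean).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, N = 3, 4, 500             # output dim, input dim, sample count (arbitrary choices)

X = rng.normal(size=(N, n))     # rows are samples of x; the empirical distribution stands in for P
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

def f(X):
    # Hypothetical nonlinear map R^n -> R^m, chosen only for illustration
    return np.tanh(X[:, :m])

def residuals(A):
    # Row k holds v_k = A x_k + b - f(x_k)
    return X @ A.T + b - f(X)

def J(A):
    # J = sqrt(E[ ||v||^2 ]) with E replaced by the sample mean
    V = residuals(A)
    return np.sqrt(np.mean(np.sum(V**2, axis=1)))

def grad_J(A):
    # Closed form from the derivation: E[v x^T] / J
    V = residuals(A)
    return (V.T @ X / N) / J(A)

def numerical_grad(func, A, eps=1e-6):
    # Central finite differences, one entry of A at a time
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            G[i, j] = (func(A + E) - func(A - E)) / (2 * eps)
    return G

max_err = np.max(np.abs(grad_J(A) - numerical_grad(J, A)))
print("max abs difference:", max_err)
```

The two gradients should agree up to finite-difference error. Note that the expectation sits inside the gradient, $E[vx^T]$, and the factor $1/J$ sits outside it; a version that drops the expectation around $vx^T$ would not pass this check.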

In order to get your "known" result, get rid of the square-root in the definition of $J$, in which case the derivation becomes $$\eqalign{ J &= E\,[v^Tv] \cr dJ &= 2E\,[vx^T]:dA \cr \frac{\partial J}{\partial A} &= 2E\,[vx^T] }$$
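
As a quick consistency check (writing $K = J^2 = E\,[v^Tv]$ for the version without the square root; the symbol $K$ is just my shorthand), the two results agree through the ordinary chain rule: $$\eqalign{ \frac{\partial K}{\partial A} &= 2J\,\frac{\partial J}{\partial A} \cr &= 2J\,\frac{E\,[vx^T]}{J} \cr &= 2E\,[vx^T] \cr }$$ So the "known" result is the gradient of $J^2$, and it differs from the gradient of $J$ exactly by the scalar factor $2J$.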