Matrix Chain Rule Derivative Question


Matrix Partial Derivative and Chain Rule

I don't know how the partial derivative is taken here, and was wondering if all of the steps can be spelled out. The image is linked.

Specific points of confusion are:

  1. Why is $V_j^T x$ taken inside $g'$? I know that we're dealing with the $j$th row of the $V$ matrix, but once we've identified the $j$th row, why is it necessary to transpose? I thought $V_j$ is a row and $x \in \mathbb{R}^n$.
  2. Why is $V_{ji}$ multiplied at the end of the expression, instead of $V_{ji}^T$? I thought the derivative of $Vx$ is $V^T$.

There is 1 best solution below


$\def\c#1{\color{red}{#1}}$Define the following vectors
$$\eqalign{ w &= Vx, \qquad h = g(w), \qquad h' = g'(w) \\ }$$
where $g$ is a scalar function, $g'$ is its derivative, and the vectors $(h,h')$ are the result of applying these scalar functions element-wise to the vector $w$. Because the functions act element-wise, the differential of $h$ requires a Hadamard (element-wise) product, i.e.
$$\eqalign{ dh &= h'\odot dw \\ &= {\rm Diag}(h')\, dw \\ &= H'\, dw \\ }$$
where $H' = {\rm Diag}(h')$. The last two lines use the well-known "trick" of replacing a Hadamard product by a diagonal matrix.
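If it helps to see the diagonal-matrix trick numerically, here is a minimal sketch using only the Python standard library. The choice $g = \tanh$, the helper names (`matvec`, `diag`), and the sample values of $w$ and $dw$ are all assumptions made for illustration; nothing in the question fixes them.

```python
import math

# Assumed activation for illustration: g = tanh, so g'(t) = 1 - tanh(t)^2.
def g(t):
    return math.tanh(t)

def g_prime(t):
    return 1.0 - math.tanh(t) ** 2

def matvec(A, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def diag(v):
    """Square matrix with v on the diagonal and zeros elsewhere."""
    n = len(v)
    return [[v[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

# Hypothetical sample point w and a small perturbation dw.
w = [0.3, -1.2, 0.7]
dw = [1e-6, -2e-6, 3e-6]

h_prime = [g_prime(t) for t in w]

# dh = h' (Hadamard) dw, computed element by element.
hadamard_form = [hp * d for hp, d in zip(h_prime, dw)]

# dh = Diag(h') dw, computed as an ordinary matrix-vector product.
diag_form = matvec(diag(h_prime), dw)

# The actual change in h = g(w) caused by the perturbation.
actual_change = [g(t + d) - g(t) for t, d in zip(w, dw)]

print(hadamard_form)
print(diag_form)
print(actual_change)
```

The first two printed vectors should coincide, and the third should agree with them up to terms of second order in $dw$.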

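With the above notation, calculating the gradient (Jacobian) of $h$ with respect to $x$ is almost trivial.
$$\eqalign{ dh &= H'\,\c{dw} \;=\; H'\,\c{V\,dx} \\ \frac{\partial h}{\partial x} &= H'\,V \;\;\doteq\;\; Dh(x) \\ &= \big(h'{\tt1}^T\big)\odot V &\quad\big({\rm Hadamard\,equivalent}\big) \\ }$$
Here ${\tt1}$ is the all-ones vector of the same length as $x$. Entry $(j,i)$ of $H'V$ is $g'(w_j)\,V_{ji}$, where $w_j$ is the dot product of the $j$th row of $V$ with $x$; since $V_{ji}$ is a scalar, there is nothing to transpose. The answer given in your book is the author's (poor) attempt to write this result using neither the Diag() function nor the Hadamard product.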
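As a sanity check on $\partial h/\partial x = H'V$, the sketch below builds the Jacobian entry by entry as $g'(w_j)\,V_{ji}$ and compares it against central finite differences. It assumes $g = \tanh$ and uses a made-up $2\times 3$ matrix `V` and point `x`; the function names are hypothetical helpers, not anything from the book.

```python
import math

# Assumed activation for illustration: g = tanh.
def g(t):
    return math.tanh(t)

def g_prime(t):
    return 1.0 - math.tanh(t) ** 2

def matvec(V, x):
    """w = V x for a list-of-rows matrix V."""
    return [sum(v * xi for v, xi in zip(row, x)) for row in V]

def h(V, x):
    """h = g(Vx), with g applied element-wise."""
    return [g(t) for t in matvec(V, x)]

def jacobian(V, x):
    """Analytic Jacobian Diag(h') V: entry (j, i) is g'(w_j) * V[j][i]."""
    w = matvec(V, x)
    return [[g_prime(w_j) * v_ji for v_ji in row] for w_j, row in zip(w, V)]

def numeric_jacobian(V, x, eps=1e-6):
    """Central finite-difference approximation of dh_j/dx_i."""
    m, n = len(V), len(x)
    J = [[0.0] * n for _ in range(m)]
    for i in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        h_plus, h_minus = h(V, x_plus), h(V, x_minus)
        for j in range(m):
            J[j][i] = (h_plus[j] - h_minus[j]) / (2 * eps)
    return J

# Hypothetical sample data: V is 2x3, x has 3 entries.
V = [[0.5, -1.0, 2.0],
     [1.5, 0.25, -0.75]]
x = [0.2, -0.4, 0.1]

J = jacobian(V, x)
J_num = numeric_jacobian(V, x)
max_err = max(abs(a - b) for ra, rb in zip(J, J_num) for a, b in zip(ra, rb))
print("max abs difference:", max_err)
```

Note that the Jacobian has the same shape as $V$ (one row per component of $h$, one column per component of $x$), which is why the factor on the right is $V$ rather than $V^T$ in this layout.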