Differentiation of Matrices


I'd appreciate any help with this.

Given $\mathbf{X}\in {{R}^{m\times n}}$ and $\mathbf{a,b}\in {{R}^{m\times 1}}$, consider $f(\mathbf{{{X}^{T}}a})$ to be an element-wise operation, i.e., \begin{equation} f( \mathbf{{{X}^{T}}a})=\left[ \begin{matrix} f( {{v}_{1}}) \\ \vdots \\ f\left( {{v}_{n}} \right) \\ \end{matrix} \right]=\mathbf{w} \end{equation}

where $\mathbf{{{X}^{T}}a=v}$ and $f$ is a sigmoid function defined as $f\left( x \right)=1/\left( 1+{{e}^{-x}} \right)$. What is $\partial \mathbf{w^T w} / \partial \mathbf{X}$?
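As a concrete starting point, here is a minimal pure-Python sketch of the forward computation $\mathbf{v}=\mathbf{X}^T\mathbf{a}$, $\mathbf{w}=f(\mathbf{v})$ and $\mathbf{w}^T\mathbf{w}$. The helper names `sigmoid` and `forward` and the numeric values of `X` and `a` are hypothetical, chosen only for illustration; $\mathbf{b}$ plays no role in the question, so it is left out.

```python
import math


def sigmoid(x):
    """Logistic sigmoid f(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(X, a):
    """Return (v, w, w^T w) for v = X^T a and w = f(v) applied element-wise.

    X is an m x n matrix stored as a list of m rows; a has length m.
    """
    m, n = len(X), len(X[0])
    # v_g is column g of X dotted with a, i.e. row g of X^T times a
    v = [sum(X[k][g] * a[k] for k in range(m)) for g in range(n)]
    w = [sigmoid(v_g) for v_g in v]
    return v, w, sum(w_g * w_g for w_g in w)


# Hypothetical example with m = 3, n = 2
X = [[0.5, -1.0],
     [2.0, 0.3],
     [-0.7, 1.2]]
a = [1.0, -0.5, 2.0]
v, w, wTw = forward(X, a)
print(v, w, wTw)
```

Storing $\mathbf{X}$ as a list of rows keeps the indexing `X[k][g]` identical to $X_{k,g}$ in the answers.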


As Erik Miehling has helpfully pointed out, you need the product and chain rules.

We have $\mathbf{w}^T\mathbf{w}=\sum_{k=1}^nf(v_k)^2$

Therefore, using the chain rule

$$\frac{d\mathbf{w}^T\mathbf{w}}{dv_g}=2f(v_g)\frac{df(v_g)}{dv_g}$$

We are given that $f(x)=1/(1+e^{-x})$, thus

$$\frac{df(v_g)}{dv_g}=\frac{e^{-v_g}}{(1+e^{-v_g})^2}$$
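The closed-form derivative above can be sanity-checked numerically. This is a small sketch, assuming a central finite difference with step `h=1e-6` is accurate enough for comparison; the helper names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(x):
    # Closed form from the answer: e^{-x} / (1 + e^{-x})^2
    e = math.exp(-x)
    return e / (1.0 + e) ** 2


def central_diff(g, x, h=1e-6):
    # Symmetric finite-difference approximation of g'(x)
    return (g(x + h) - g(x - h)) / (2.0 * h)


points = [-3.0, -0.5, 0.0, 1.25, 4.0]
errors = [abs(sigmoid_prime(x) - central_diff(sigmoid, x)) for x in points]
print(max(errors))
```

At $x=0$ the formula gives exactly $1/4$, the sigmoid's maximum slope.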

Thus

$$\frac{d\mathbf{w}^T\mathbf{w}}{dv_g}=2f(v_g)\frac{e^{-v_g}}{(1+e^{-v_g})^2}$$

Now the scalar $v_g$ is the product of row $g$ of the matrix $\mathbf{X}^T$ and the column vector $\mathbf{a}$, i.e.

$$v_g=\sum_{k=1}^mX_{k,g}a_k$$

where $X_{k,g}$ is the element of matrix $\mathbf{X}$ in row $k\in \{1,2,\ldots,m\}$ and column $g\in \{1,2,\ldots,n\}$.

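Thus, for each row index $i\in \{1,2,\ldots,m\}$,

$$\frac{dv_g}{dX_{i,g}}=a_i$$

while $v_h$ does not depend on $X_{i,g}$ for $h\neq g$, so only the $k=g$ term of $\mathbf{w}^T\mathbf{w}$ contributes. Therefore,

$$\frac{d\mathbf{w}^T\mathbf{w}}{dX_{i,g}}=\frac{d\mathbf{w}^T\mathbf{w}}{dv_g}\frac{dv_g}{dX_{i,g}}=2f(v_g)\frac{e^{-v_g}}{(1+e^{-v_g})^2}a_i$$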
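A quick way to gain confidence in the element-wise formula is to perturb a single entry of $\mathbf{X}$ and compare against a central difference. In this sketch the row index is named `i` and the column index `g`; `phi`, `grad_entry`, `fd_entry` and the sample values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def phi(X, a):
    """Scalar w^T w with w = f(X^T a)."""
    m, n = len(X), len(X[0])
    v = [sum(X[k][g] * a[k] for k in range(m)) for g in range(n)]
    return sum(sigmoid(v_g) ** 2 for v_g in v)


def grad_entry(X, a, i, g):
    """Element-wise result: 2 f(v_g) e^{-v_g} / (1 + e^{-v_g})^2 * a_i."""
    v_g = sum(X[k][g] * a[k] for k in range(len(X)))
    e = math.exp(-v_g)
    return 2.0 * sigmoid(v_g) * e / (1.0 + e) ** 2 * a[i]


def fd_entry(X, a, i, g, h=1e-6):
    # Perturb only X_{i,g} and take a central difference of phi
    Xp = [row[:] for row in X]
    Xm = [row[:] for row in X]
    Xp[i][g] += h
    Xm[i][g] -= h
    return (phi(Xp, a) - phi(Xm, a)) / (2.0 * h)


X = [[0.5, -1.0], [2.0, 0.3], [-0.7, 1.2]]
a = [1.0, -0.5, 2.0]
analytic = grad_entry(X, a, 2, 1)
numeric = fd_entry(X, a, 2, 1)
print(analytic, numeric)
```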

Having dealt with a single element of matrix $\mathbf{X}$, we can extend the derivative to the entire matrix as follows:

$$\frac{d\mathbf{w}^T\mathbf{w}}{d\mathbf{X}}=\mathbf{AD}$$

where $\mathbf{A}$ is an $m\times n$ matrix with $n$ identical columns, each equal to $\mathbf{a}=[a_1,a_2,\cdots,a_m]^T$, i.e.

$$\mathbf{A}=\left( \begin{array}{cccc} a_1 & a_1 & \cdots & a_1 \\ a_2 & a_2 & \cdots & a_2 \\ a_3 & a_3 & \cdots & a_3 \\ \vdots & \vdots & \ddots & \vdots \\ a_m & a_m & \cdots & a_m \end{array} \right)$$

and $\mathbf{D}$ is the $n\times n$ diagonal matrix whose $r$th diagonal element is $2f(v_r)\frac{e^{-v_r}}{(1+e^{-v_r})^2}$ for $r\in\{1,2,\ldots,n\}$:

$$\mathbf{D}=\left( \begin{array}{cccc} 2f(v_1)\frac{e^{-v_1}}{(1+e^{-v_1})^2} & 0 & 0 & 0 \\ 0 & 2f(v_2)\frac{e^{-v_2}}{(1+e^{-v_2})^2} & 0 & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & 2f(v_n)\frac{e^{-v_n}}{(1+e^{-v_n})^2} \end{array} \right)$$
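The matrix form $\mathbf{AD}$ can be assembled directly and checked entry by entry against finite differences of $\mathbf{w}^T\mathbf{w}$. This is a sketch only; the function names and test values are hypothetical, and $\mathbf{D}$ is kept as its diagonal rather than a full $n\times n$ matrix.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def column_products(X, a):
    # v = X^T a
    m, n = len(X), len(X[0])
    return [sum(X[k][g] * a[k] for k in range(m)) for g in range(n)]


def gradient_AD(X, a):
    """Form A (m x n, every column equal to a) and diag(D), then return A D."""
    m, n = len(X), len(X[0])
    v = column_products(X, a)
    # Diagonal entries D_rr = 2 f(v_r) e^{-v_r} / (1 + e^{-v_r})^2
    d = [2.0 * sigmoid(v_r) * math.exp(-v_r) / (1.0 + math.exp(-v_r)) ** 2 for v_r in v]
    A = [[a[k]] * n for k in range(m)]
    # Right-multiplying by a diagonal matrix scales column g by D_gg
    return [[A[k][g] * d[g] for g in range(n)] for k in range(m)]


def phi(X, a):
    # w^T w = sum_g f(v_g)^2
    return sum(sigmoid(v_g) ** 2 for v_g in column_products(X, a))


def numeric_gradient(X, a, h=1e-6):
    """Central finite differences, one entry of X at a time."""
    m, n = len(X), len(X[0])
    G = [[0.0] * n for _ in range(m)]
    for k in range(m):
        for g in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[k][g] += h
            Xm[k][g] -= h
            G[k][g] = (phi(Xp, a) - phi(Xm, a)) / (2.0 * h)
    return G


X = [[0.5, -1.0], [2.0, 0.3], [-0.7, 1.2]]
a = [1.0, -0.5, 2.0]
G_AD = gradient_AD(X, a)
G_num = numeric_gradient(X, a)
max_gap = max(abs(G_AD[k][g] - G_num[k][g]) for k in range(3) for g in range(2))
print(max_gap)
```

Because $\mathbf{D}$ is diagonal, $(\mathbf{AD})_{k,g}=a_k D_{g,g}$, so there is no need to materialise the full product.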


This sigmoid function is interesting because $f' = f-f^2$ (since $1-f(x) = e^{-x}/(1+e^{-x})$, we have $f' = f(1-f)$), which allows us to simplify the result to
$$ \frac {\partial w^Tw} {\partial X} = 2 a (w^{\circ 2}-w^{\circ 3})^T $$

Let's use the notation $w' = f'(v)$, which is analogous to $w = f(v)$.
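The identity $f' = f - f^2$ and the claim that the simplified gradient agrees with the first answer's entry-wise result can both be checked numerically. A minimal sketch; all helper names and sample values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(x):
    # e^{-x} / (1 + e^{-x})^2, as derived in the first answer
    e = math.exp(-x)
    return e / (1.0 + e) ** 2


# The identity f' = f - f^2, checked at a few sample points
identity_gaps = [abs(sigmoid_prime(x) - (sigmoid(x) - sigmoid(x) ** 2))
                 for x in (-4.0, -1.0, 0.0, 0.7, 3.0)]


def forward_w(X, a):
    m, n = len(X), len(X[0])
    v = [sum(X[k][g] * a[k] for k in range(m)) for g in range(n)]
    return v, [sigmoid(v_g) for v_g in v]


def gradient_hadamard(X, a):
    """2 a (w^{o2} - w^{o3})^T, built as an m x n outer product."""
    _, w = forward_w(X, a)
    u = [w_g ** 2 - w_g ** 3 for w_g in w]
    return [[2.0 * a_k * u_g for u_g in u] for a_k in a]


def gradient_entrywise(X, a):
    """2 f(v_g) f'(v_g) a_k, the entry-wise result from the first answer."""
    v, w = forward_w(X, a)
    return [[2.0 * w[g] * sigmoid_prime(v[g]) * a_k for g in range(len(v))] for a_k in a]


X = [[0.5, -1.0], [2.0, 0.3], [-0.7, 1.2]]
a = [1.0, -0.5, 2.0]
G1 = gradient_hadamard(X, a)
G2 = gradient_entrywise(X, a)
gap = max(abs(G1[k][g] - G2[k][g]) for k in range(3) for g in range(2))
print(max(identity_gaps), gap)
```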

Let $a\circ b$ denote the Hadamard (element-wise) product of vectors $a, b$.

Hadamard powers are defined as
$$ \eqalign { a^{\circ 2} &= a\circ a \cr a^{\circ 3} &= a\circ a\circ a \cr }$$
and so on. Let $A : B$ denote the Frobenius (scalar) product of matrices $A, B$, i.e. $A:B = \sum_{i,j}A_{ij}B_{ij} = \operatorname{tr}(A^TB)$.
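The notation can be illustrated with plain lists. This sketch (helper names hypothetical) shows Hadamard products and powers, the Frobenius product, and the rewrite $(a u^T) : M = a^T M u$ that turns a scalar of the form $u^T dX^T a$ into a Frobenius product with $dX$.

```python
def hadamard(x, y):
    """Element-wise (Hadamard) product of two equal-length vectors."""
    return [x_i * y_i for x_i, y_i in zip(x, y)]


def hadamard_power(x, p):
    """x^{o p}: each entry raised to the integer power p >= 1."""
    return [x_i ** p for x_i in x]


def outer(x, y):
    """x y^T as a list of rows."""
    return [[x_i * y_j for y_j in y] for x_i in x]


def frobenius(A, B):
    """A : B = sum_ij A_ij B_ij, which equals trace(A^T B)."""
    return sum(a_ij * b_ij for row_a, row_b in zip(A, B) for a_ij, b_ij in zip(row_a, row_b))


x = [1, 2, 3]
square = hadamard_power(x, 2)        # [1, 4, 9]
cube = hadamard(x, hadamard(x, x))   # [1, 8, 27]

# (a u^T) : M equals the bilinear form a^T M u
a = [1, 2]
u = [3, 4]
M = [[1, 0], [2, 1]]
lhs = frobenius(outer(a, u), M)
rhs = sum(a[i] * M[i][j] * u[j] for i in range(2) for j in range(2))
print(square, cube, lhs, rhs)
```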

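Now it's just a matter of taking differentials:
$$\eqalign{ d(w^Tw) &= 2 w^T dw \cr &= 2 w^T (w' \circ dv) \cr &= 2 (w\circ w')^T dv \cr &= 2 (w\circ w')^T (dX^T a) \cr &= 2 a(w\circ w')^T : dX \cr &= 2 a(w\circ (w-w^{\circ 2}))^T : dX \cr } $$
The fifth line uses $u^T dX^T a = a^T dX\,u = a u^T : dX$ with $u = w\circ w'$, and the last line uses $w' = w - w^{\circ 2}$. Since $d(w^Tw) = G : dX$ for every $dX$, with $G = 2a(w^{\circ 2}-w^{\circ 3})^T$, the gradient is $G$, as stated above.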
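The differential statement $d(w^Tw) = G : dX$ can be tested along an arbitrary direction: for a small step $t\,E$, the change in $w^Tw$ divided by $t$ should approach $G : E$. A sketch under assumed values; the direction `E`, step `t` and helper names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def phi(X, a):
    """w^T w with w = f(X^T a)."""
    m, n = len(X), len(X[0])
    v = [sum(X[k][g] * a[k] for k in range(m)) for g in range(n)]
    return sum(sigmoid(v_g) ** 2 for v_g in v)


def gradient(X, a):
    """G = 2 a (w o (w - w^{o2}))^T, the last line of the derivation."""
    m, n = len(X), len(X[0])
    v = [sum(X[k][g] * a[k] for k in range(m)) for g in range(n)]
    w = [sigmoid(v_g) for v_g in v]
    u = [w_g * (w_g - w_g ** 2) for w_g in w]
    return [[2.0 * a[k] * u[g] for g in range(n)] for k in range(m)]


def frobenius(A, B):
    return sum(x * y for row_a, row_b in zip(A, B) for x, y in zip(row_a, row_b))


X = [[0.5, -1.0], [2.0, 0.3], [-0.7, 1.2]]
a = [1.0, -0.5, 2.0]
E = [[0.3, -0.2], [0.1, 0.4], [-0.5, 0.25]]  # hypothetical direction for dX
t = 1e-6

# Central difference of phi along E versus the predicted G : E
X_plus = [[x + t * e for x, e in zip(row_x, row_e)] for row_x, row_e in zip(X, E)]
X_minus = [[x - t * e for x, e in zip(row_x, row_e)] for row_x, row_e in zip(X, E)]
directional_fd = (phi(X_plus, a) - phi(X_minus, a)) / (2.0 * t)
predicted = frobenius(gradient(X, a), E)
print(directional_fd, predicted)
```

A single random direction does not prove the formula, but it catches sign or transpose mistakes cheaply.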