What is the derivative of the ReLU of a matrix with respect to a matrix?


I want to compute $\frac{\partial r(ZZ^TY)}{\partial Z}$, where the ReLU function is the nonlinear operator $r(x)=\max(0,x)$ and $Z \in\mathbb{R}^{n\times m}$ is a matrix.

I am also wondering whether the derivative of the transpose is the transpose of the derivative of my expression, i.e., whether $\frac{\partial r(ZZ^TY)^T}{\partial Z}=\big(\frac{\partial r(ZZ^TY)}{\partial Z}\big)^T.$

Best answer

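I hadn't previously heard of the ReLU function, but based on the description, its derivative is the Heaviside step function, $$ \frac{dr(x)}{dx} = H(x) $$ for $x \ne 0$ (at $x=0$ the function has a kink and is not differentiable, so a value such as $H(0)=0$ has to be chosen by convention). Since your argument is a matrix of unspecified shape (square? rectangular?) and since there is no total ordering for matrices in general, $\max(0,X)$ is not defined for a matrix as a whole, so I assume that you are applying the function in an element-wise fashion.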
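As a quick sanity check of the scalar claim, here is a minimal NumPy sketch that compares a central finite difference of the element-wise ReLU with the step function. The helper names `relu` and `heaviside`, the test matrix `X`, and the convention $H(0)=0$ are my own choices for illustration, not something fixed by the question.

```python
import numpy as np

def relu(x):
    """Element-wise ReLU, r(x) = max(0, x)."""
    return np.maximum(0.0, x)

def heaviside(x):
    """Element-wise step function; H(0) = 0 is an arbitrary convention here."""
    return (x > 0).astype(float)

# Small test matrix whose entries are all well away from the kink at 0.
X = np.array([[-2.0, 0.5],
              [3.0, -1.0]])

# Central finite difference of the ReLU, taken entry by entry.
eps = 1e-6
fd = (relu(X + eps) - relu(X - eps)) / (2 * eps)

# The finite difference should match the step function at every entry.
matches = np.allclose(fd, heaviside(X))
print(matches)
```

The comparison only makes sense away from zero: at an entry that is exactly 0 the central difference returns 1/2, which matches neither one-sided derivative.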

When applied element-wise to a matrix argument ($X=ZZ^TY$), the differential of the function can be expressed using the Hadamard product ($\circ$) as $$\eqalign{ dr &= H \circ dX \cr &= H \circ (dZ\,Z^TY + Z\,dZ^TY) \cr }$$ where $H=H(X)$ is the step function evaluated at each entry of $X$.

Since $\frac{\partial r}{\partial Z}$ is a matrix-by-matrix derivative (i.e., a fourth-order tensor), it has no single well-defined transpose, so your question about taking its transpose is not clear. What can be said is that an element-wise function commutes with the transpose, $r(X)^T = r(X^T)$, and therefore $$ \frac{\partial r(ZZ^TY)^T}{\partial Z} = \frac{\partial r(Y^TZZ^T)}{\partial Z} $$ assuming, once again, that you are applying the function element-wise.
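The differential can be checked numerically. Reading the coefficients of $dZ_{kl}$ off the differential gives the components of the fourth-order tensor, $\frac{\partial r_{ij}}{\partial Z_{kl}} = H_{ij}\,\big(\delta_{ik}(Z^TY)_{lj} + Z_{il}Y_{kj}\big)$. The sketch below is only an illustration under my own assumptions: the sizes `n, m, p`, the random `Z` and `Y` (with `Y` taken to be $n \times p$ so that the product is defined), and the index layout `T[i, j, k, l]` are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 4, 3, 5                     # hypothetical sizes; Y must have n rows
Z = rng.standard_normal((n, m))
Y = rng.standard_normal((n, p))

def relu(x):
    """Element-wise ReLU."""
    return np.maximum(0.0, x)

def f(Z):
    """r(Z Z^T Y), applied element-wise."""
    return relu(Z @ Z.T @ Y)

X = Z @ Z.T @ Y
H = (X > 0).astype(float)             # step function at each entry of X

# 1) Differential: dr = H o (dZ Z^T Y + Z dZ^T Y), for a random direction dZ.
dZ = rng.standard_normal((n, m))
analytic = H * (dZ @ Z.T @ Y + Z @ dZ.T @ Y)
t = 1e-6
numeric = (f(Z + t * dZ) - f(Z - t * dZ)) / (2 * t)
differential_ok = np.allclose(analytic, numeric, atol=1e-6)

# 2) Fourth-order tensor T[i, j, k, l] = d r_ij / d Z_kl.
ZtY = Z.T @ Y                         # shape (m, p), indexed [l, j]
T = H[:, :, None, None] * (
    np.einsum("ik,lj->ijkl", np.eye(n), ZtY)
    + np.einsum("il,kj->ijkl", Z, Y)
)
# Contracting T with dZ over (k, l) must reproduce the differential.
tensor_ok = np.allclose(np.einsum("ijkl,kl->ij", T, dZ), analytic)

# 3) Element-wise ReLU commutes with the transpose.
transpose_ok = np.allclose(f(Z).T, relu(Y.T @ Z @ Z.T))

print(differential_ok, tensor_ok, transpose_ok)
```

The finite-difference comparison assumes that no entry of $X$ sits so close to zero that the perturbation pushes it across the kink. For generic random inputs that is the case, but it is not guaranteed.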