In deep learning, the following operation is common:
$$A = B\circ (C>0.2)$$
where $A,B,C\in \mathbb{R}^{n\times m}$, $\circ$ denotes the Hadamard (elementwise) product, and $C>0.2$ denotes the matrix whose elements equal those of $C$ where they exceed $0.2$ and are set to $0$ otherwise.
I want to know the partial derivative of $A$ with respect to $C$, formally $$\frac{\partial A}{\partial C}$$
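For concreteness, the forward operation can be sketched in NumPy; the shapes and values below are illustrative, not part of the question:

```python
import numpy as np

# hypothetical shapes; B and C play the roles in the question
n, m = 3, 4
rng = np.random.default_rng(1)
B = rng.standard_normal((n, m))
C = rng.uniform(-1.0, 1.0, (n, m))

# (C > 0.2) as described: keep C's entries above the threshold, zero the rest
masked_C = np.where(C > 0.2, C, 0.0)
A = B * masked_C  # Hadamard product
```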
$\def\v{{\rm vec}}\def\d{{\rm diag}}\def\D{{\rm Diag}}\def\o{{\tt1}}\def\p{{\partial}}\def\grad#1#2{\frac{\p #1}{\p #2}}\def\hess#1#2#3{\frac{\p^2 #1}{\p #2\,\p #3^T}}\def\bb{\mathbb}$It's simpler to use the ${\rm vec}()$ operator and deal with a vector equation in $\,{\bb R}^{mn\times 1}$ $$a = b\circ (c>\lambda)$$ Make the following definitions $$\eqalign{ z &= c-\lambda\o \qquad&(\lambda\,{\rm is\,an\,arbitrary\,scalar}) \\ {\cal H}(z_k) &= \begin{cases}1\quad{\rm if}\quad z_k>0\\0\quad{\rm otherwise} \end{cases} \qquad&({\rm Heaviside\,step\,function}) \\ h &= {\cal H}(z) \qquad&({\rm apply\,the\,function\,elementwise}) \\ }$$ Write the problem in terms of the above, then calculate its differential and gradient. $$\eqalign{ a &= b\circ h\circ c \\ da &= b\circ h\circ dc \;=\; \D(b\circ h)\; dc \\ \grad{a}{c} &= \D(b\circ h) \\\\ }$$Note that $h$ is treated as constant when forming the differential; this is valid away from the points where $c_k=\lambda$, since the Heaviside factor is not differentiable there. Note also that the quantity $G=\left(\grad{a}{c}\right)$ calculated above is an ${\bb R}^{mn\times mn}$ matrix, whereas the requested quantity $\Gamma=\left(\grad{A}{C}\right)\in{\bb R}^{n\times m\times n\times m}$ is a fourth-order tensor. The individual elements are identical $\big({\rm e.g.}\;\Gamma_{1111}=G_{11}\big);\,$ the tensor has simply been reshaped into a matrix.
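The gradient $\D(b\circ h)$ can be checked numerically. A minimal NumPy sketch, using a hypothetical size and keeping $c$ away from the kink at $c_k=\lambda$ so the finite differences are valid:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
lam = 0.2
b = rng.standard_normal(n)
c = rng.uniform(-1.0, 1.0, n)
c[np.abs(c - lam) < 1e-2] += 0.05  # keep entries away from the kink at lam

def f(c):
    # a = b ∘ h ∘ c, with h the elementwise Heaviside of (c - lam)
    h = (c > lam).astype(float)
    return b * h * c

# analytic Jacobian: Diag(b ∘ h)
h = (c > lam).astype(float)
J = np.diag(b * h)

# central finite-difference Jacobian, column by column
eps = 1e-6
J_num = np.zeros((n, n))
for k in range(n):
    d = np.zeros(n)
    d[k] = eps
    J_num[:, k] = (f(c + d) - f(c - d)) / (2 * eps)
```

The two Jacobians should agree to roughly the finite-difference accuracy.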
The elements of the matrix can be reshaped into a tensor, if that is the desired form. Personally, I use Julia and find the matrix form more convenient to work with, since I can use the regular built-in matrix*vector product rather than writing explicit for-loops.
Consider the matrix calculation $$da = G\,dc$$ versus the tensor calculation $$dA_{ij} = \sum_{k=1}^n\sum_{l=1}^m \Gamma_{ijkl}\,dC_{kl}$$
Yes, there are tensor packages available, but it's still awkward compared to working with vectors and matrices.
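The equivalence of the two calculations can be demonstrated in NumPy (sizes are illustrative). The only bookkeeping is that ${\rm vec}()$ is column-stacking, which corresponds to Fortran (`order="F"`) reshapes:

```python
import numpy as np

n, m = 3, 4
rng = np.random.default_rng(2)
B = rng.standard_normal((n, m))
C = rng.uniform(-1.0, 1.0, (n, m))
H = (C > 0.2).astype(float)        # elementwise Heaviside mask
dC = rng.standard_normal((n, m))   # an arbitrary perturbation of C

# matrix calculation: da = G dc, with G = Diag(vec(B ∘ H))
G = np.diag((B * H).reshape(-1, order="F"))
da = G @ dC.reshape(-1, order="F")
dA_matrix = da.reshape(n, m, order="F")

# tensor calculation: dA_ij = sum_kl Gamma_ijkl dC_kl,
# where Gamma is just G reshaped into a fourth-order tensor
Gamma = G.reshape(n, m, n, m, order="F")
dA_tensor = np.einsum("ijkl,kl->ij", Gamma, dC)

# both routes give dA = (B ∘ H) ∘ dC
```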