Let $f$ be the ReLU function, $X \in \mathbb{R}^{n \times d}$, and $\beta \in \mathbb{R}^d$. What is the gradient $\nabla_{\beta} \| f(X \beta)\|_{2}^{2}$?
I think that, by the chain rule, $\nabla_\beta \| f(X\beta) \|^{2}_{2} = 2 \nabla f(X\beta) f(X\beta) = 2 X^{\top} f'(X\beta) f(X\beta)$. But the matrix sizes don't match.
$ \def\a{\alpha}\def\b{\beta}\def\t{\theta} \def\o{{\tt1}}\def\p{\partial} \def\L{\left}\def\R{\right}\def\LR#1{\L(#1\R)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\grad#1#2{\frac{\p #1}{\p #2}} $Define the vector $$y=X\b \quad\implies\quad dy=X\,d\b$$ and the scalar ReLU and step functions $$\eqalign{ f(z) &= \begin{cases} z \quad{\rm if}\;z\ge 0 \\ 0 \quad{\rm if}\;z<0 \\ \end{cases} \qquad\qquad g(z) = \frac{df}{dz} &= \begin{cases} 1 \quad{\rm if}\;z\ge 0 \\ 0 \quad{\rm if}\;z<0 \\ \end{cases} \\ }$$

The derivative of $f$ does not exist at $z=0$; setting $g(0)=1$ is a convention, and any value in $[0,1]$ gives the same final result because $f(0)=0$.

Apply these functions elementwise to the vector $y$ to generate the vectors $$f=f(y),\qquad\qquad g=g(y)$$ Then the function in the question and its differential are $$\eqalign{ \phi &= f:f \\ d\phi &= 2f:df \\ &= 2f:\LR{g\odot dy} \\ &= 2\LR{f\odot g}:X\,d\b \\ &= 2X^T\LR{f\odot g}:d\b \\ \grad{\phi}{\b} &= 2X^T\LR{f\odot g} \\ }$$ where $(\odot)$ denotes the elementwise (Hadamard) product and $(:)$ the trace (Frobenius) product. These products are defined as follows:
$$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:\LR{B\odot C} &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij}C_{ij} \;=\; \LR{A\odot B}:C \\ }$$ where $A$, $B$, $C$ are $m\times n$ matrices. These matrix products can also be applied to vectors by treating them as rectangular matrices with one column.
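A numerical sanity check of the result, written as a sketch that assumes NumPy; the helper names (`relu`, `step`, `frob`, `phi`, `grad_phi`, `finite_diff_grad`), the sizes, and the random seed are illustrative choices, not part of the answer above. It compares $2X^T(f\odot g)$ with central finite differences of $\phi$ and checks the product identities, including the step $(f\odot g):X\,d\beta = X^T(f\odot g):d\beta$ used in the derivation. It also checks two equivalent forms of the gradient. Since $f(z)\,g(z)=f(z)$ for every $z$, the result equals $2X^Tf(X\beta)$. And the chain-rule expression $2X^Tf'(X\beta)f(X\beta)$ from the question has matching sizes ($d\times n$, then $n\times n$, then $n\times 1$) once $f'(X\beta)$ is read as the diagonal Jacobian $\operatorname{Diag}(g)$.

```python
import numpy as np

def relu(z):
    # f(z): elementwise ReLU
    return np.maximum(z, 0.0)

def step(z):
    # g(z) = df/dz, using the convention g(0) = 1
    return (z >= 0).astype(float)

def frob(A, B):
    # A:B = sum_ij A_ij B_ij (vectors act as one-column matrices)
    return np.sum(A * B)

def phi(X, beta):
    # phi = f:f = ||f(X beta)||_2^2
    f = relu(X @ beta)
    return frob(f, f)

def grad_phi(X, beta):
    # d(phi)/d(beta) = 2 X^T (f * g), where * is the elementwise product
    y = X @ beta
    return 2.0 * X.T @ (relu(y) * step(y))

def finite_diff_grad(X, beta, h=1e-6):
    # central differences, perturbing one coordinate of beta at a time
    grad = np.zeros_like(beta)
    for k in range(beta.size):
        e = np.zeros_like(beta)
        e[k] = h
        grad[k] = (phi(X, beta + e) - phi(X, beta - e)) / (2.0 * h)
    return grad

rng = np.random.default_rng(0)
n, d = 50, 4
X = rng.standard_normal((n, d))
beta = rng.standard_normal(d)
y = X @ beta

grad_formula = grad_phi(X, beta)
grad_numeric = finite_diff_grad(X, beta)
# f * g == f elementwise, so the gradient also equals 2 X^T f(X beta)
grad_simplified = 2.0 * X.T @ relu(y)
# question's chain-rule form, reading f'(X beta) as the n x n matrix Diag(g)
grad_jacobian = 2.0 * X.T @ np.diag(step(y)) @ relu(y)

# identities used in the derivation, checked on random data
A, B, C = (rng.standard_normal((3, 5)) for _ in range(3))
p = rng.standard_normal(n)
q = rng.standard_normal(d)
checks = {
    "A:B == Tr(A^T B)": np.isclose(frob(A, B), np.trace(A.T @ B)),
    "A:(B*C) == (A*B):C": np.isclose(frob(A, B * C), frob(A * B, C)),
    "p:Xq == X^T p:q": np.isclose(frob(p, X @ q), frob(X.T @ p, q)),
}

print("max |formula - finite diff|:", np.max(np.abs(grad_formula - grad_numeric)))
print(np.allclose(grad_formula, grad_simplified), np.allclose(grad_formula, grad_jacobian))
print(checks)
```

Because $\phi$ is piecewise quadratic in $\beta$, central differences should agree with the formula up to rounding error unless a component of $X\beta$ lies within the step size `h` of zero.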