If $E(w,b)$ is the cross-entropy error for a point in $\mathbb{R}^n$, then the gradient with respect to the weights at a given sample $x_i = (x_{i1}, \dots, x_{in})$, with label $y_i$ and prediction $\hat{y}_i$, is simply:
$$ \nabla E = (x_{i1}(y_i - \hat{y}_i), \dots, x_{in}(y_i - \hat{y}_i)) $$
Each sample $x_i$ is arranged as a row in a matrix of $m$ such samples:

$$X= \begin{bmatrix} x_{11} & x_{12} & x_{13} & \dots & x_{1n} \\ x_{21} & x_{22} & x_{23} & \dots & x_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & x_{m3} & \dots & x_{mn} \end{bmatrix} $$

So, my question is: what transformation $T$ can I construct such that the product $TX$ is the matrix whose rows are the gradient vectors of the corresponding rows of $X$:
$$ TX = \begin{bmatrix} x_{11}(y_1 - \hat{y}_1) & \dots & x_{1n}(y_1 - \hat{y}_1) \\ x_{21}(y_2 - \hat{y}_2) & \dots & x_{2n}(y_2 - \hat{y}_2) \\ \vdots & \ddots & \vdots \\ x_{m1}(y_m - \hat{y}_m) & \dots & x_{mn}(y_m - \hat{y}_m) \end{bmatrix} $$
I'm asking because I need a neat, concise linear algebra formula for an algorithm I'm writing. I'd rather not use nested loops so I thought I'd take the opportunity to use linear algebra, but have failed to find a way forward.
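
To make the target concrete, here is a minimal pure-Python sketch of the nested-loop computation I'd like to replace. The function name `row_gradients_loop` and the toy values are made up for illustration, and I'm assuming `y` and `y_hat` hold one label and one prediction per row of `X`.

```python
def row_gradients_loop(X, y, y_hat):
    """Return the m x n matrix whose i-th row is x_i scaled by (y_i - y_hat_i)."""
    m, n = len(X), len(X[0])
    G = [[0.0] * n for _ in range(m)]
    for i in range(m):                 # one sample (row) at a time
        residual = y[i] - y_hat[i]     # scalar factor shared by the whole row
        for j in range(n):             # one feature (column) at a time
            G[i][j] = X[i][j] * residual
    return G


# Toy data (hypothetical): m = 3 samples, n = 2 features.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [1.0, 0.0, 1.0]
y_hat = [0.5, 0.25, 1.0]

G = row_gradients_loop(X, y, y_hat)
for row in G:
    print(row)
```

Whatever $T$ turns out to be, $TX$ should reproduce `G` for any such inputs.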