Linear algebra for computing gradient of cross-entropy loss

69 Views Asked by Bumbble Comm At 31 Mar 2026 - 7:12

If $E(w,b)$ is the cross-entropy of a point in $\mathbb{R}^n$, then the gradient at a given point $x_j = (x_{j1}, \dots, x_{jn})$ is simply:

$$ \nabla E = (x_{j1}(y - \hat{y}), \dots, x_{jn}(y - \hat{y})) $$

Each point $x_j$ is arranged as a row (a statistical sample) in a matrix of $m$ such samples:

$$X= \begin{bmatrix} x_{11} & x_{12} & x_{13} & \dots & x_{1n} \\ x_{21} & x_{22} & x_{23} & \dots & x_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & x_{m3} & \dots & x_{mn} \end{bmatrix} $$

So my question is: what transformation $T$ can I construct such that the product $TX$ is the matrix whose rows are the gradient vectors of the corresponding rows of $X$?

$$ TX = \begin{bmatrix} x_{11}(y - \hat{y}) & \dots & x_{1n}(y - \hat{y}) \\ x_{21}(y - \hat{y}) & \dots & x_{2n}(y - \hat{y}) \\ \vdots & \ddots & \vdots \\ x_{m1}(y - \hat{y}) & \dots & x_{mn}(y - \hat{y}) \end{bmatrix} $$
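
One candidate for $T$, sketched under an assumption the question leaves implicit: each row $i$ of $X$ has its own label $y_i$ and prediction $\hat{y}_i$ (these subscripts are introduced here for illustration). Left-multiplying by a diagonal matrix scales each row of $X$ by the matching diagonal entry, so an $m \times m$ diagonal matrix of residuals would seem to do the job:

$$ T = \operatorname{diag}(y_1 - \hat{y}_1, \dots, y_m - \hat{y}_m) = \begin{bmatrix} y_1 - \hat{y}_1 & & \\ & \ddots & \\ & & y_m - \hat{y}_m \end{bmatrix}, \qquad (TX)_{ik} = (y_i - \hat{y}_i)\, x_{ik} $$

If every row really shared a single scalar $y - \hat{y}$, this would reduce to $T = (y - \hat{y}) I_m$.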

I'm asking because I need a neat, concise linear algebra formula for an algorithm I'm writing. I'd rather not use nested loops so I thought I'd take the opportunity to use linear algebra, but have failed to find a way forward.
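
A minimal loop-free sketch of the row-scaling idea in NumPy, not a definitive implementation. It assumes one label and one prediction per row of $X$; the function name `per_sample_gradients` and the arrays `y` and `y_hat` are hypothetical names chosen for illustration, and the sign convention $(y - \hat{y})$ is copied from the question as written.

```python
import numpy as np

def per_sample_gradients(X, y, y_hat):
    """Return the m x n matrix whose i-th row is (y[i] - y_hat[i]) * X[i]."""
    X = np.asarray(X, dtype=float)
    # One residual per sample (assumption: y and y_hat both have shape (m,)).
    residual = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    # Broadcasting a column of residuals against X scales each row.
    # This equals np.diag(residual) @ X without building the m x m matrix.
    return residual[:, np.newaxis] * X

# Small made-up example: m = 3 samples, n = 2 features.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 0.0, 1.0])
y_hat = np.array([0.5, 0.25, 1.0])
G = per_sample_gradients(X, y, y_hat)
```

Broadcasting avoids materializing the diagonal matrix, which may matter when $m$ is large. If only the sum of the per-sample rows is needed, `X.T @ (y - y_hat)` gives the same vector as `G.sum(axis=0)`.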
