Let $y$ and $x$ be $n$-dimensional vectors related by $y = f(x)$, and let $L$ be a differentiable loss function. By the chain rule of calculus, $\nabla_x L = \left(\frac{\partial y}{\partial x}\right)^{\top} \nabla_y L$, which takes $O(n^2)$ computational time in general, since it requires a matrix-vector multiplication.
Show that if $f(x) = \sigma(x)$ or $f(x) = \mathrm{Softmax}(x)$, the above matrix-vector multiplication can be simplified to an $O(n)$ operation. Note that here the sigmoid is applied elementwise to a vector input $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$: $\sigma(x) = (\sigma(x_1), \ldots, \sigma(x_n))$.
Well, for the sigmoid, if $i \neq j$ then $\frac{\partial \sigma(x_i)}{\partial x_j} = 0$, so the Jacobian is sparse: only the $n$ diagonal entries are non-zero, with $\frac{\partial \sigma(x_i)}{\partial x_i} = \sigma(x_i)(1 - \sigma(x_i))$. The matrix-vector product therefore reduces to an elementwise product, $\nabla_x L = \sigma(x) \odot (1 - \sigma(x)) \odot \nabla_y L$, which costs $O(n)$. For the softmax, writing $s = \mathrm{Softmax}(x)$, the Jacobian is $\mathrm{diag}(s) - s s^{\top}$, so $\left(\frac{\partial y}{\partial x}\right)^{\top} \nabla_y L = s \odot \nabla_y L - s\,(s^{\top} \nabla_y L)$: one elementwise product and one dot product, again $O(n)$.
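The $O(n)$ shortcuts above can be checked numerically. The sketch below (a minimal illustration; the function names and test vectors are my own, not from the problem) implements both vector-Jacobian products and verifies them against the explicit $O(n^2)$ Jacobian:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_vjp(x, grad_y):
    """O(n) vector-Jacobian product for the elementwise sigmoid:
    the Jacobian is diagonal, so J^T v is just an elementwise product."""
    s = sigmoid(x)
    return s * (1.0 - s) * grad_y

def softmax(x):
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()

def softmax_vjp(x, grad_y):
    """O(n) vector-Jacobian product for softmax:
    J = diag(s) - s s^T (symmetric), so J^T v = s * v - s * (s . v)."""
    s = softmax(x)
    return s * grad_y - s * np.dot(s, grad_y)

# Sanity check against the explicit O(n^2) Jacobians (illustrative vectors)
x = np.array([0.5, -1.0, 2.0])
v = np.array([1.0, 2.0, 3.0])

s = sigmoid(x)
J_sig = np.diag(s * (1.0 - s))           # full n x n Jacobian of sigmoid
assert np.allclose(J_sig.T @ v, sigmoid_vjp(x, v))

p = softmax(x)
J_soft = np.diag(p) - np.outer(p, p)     # full n x n Jacobian of softmax
assert np.allclose(J_soft.T @ v, softmax_vjp(x, v))
```

Note that neither shortcut ever materializes the $n \times n$ Jacobian; both use only $O(n)$ memory and time.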