Gradient descent rule

769 Views Asked by Bumbble Comm At 31 Mar 2026 - 1:44

In machine learning, a very common training algorithm (for neural networks) is the gradient descent rule. I understand that it is an iterative process that adjusts each weight based on the partial derivative of the error with respect to that weight. Why can't we simply take the partial derivatives with respect to all the weights, set them to zero, set up a system of linear equations, and solve it? Is it the computational cost?


There is 1 best solution below

Bumbble Comm On 06 May 2012 - 10:18 BEST ANSWER

If you do that, you'll get a nonlinear system of equations rather than a linear one.
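To make the distinction concrete, here is a small worked comparison. It assumes a squared-error loss, inputs x_i with targets y_i, and a single sigmoid unit; none of these specifics come from the question, they are only an illustration.

```latex
% Linear model: the stationarity condition is linear in w
E(w) = \tfrac{1}{2} \sum_{i} \left( w^\top x_i - y_i \right)^2
\quad\Longrightarrow\quad
\nabla E(w) = \sum_{i} \left( w^\top x_i - y_i \right) x_i = 0

% Single sigmoid unit: w now appears inside \sigma and \sigma'
E(w) = \tfrac{1}{2} \sum_{i} \left( \sigma(w^\top x_i) - y_i \right)^2
\quad\Longrightarrow\quad
\nabla E(w) = \sum_{i} \left( \sigma(w^\top x_i) - y_i \right) \sigma'(w^\top x_i)\, x_i = 0
```

In the first case the condition can be rearranged into a linear system in w and solved directly. In the second, w sits inside the nonlinearity, so no such rearrangement exists, and a multi-layer network only compounds this.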

Setting the gradient to zero is a common strategy for some optimization problems, but it turns the problem into finding a root of a nonlinear system of equations. That can be done with Newton's method (and its generalizations), but doing so generally involves dense matrix computations.

The dense matrix computations are the issue. Just setting up and solving the Newton equations is costly: for n weights, forming the matrix takes O(n^2) work before counting the cost of computing its entries, and solving the resulting matrix equation with a direct dense solver takes O(n^3).
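The cost difference can be sketched in a few lines of plain Python. This is only an illustration under assumptions of my own: a logistic loss with a small ridge term, a tiny made-up data set, and hypothetical helper names (`gradient`, `hessian`, `solve`, `gradient_step`, `newton_step`). The point is the shape of the work: a gradient step touches n numbers, while a Newton step fills an n-by-n matrix and then solves a linear system with it.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient(w, X, y, ridge=1e-3):
    """Gradient of the ridge-regularised logistic loss: n entries."""
    g = [ridge * wj for wj in w]
    for xi, yi in zip(X, y):
        err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
        for j, xj in enumerate(xi):
            g[j] += err * xj
    return g

def hessian(w, X, ridge=1e-3):
    """Hessian of the same loss: n*n entries to store and fill."""
    n = len(w)
    H = [[ridge if r == c else 0.0 for c in range(n)] for r in range(n)]
    for xi in X:
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
        s = p * (1.0 - p)
        for r in range(n):
            for c in range(n):
                H[r][c] += s * xi[r] * xi[c]
    return H

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting: O(n^3)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def gradient_step(w, X, y, lr=0.1):
    """One gradient descent update: only the n-entry gradient is needed."""
    g = gradient(w, X, y)
    return [wj - lr * gj for wj, gj in zip(w, g)]

def newton_step(w, X, y):
    """One Newton update: build the n-by-n Hessian, then solve H d = g."""
    d = solve(hessian(w, X), gradient(w, X, y))
    return [wj - dj for wj, dj in zip(w, d)]

# Made-up, non-separable data: a bias column plus one feature.
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, -0.5]]
y = [0.0, 0.0, 1.0, 1.0]

w = [0.0, 0.0]
for _ in range(10):
    w = newton_step(w, X, y)
```

With two weights the Newton steps are trivial, but the nested loops in `hessian` and `solve` are what grow as n^2 and n^3. For a network with millions of weights, even storing the matrix is impractical, whereas the gradient step stays linear in n.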

Another issue in the NN context is online versus batch algorithms. In that context it's much more common to use stochastic gradient descent (SGD), also called sequential gradient descent, than standard (batch) gradient descent. (The
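A rough sketch of the batch versus online distinction, assuming a one-variable linear model with squared error and a made-up noise-free data set; the function names `batch_gd` and `sgd` are hypothetical and not taken from any library. The batch version sums the gradient over every example before each update, while the sequential version updates the weights after each individual example.

```python
import random

def batch_gd(data, lr=0.1, epochs=500):
    """Batch gradient descent: one update per pass over all the data."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        gw = sum((w * x + b - y) * x for x, y in data) / n
        gb = sum((w * x + b - y) for x, y in data) / n
        w -= lr * gw
        b -= lr * gb
    return w, b

def sgd(data, lr=0.1, epochs=500, seed=0):
    """Stochastic (sequential) gradient descent: one update per example."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)  # visit the examples in a fresh random order
        for x, y in samples:
            err = w * x + b - y
            w -= lr * err * x
            b -= lr * err
    return w, b

# Made-up data generated from y = 2x + 1 with no noise.
data = [(x, 2.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]

w_batch, b_batch = batch_gd(data)
w_sgd, b_sgd = sgd(data)
```

On this toy problem both variants should recover a slope near 2 and an intercept near 1. The practical difference shows up on large data sets, where the per-example update lets training make progress without a full pass over the data and allows examples to arrive one at a time.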
