The gradient descent algorithm is used to minimize a cost function. The general gradient descent update rule is:
$$ x_{n+1} = x_n - \lambda \nabla f(x_n) $$
We want to apply it to the MSE function given by:
$$ C(\beta) = \frac{1}{m} \sum_{i=1}^{m} r_i^2(\beta) $$
where we have $m$ functions $\mathbf{r} = (r_{1},\dots ,r_{m})$ of $n$ variables $\boldsymbol{\beta} = (\beta_{1},\ldots ,\beta_{n})$, with $m \geq n$. In the documentation of some training algorithms, the gradient descent update applied to this MSE function is given by: $$\Delta\beta = -\lambda \nabla C(\beta)$$ where $\Delta\beta$ is just the update (increment) of $\beta$, or it is given by:
$$\Delta\beta = -\lambda J^{T}_r r$$
where $J_r$ is the Jacobian matrix of $\mathbf{r}$ with respect to $\boldsymbol{\beta}$.
In the second vector formula, since the chain rule gives $\nabla C(\beta) = \frac{2}{m} J^{T}_r r$, shouldn't it be $\Delta\beta = -\frac{2}{m}\lambda J^{T}_r r$?
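To sanity-check my reasoning, here is a small numerical experiment (my own sketch, using linear residuals $r(\beta) = A\beta - y$ as an assumed example, so that $J_r = A$): it compares $\frac{2}{m} J_r^T r$ against a finite-difference gradient of the MSE.

```python
import numpy as np

# Assumed toy setup: linear residuals r(beta) = A @ beta - y,
# so the Jacobian of r with respect to beta is J_r = A.
rng = np.random.default_rng(0)
m, n = 5, 3                       # m residuals, n parameters, m >= n
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)
beta = rng.standard_normal(n)

def C(b):
    """MSE cost: C(beta) = (1/m) * sum_i r_i(beta)^2."""
    r = A @ b - y
    return (r @ r) / m

# Candidate analytic gradient: (2/m) * J_r^T r
r = A @ beta - y
grad_analytic = (2.0 / m) * A.T @ r

# Central finite differences for comparison
eps = 1e-6
grad_numeric = np.array([
    (C(beta + eps * e) - C(beta - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

print(np.allclose(grad_analytic, grad_numeric, atol=1e-6))  # True
```

The two gradients agree, which is what makes me think the $\frac{2}{m}$ factor belongs in the update unless the documentation absorbs it into $\lambda$.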