The gradient descent algorithm is used to minimize a cost function. The general gradient descent update rule is:
$$ x_{n+1} = x_n - \lambda \nabla f(x_n) $$
We want to apply it to the MSE function given by:
$$ C(\beta) = \frac{1}{m} \sum_{i=1}^{m} r_i^2(\beta) $$
where we have $m$ functions $\mathbf{r} = (r_{1},\dots ,r_{m})$ of $n$ variables $\boldsymbol{\beta} = (\beta_{1},\ldots ,\beta_{n})$, with $m \geq n$. In the documentation of some training algorithms, the gradient descent update applied to this MSE function is given by: $$\Delta\beta = -\lambda \nabla C(\beta)$$ where $\Delta\beta$ is just the update (increment) of $\beta$, or it is given by:
$$\Delta\beta = -\lambda J^{T}_r r$$
where $J_r$ is the Jacobian matrix of $\mathbf{r}$ with respect to $\boldsymbol{\beta}$.
In the second vector formula, since the chain rule gives $\nabla C(\beta) = \frac{2}{m} J^{T}_r r$, shouldn't it be $\Delta\beta = -\frac{2}{m}\lambda J^{T}_r r$?
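To sanity-check my reasoning, here is a small numerical experiment (my own sketch, using linear residuals $r(\beta) = A\beta - y$ as an assumed example, so that $J_r = A$): it compares $\frac{2}{m} J_r^T r$ against a finite-difference gradient of the MSE.

```python
import numpy as np

# Assumed toy setup: linear residuals r(beta) = A @ beta - y,
# so the Jacobian of r with respect to beta is J_r = A.
rng = np.random.default_rng(0)
m, n = 5, 3                       # m residuals, n parameters, m >= n
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)
beta = rng.standard_normal(n)

def C(b):
    """MSE cost: C(beta) = (1/m) * sum_i r_i(beta)^2."""
    r = A @ b - y
    return (r @ r) / m

# Candidate analytic gradient: (2/m) * J_r^T r
r = A @ beta - y
grad_analytic = (2.0 / m) * A.T @ r

# Central finite differences for comparison
eps = 1e-6
grad_numeric = np.array([
    (C(beta + eps * e) - C(beta - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

print(np.allclose(grad_analytic, grad_numeric, atol=1e-6))  # True
```

The two gradients agree, which is what makes me think the $\frac{2}{m}$ factor belongs in the update unless the documentation absorbs it into $\lambda$.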