Is there an 'inner product wrt a matrix' version of gradient descent?

87 Views Asked by Bumbble Comm At 02 Apr 2026 - 9:50

Gradient descent is usually motivated by a first-order Taylor approximation. If we have a function $f:\mathbb{R}^p\rightarrow\mathbb{R}$ and we start at a point $x\in \mathbb{R}^p$, then we can look at the first-order Taylor approximation
\begin{align} f(x+\Delta x)\approx f(x)+\langle\nabla f(x),\Delta x \rangle_{l^2} \end{align}
For a fixed step length, we want the update $\Delta x$ to point in the direction of $-\nabla f(x)$ in order to minimize $\langle\nabla f(x),\Delta x \rangle_{l^2}$.

However, could we use a different inner product? For instance, let's say we have an SPD matrix $A\in \mathbb{R}^{p\times p}$ and we use the inner product $\langle x,y\rangle_A=x^T A y$. Then we could Taylor approximate
\begin{align} f(x+\Delta x)\approx f(x)+\langle \nabla f(x),\Delta x\rangle_A \end{align}
We would then have gradient descent updates
\begin{align} x_{n+1}=x_n-\eta A\nabla f(x_n) \end{align}
where $\eta$ is the learning rate.

Is this type of gradient descent an actual procedure? If so, what is it called? If not, what is 'wrong' with it? I'm asking because this paper 'seems' to be doing an infinite-dimensional/functional version of this procedure.
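To make the proposed update concrete, here is a minimal sketch in Python with NumPy. The quadratic test function, the matrices `Q` and `A`, the step size, and the helper name `matrix_gradient_descent` are all assumptions chosen for illustration; none of them come from the paper mentioned above.

```python
import numpy as np

def matrix_gradient_descent(grad_f, A, x0, eta=0.1, n_steps=200):
    """Iterate x_{n+1} = x_n - eta * A @ grad_f(x_n) and return the last iterate."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - eta * A @ grad_f(x)
    return x

# Hypothetical test problem: f(x) = 0.5 * x^T Q x - b^T x, so grad f(x) = Q x - b.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad_f = lambda x: Q @ x - b

# SPD matrix defining the inner product <x, y>_A = x^T A y (illustrative choice).
A = np.array([[2.0, 0.5], [0.5, 1.0]])

x_min = np.linalg.solve(Q, b)  # exact minimizer of the quadratic
x_gd = matrix_gradient_descent(grad_f, A, x0=np.zeros(2))
print("distance to the minimizer:", np.linalg.norm(x_gd - x_min))
```

On this particular quadratic the iteration settles at the minimizer, but that depends on the step size being small relative to the eigenvalues of $AQ$; it is a sanity check on one example, not a general guarantee.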


There is 1 best solution below

Bumbble Comm On 04 Apr 2022 - 3:26

While I don't know if this 'matrix gradient descent' has a formal name, we can note that it has nice properties in the gradient flow case. If
\begin{align*} \frac{d x(t)}{d t}&=-A\nabla f(x(t)) \end{align*}
then
\begin{align*} \frac{d f(x(t))}{d t}&=\nabla f(x(t))^T\frac{d x(t)}{d t}\\ &=-\nabla f(x(t))^T A \nabla f(x(t)) \end{align*}
where the first line applies the chain rule. If $A$ is SPD, the right-hand side is $\le 0$, and $\frac{d f(x(t))}{d t}=0$ iff $\nabla f(x(t))=0$, i.e. $x(t)$ is a critical point. So $f$ strictly decreases along the 'matrix' gradient flow until a critical point is reached. Under mild extra assumptions (for example, $f$ bounded below with bounded sublevel sets), $\nabla f(x(t))\to 0$, so the flow approaches the set of critical points.
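The dissipation identity above can be checked numerically. The following is a rough sketch, assuming a hypothetical quadratic $f(x)=\tfrac12 x^T Q x$ and a forward Euler discretization of the flow with a small step `dt`; the matrices and starting point are made up for illustration. It compares the observed rate of change of $f$ with $-\nabla f^T A \nabla f$ at each step.

```python
import numpy as np

Q = np.array([[3.0, 1.0], [1.0, 2.0]])  # hypothetical SPD Hessian of f
A = np.array([[2.0, 0.5], [0.5, 1.0]])  # hypothetical SPD matrix in the flow

f = lambda x: 0.5 * x @ Q @ x
grad_f = lambda x: Q @ x

dt = 1e-3                        # Euler step size (assumed small)
x = np.array([1.0, -2.0])        # arbitrary starting point
initial_grad_norm = np.linalg.norm(grad_f(x))

values = [f(x)]
max_rate_error = 0.0
for _ in range(5000):
    g = grad_f(x)
    predicted_rate = -g @ A @ g              # right-hand side of the identity
    x_next = x - dt * A @ g                  # one forward Euler step of the flow
    observed_rate = (f(x_next) - f(x)) / dt  # finite-difference estimate of df/dt
    max_rate_error = max(max_rate_error, abs(observed_rate - predicted_rate))
    x = x_next
    values.append(f(x))

values = np.array(values)
final_grad_norm = np.linalg.norm(grad_f(x))
print("f is non-increasing:", bool(np.all(np.diff(values) <= 0)))
print("largest gap between observed and predicted rate:", max_rate_error)
print("gradient norm, start vs end:", initial_grad_norm, final_grad_norm)
```

The gap between the two rates is of order `dt`, which is what one would expect from a first-order discretization. As far as naming goes, discrete updates of the form $x_{n+1}=x_n-\eta A\nabla f(x_n)$ with SPD $A$ are usually described in the optimization literature as preconditioned gradient descent, with $A$ playing the role of the preconditioner.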
