Gradient descent generally starts with a first order Taylor approximation motivation. If we have a function $f:\mathbb{R}^p\rightarrow\mathbb{R}^p$, and we start at a point $x\in \mathbb{R}^p$, then we can look at the first order Taylor approximation \begin{align} f(x+\Delta x)\approx f(x)+\langle\nabla f(x),\Delta x \rangle_{l^2} \end{align} We want to have the update $\Delta x$ to point in the same direction as $-\nabla f(x)$ in order to minimize $\langle\nabla f(x),\Delta x \rangle_{l^2}$. However could we use a different inner product? For instance let's say we have an SPD matrix $A\in \mathbb{R}^{p\times p}$ and we use the inner product $\langle x,y\rangle_A=x^T A y$. Then we could Taylor approximate \begin{align} f(x+\Delta x)\approx f(x)+\langle \nabla f(x),\Delta x\rangle_A \end{align} We would then have gradient descent updates \begin{align} x_{n+1}=x_n-\eta A\nabla f(x) \end{align} where $\eta$ is the learning rate. Is this type of gradient descent an actual procedure? If so, what is it called? If not, what is 'wrong' with it? I'm asking this because this paper 'seems' to be doing an infinite dimensional/functional version of this procedure.
2026-04-02 21:50:45.1775166645
Is there an 'inner product wrt a matrix' version of gradient descent?
87 Views Asked by Bumbble Comm https://math.techqa.club/user/bumbble-comm/detail At
1
There are 1 best solutions below
Related Questions in MATRICES
- How to prove the following equality with matrix norm?
- I don't understand this $\left(\left[T\right]^B_C\right)^{-1}=\left[T^{-1}\right]^C_B$
- Powers of a simple matrix and Catalan numbers
- Gradient of Cost Function To Find Matrix Factorization
- Particular commutator matrix is strictly lower triangular, or at least annihilates last base vector
- Inverse of a triangular-by-block $3 \times 3$ matrix
- Form square matrix out of a non square matrix to calculate determinant
- Extending a linear action to monomials of higher degree
- Eiegenspectrum on subtracting a diagonal matrix
- For a $G$ a finite subgroup of $\mathbb{GL}_2(\mathbb{R})$ of rank $3$, show that $f^2 = \textrm{Id}$ for all $f \in G$
Related Questions in INNER-PRODUCTS
- Inner Product Same for all Inputs
- How does one define an inner product on the space $V=\mathbb{Q}_p^n$?
- Inner Product Uniqueness
- Is the natural norm on the exterior algebra submultiplicative?
- Norm_1 and dot product
- Is Hilbert space a Normed Space or a Inner Product Space? Or it have to be both at the same time?
- Orthonormal set and linear independence
- Inner product space and orthogonal complement
- Which Matrix is an Inner Product
- Proof Verification: $\left\|v-\frac{v}{\|v\|}\right\|= \min\{\|v-u\|:u\in S\}$
Related Questions in NUMERICAL-OPTIMIZATION
- Modified conjugate gradient method to minimise quadratic functional restricted to positive solutions
- Bouncing ball optimization
- Minimization of a convex quadratic form
- What is the purpose of an oracle in optimization?
- What do you call iteratively optimizing w.r.t. various groups of variables?
- ProxASAGA: compute and use the support of $\Delta f$
- Can every semidefinite program be solved in polynomial time?
- In semidefinite programming we don't have a full dimensional convex set to use ellipsoid method
- How to generate a large PSD matrix $A \in \mathbb{R}^{n \times n}$, where $\mathcal{O}(n) \sim 10^3$
- Gram matrices in the Rayleigh-Ritz algorithm
Related Questions in GRADIENT-DESCENT
- Gradient of Cost Function To Find Matrix Factorization
- Can someone explain the calculus within this gradient descent function?
- Established results on the convergence rate of iterates for Accelerated Gradient Descent?
- Sensitivity (gradient) of function solved using RK4
- Concerning the sequence of gradients in Nesterov's Accelerated Descent
- Gradient descent proof: justify $\left(\dfrac{\kappa - 1}{\kappa + 1}\right)^2 \leq \exp(-\dfrac{4t}{\kappa+1})$
- If the gradient of the logistic loss is never zero, does that mean the minimum can never be achieved?
- How does one show that the likelihood solution for logistic regression has a magnitude of infinity for separable data (Bishop exercise 4.14)?
- How to determinate that a constrained inequality system is not empty?
- How to show that the gradient descent for unconstrained optimization can be represented as the argmin of a quadratic?
Trending Questions
- Induction on the number of equations
- How to convince a math teacher of this simple and obvious fact?
- Find $E[XY|Y+Z=1 ]$
- Refuting the Anti-Cantor Cranks
- What are imaginary numbers?
- Determine the adjoint of $\tilde Q(x)$ for $\tilde Q(x)u:=(Qu)(x)$ where $Q:U→L^2(Ω,ℝ^d$ is a Hilbert-Schmidt operator and $U$ is a Hilbert space
- Why does this innovative method of subtraction from a third grader always work?
- How do we know that the number $1$ is not equal to the number $-1$?
- What are the Implications of having VΩ as a model for a theory?
- Defining a Galois Field based on primitive element versus polynomial?
- Can't find the relationship between two columns of numbers. Please Help
- Is computer science a branch of mathematics?
- Is there a bijection of $\mathbb{R}^n$ with itself such that the forward map is connected but the inverse is not?
- Identification of a quadrilateral as a trapezoid, rectangle, or square
- Generator of inertia group in function field extension
Popular # Hahtags
second-order-logic
numerical-methods
puzzle
logic
probability
number-theory
winding-number
real-analysis
integration
calculus
complex-analysis
sequences-and-series
proof-writing
set-theory
functions
homotopy-theory
elementary-number-theory
ordinary-differential-equations
circles
derivatives
game-theory
definite-integrals
elementary-set-theory
limits
multivariable-calculus
geometry
algebraic-number-theory
proof-verification
partial-derivative
algebra-precalculus
Popular Questions
- What is the integral of 1/x?
- How many squares actually ARE in this picture? Is this a trick question with no right answer?
- Is a matrix multiplied with its transpose something special?
- What is the difference between independent and mutually exclusive events?
- Visually stunning math concepts which are easy to explain
- taylor series of $\ln(1+x)$?
- How to tell if a set of vectors spans a space?
- Calculus question taking derivative to find horizontal tangent line
- How to determine if a function is one-to-one?
- Determine if vectors are linearly independent
- What does it mean to have a determinant equal to zero?
- Is this Batman equation for real?
- How to find perpendicular vector to another vector?
- How to find mean and median from histogram
- How many sides does a circle have?
While I don't know if this 'matrix gradient descent' actually has a formal name, we can note that it has nice properties in the gradient flow case. Note that if \begin{align*} \frac{\partial x(t)}{\partial t}&=-A\nabla f(x(t)) \end{align*} then \begin{align*} \frac{\partial f(x(t))}{\partial t}&=\nabla f(x(t))^T\frac{\partial x(t)}{\partial t}\\ &=-\nabla f(x(t))^T A \nabla f(x(t)) \end{align*} where the first line applies the chain rule. So if $A$ is SPD then $\frac{\partial f(x(t))}{\partial t}=0$ iff $\nabla f(x(t))=0$ i.e. $x(t)$ is a critical point. Thus 'matrix' gradient flow is guaranteed to converge to a critical point if the matrix is SPD.