I am reading about Newton and quasi-Newton optimization algorithms, and there is a mention of their invariance (or lack thereof) under transformations and how it helps them tackle ill-conditioned problems. But I can't seem to understand why this is so, or what kinds of issues with the Hessian it addresses. Any insight or intuition about how invariance is relevant, or any references, would be very helpful.
2026-03-26
Invariance under transformation of optimization methods
1k views. Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)
There is 1 best solution below.
First of all, here's a proof that Newton's method is invariant with respect to a change of basis:
Newton's method minimizes a smooth function $f:\mathbb R^n \to \mathbb R$ using the iteration $$ x^{k+1} = x^k - Hf(x^k)^{-1} \nabla f(x^k), $$ where $Hf(x^k)$ is the Hessian of $f$ at $x^k$ (the superscript $k$ indexes iterations).
Let's make a change of variables $y = Ax$, where $A$ is an invertible matrix. Minimizing $f$ with respect to $x$ is equivalent to minimizing $g(y) = f(A^{-1}y)$ with respect to $y$.
By the chain rule, $\nabla g(y) = A^{-T} \nabla f(A^{-1} y)$ and $Hg(y) = A^{-T} Hf(A^{-1}y) A^{-1}$, so $Hg(y)^{-1} = A\, Hf(A^{-1}y)^{-1} A^{T}$. The Newton iteration for minimizing $g$ is therefore \begin{align} y^{k+1} &= y^k - Hg(y^k)^{-1} \nabla g(y^k) \\ &= y^k - A Hf(A^{-1}y^k)^{-1} A^T A^{-T} \nabla f(A^{-1}y^k) \\ &= y^k - A Hf(A^{-1}y^k)^{-1} \nabla f(A^{-1}y^k). \end{align} Multiplying both sides by $A^{-1}$ and defining $x^k = A^{-1}y^k$, we find that $$ x^{k+1} = x^k - Hf(x^k)^{-1} \nabla f(x^k). $$ This is exactly the Newton iteration for minimizing $f$ with respect to $x$: the iterates in the two coordinate systems correspond under the change of variables, so the method is invariant.
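You can also verify this invariance numerically. Here is a minimal sketch in NumPy; the poorly scaled quadratic $f$ and the matrix $A$ below are arbitrary choices for illustration, not part of the argument above:

```python
import numpy as np

def f(x):                      # an ill-conditioned quadratic, chosen for illustration
    return 0.5 * (100 * x[0]**2 + x[1]**2) + x[0] * x[1]

def grad_f(x):
    return np.array([100 * x[0] + x[1], x[0] + x[1]])

def hess_f(x):                 # constant Hessian, condition number ~100
    return np.array([[100.0, 1.0], [1.0, 1.0]])

def newton_step(z, grad, hess):
    # One Newton iteration: z - H(z)^{-1} grad(z), via a linear solve
    return z - np.linalg.solve(hess(z), grad(z))

A = np.array([[2.0, 1.0], [0.0, 3.0]])   # arbitrary invertible change of basis
Ainv = np.linalg.inv(A)

def grad_g(y):                 # chain rule: A^{-T} grad f(A^{-1} y)
    return Ainv.T @ grad_f(Ainv @ y)

def hess_g(y):                 # A^{-T} Hf(A^{-1} y) A^{-1}
    return Ainv.T @ hess_f(Ainv @ y) @ Ainv

x0 = np.array([1.0, -2.0])
x1 = newton_step(x0, grad_f, hess_f)       # Newton step on f from x0
y1 = newton_step(A @ x0, grad_g, hess_g)   # Newton step on g from y0 = A x0
print(np.allclose(x1, Ainv @ y1))          # True: iterates correspond exactly
```

The check passes for any invertible $A$ (up to floating-point roundoff), which is the numerical counterpart of the algebra above.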
For optimization algorithms such as gradient descent that do not have this invariance, it is often important to choose $A$ so that the transformed function $g(y) = f(A^{-1}y)$ is better conditioned than $f$. Ideally the level curves of $g$ look like circles rather than highly elongated ellipses, which makes gradient descent converge much faster. With Newton's method we don't have to worry about this: the Hessian solve in each iteration effectively performs this rescaling automatically, which is the intuition for why invariance helps on ill-conditioned problems.
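To see the contrast concretely, here is a sketch (with hypothetical numbers) of gradient descent on an ill-conditioned quadratic $f(x) = \tfrac12 x^T H x$, first in the original variables and then after the change of variables $y = Ax$ with $A = H^{1/2}$, which makes the level sets of $g$ perfectly circular:

```python
import numpy as np

H = np.diag([100.0, 1.0])            # Hessian with condition number 100
x = np.array([1.0, 1.0])
for _ in range(100):
    x = x - (1.0 / 100.0) * (H @ x)  # plain GD on f, safe step 1/L with L = 100

# Precondition with A = H^{1/2}, so g(y) = f(A^{-1}y) = 0.5 * ||y||^2
A = np.diag([10.0, 1.0])             # A^T A = H
Ainv = np.linalg.inv(A)
y = A @ np.array([1.0, 1.0])
for _ in range(100):
    y = y - 1.0 * (Ainv.T @ H @ Ainv @ y)   # GD on g, step 1/L with L = 1

print(np.linalg.norm(x))             # still far from 0 along the flat direction
print(np.linalg.norm(Ainv @ y))      # whitened problem converges immediately
```

After 100 iterations, plain gradient descent has barely moved along the low-curvature direction (the error decays like $0.99^k$), while on the whitened problem a single full step lands on the minimizer. Newton's method applied to either formulation would behave identically, by the invariance argument above.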