Demonstration Gradient Descent

Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)

The directional derivative represents the slope of a function along a direction. I understand the proof that the directional derivative is minimized by the anti-gradient. But why is minimizing the directional derivative at a point $x$ equivalent to finding the direction in which the function decreases fastest at $x$? I can't see why the anti-gradient gives the direction of fastest decrease. What is the relationship between minimizing the directional derivative and the monotonicity of the function being minimized?

1 Answer
The answer to your question is rooted in the relation between derivatives and linear approximations by tangents. Given a differentiable real-valued function $f: \mathbb{R}^n \rightarrow \mathbb{R}$, the linear approximation of $f$ at a point $a = (a_1, a_2, \dots, a_n) \in \mathbb{R}^n$ is defined by:
$$g(x) = f(a) + \sum\limits_{i=1}^{n} f_{x_i}(a)(x_i-a_i)$$ where $x = (x_1, x_2, \dots, x_n) \in \mathbb{R}^n$.
It is easy to verify that $g(a) = f(a)$ and that $\nabla g$ is the constant vector $\nabla f(a)$; in particular, $\nabla g(a) = \nabla f(a)$.
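To make this concrete, here is a small numerical sketch in Python. The function $f(x,y) = x^2 + 3y^2$ and the point $a = (1, -2)$ are assumptions of mine for illustration; they do not come from the question.

```python
import numpy as np

# Example of my own choosing: f(x, y) = x^2 + 3y^2 at a = (1, -2).
def f(p):
    x, y = p
    return x**2 + 3 * y**2

def grad_f(p):
    x, y = p
    return np.array([2.0 * x, 6.0 * y])

a = np.array([1.0, -2.0])

def g(p):
    # Linear approximation of f at a: g(x) = f(a) + grad f(a) . (x - a)
    return f(a) + grad_f(a) @ (p - a)

print(g(a) == f(a))   # True: the approximation agrees with f at a
print(grad_f(a))      # [2. -12.], the constant gradient of g
```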
Now take any unit vector $\hat{u}$. If we want to minimize $D_{\hat{u}}f(a)$, we have to minimize $\nabla f(a) \cdot \hat{u} = \nabla g(a) \cdot \hat{u}$, so we can study the function $g$ instead of $f$.
In one and two dimensions, where the graph of $g$ is a line and a plane respectively, it is easy to picture that the direction which minimizes the directional derivative is opposite to the gradient.
In higher dimensions, it becomes more difficult to see the relationship. Let's look for a unit vector $\hat{u} = (u_1, u_2, \dots, u_n)$ such that $g(a + \hat{u})$ is minimized.
$$g(a + \hat{u}) = f(a) + \sum\limits_{i=1}^{n} f_{x_i}(a)((a_i + u_i)-a_i) = f(a) + \sum\limits_{i=1}^{n} f_{x_i}(a) u_i = f(a) + \nabla g(a) \cdot \hat{u}$$
Since $f(a)$ is a constant, we need to minimize the term $\nabla g(a) \cdot \hat{u}$. By the Cauchy-Schwarz inequality, $\nabla g(a) \cdot \hat{u} \geq -\|\nabla g(a)\| \, \|\hat{u}\| = -\|\nabla g(a)\|$, with equality exactly when $\hat{u} = - \frac{\nabla g(a)}{\| \nabla g(a)\|}$. In other words, the minimum occurs when $\hat{u}$ points in the opposite direction of the gradient.
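As a sanity check, one can sample random unit vectors and confirm numerically that $\nabla g(a) \cdot \hat{u}$ never drops below $-\|\nabla g(a)\|$. The sketch below reuses the example gradient $\nabla g(a) = (2, -12)$ from the earlier snippet; it is an illustration of mine, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)
grad = np.array([2.0, -12.0])          # nabla g(a) from the example above

# Sample random unit directions and compare grad . u with the
# Cauchy-Schwarz bound -||grad||, attained at u = -grad / ||grad||.
samples = rng.normal(size=(100_000, 2))
units = samples / np.linalg.norm(samples, axis=1, keepdims=True)

print((units @ grad).min())                    # slightly above -||grad||
print(grad @ (-grad / np.linalg.norm(grad)))   # exactly -||grad|| ≈ -12.166
```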
However, this is not true for minimizing $f(a + \hat{u})$. At first glance, one might think that to minimize $f(a + \hat{u})$, the unit vector $\hat{u}$ must point opposite to the gradient. But $\hat{u}$ has length 1, and over that finite distance $f$ can behave very differently from its linear approximation $g$: the function may have many local minima inside the unit ball around $a$, so no single direction need be optimal.
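Here is a one-dimensional sketch of this failure (the function $f(x) = \cos(\pi x) + 0.1x$ and the point $a = 0.1$ are my own example). In one dimension the only unit vectors are $\pm 1$; at this point the anti-gradient direction is $+1$, yet the full unit step in that direction ends up higher than the unit step the other way.

```python
import numpy as np

# Example of my own: f(x) = cos(pi x) + 0.1 x at a = 0.1.
f = lambda x: np.cos(np.pi * x) + 0.1 * x
a = 0.1

slope = -np.pi * np.sin(np.pi * a) + 0.1
print(slope)        # f'(a) ≈ -0.87 < 0, so steepest descent points toward +1
# Infinitesimally, u = +1 is the steepest-descent direction, but over a
# full unit step the opposite direction reaches a lower value:
print(f(a + 1.0))   # ≈ -0.84  (unit step along the anti-gradient)
print(f(a - 1.0))   # ≈ -1.04  (unit step along the gradient is lower!)
```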
This confusion about the meaning of "steepest descent" or "fastest decrease" leads to a common misreading of gradient descent. The direction of steepest descent should only be understood infinitesimally, not at a fixed distance from the point. Note also that the direction opposite the gradient may not lead to a global minimum, or even a good local minimum, after several iterations.
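To illustrate, here is a minimal gradient-descent sketch on the same example function $f(x) = \cos(\pi x) + 0.1x$ (the starting point and step size are assumptions of mine). Each iteration takes a small step along the local anti-gradient, and the iterate settles into the nearby local minimum even though lower minima exist further away.

```python
import numpy as np

# Minimal gradient descent on the example f(x) = cos(pi x) + 0.1 x.
df = lambda x: -np.pi * np.sin(np.pi * x) + 0.1   # f'(x)

x = 0.1              # assumed starting point
step = 0.05          # assumed step size
for _ in range(200):
    x -= step * df(x)          # small step along the local anti-gradient

print(x)             # ≈ 0.99: the nearby local minimum, although
                     # e.g. x ≈ -1.01 gives a strictly lower value of f
```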