For a given function $f: \mathbb{R}^n \rightarrow \mathbb{R}$, find the direction of least change


As written in the title, I have a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ and I want to find the direction of least change. However, I encountered an inconsistency between two different methods which I will outline below.

I start with the directional derivative as given by this answer: $(\nabla f)\cdot v$. This will give the rate of change along any direction $v = \sum_i\alpha_i\vec{e}_i$ with $\sum_i\alpha_i^2=1$. To obtain the coefficients of the vector corresponding to the least change, I want to minimize the square of that expression: $$ \min_{\{\alpha_i\}} \left[(\nabla f)\cdot v\right]^2 $$

This is equivalent to the following quadratic program: $$ \min_{v} \quad v^T \underbrace{ \begin{pmatrix} \partial_1\partial_1 & \partial_1\partial_2 & \cdots & \partial_1\partial_n \\ \partial_2\partial_1 & \partial_2\partial_2 & \cdots & \partial_2\partial_n \\ \vdots & \vdots & \ddots & \vdots \\ \partial_n\partial_1 & \partial_n\partial_2 & \cdots & \partial_n\partial_n \\ \end{pmatrix} }_{\equiv A} v \qquad\text{subject to}\quad \lVert v \rVert = 1 $$ where $\partial_i$ is the $i$-th partial derivative of $f$ evaluated at some reference point; in other words, $A = (\nabla f)(\nabla f)^T$.
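For concreteness, the quadratic form $v^T A v$ reproduces the squared directional derivative; a minimal NumPy sketch (the gradient values here are illustrative, not from my actual problem):

```python
import numpy as np

# Illustrative gradient; A is the outer product (grad)(grad)^T
grad = np.array([3.0, -1.0, 2.0])
A = np.outer(grad, grad)

# Random unit direction v
rng = np.random.default_rng(0)
v = rng.normal(size=3)
v /= np.linalg.norm(v)

# v^T A v equals the squared directional derivative (grad . v)^2
assert np.isclose(v @ A @ v, (grad @ v) ** 2)
```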

In order to solve this, I follow the paper William W. Hager, "Minimizing a Quadratic Over a Sphere" (full text available here); Lemmas 2.1 and 2.2 are the relevant results. In summary, the direction of least change is given by the eigenvector corresponding to the smallest eigenvalue of $A$. However, the paper does not claim that the smallest eigenvalue must be positive; in fact, Lemma 2.1 requires only that $A - \lambda_1 I$ be positive semidefinite ($\lambda_1$ being the smallest eigenvalue and $I$ the identity matrix).

The above findings agree with this answer, which links to the Wikipedia article on the structure tensor; that tensor appears to coincide with the matrix $A$ above when the window function $w(r)$ is the Dirac delta function. The article, however, states that all eigenvalues of $A$ should be positive (see here; this is also mentioned in this paper, in the paragraph following eq. (16)). The 3D interpretation with an ellipsoid also seems to require $\lambda_i\geq 0$, since the eigenvalues are associated with the semi-axes of the ellipsoid. This would imply that $A$ is already positive semidefinite, which seems to conflict with the statement in Hager's paper.

When I apply the above method, i.e. diagonalizing $A$, to a specific problem ($n=36$), I indeed find that some of the eigenvalues are negative; $A - \lambda_1 I$, on the other hand, is positive semidefinite. So I'm wondering whether this method is correct, or whether there is something I'm missing. Does it have to do with the dimensionality of the problem (in my specific case $n=36$)?

The second thing I'm wondering about is that the rate of change is zero whenever $v$ is perpendicular to $\nabla f$. For $n\geq 3$ there exist infinitely many unit vectors perpendicular to any given vector, which seems to contradict the above finding that the direction of least change is given by precisely one of the eigenvectors.
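To illustrate the point: for $n\ge 3$ one can construct a whole orthonormal basis of directions perpendicular to the gradient, each with zero rate of change. A quick sketch (the gradient values are again illustrative):

```python
import numpy as np

grad = np.array([3.0, -1.0, 2.0, 0.5])

# Orthonormal basis of the hyperplane perpendicular to grad:
# QR-factorize [grad | I] and drop the first column (which spans grad).
Q, _ = np.linalg.qr(np.column_stack([grad, np.eye(4)]))
perp = Q[:, 1:]  # n-1 = 3 orthonormal vectors, all perpendicular to grad

# Every such direction has zero directional derivative
assert np.allclose(perp.T @ grad, 0)
```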


Numerical example

The following are the 36 elements of the gradient, separated by commas, for my specific problem (it's a complex physics simulation, hence I'm just including the gradient here):

-4.629630012686902774e+01,-1.625366508051229175e+01,-3.641531520770513453e+00,3.641531520770513453e+01,1.354472090042690979e+01,2.953193245502916398e+00,2.629008122312370688e+01,1.239008895481674699e+01,5.195843755245732609e+00,-2.098321516541545861e+01,-8.260059303211164661e+00,-1.776356839400250465e+00,1.050270981295398087e+01,1.243449787580175325e+00,-6.439293542825907934e-01,4.269917752708352054e+01,1.769695501252499525e+01,5.773159728050814010e+00,-2.420286193682841258e+00,1.931788062847772380e+00,1.509903313490212895e+00,-1.521005543736464460e+01,-7.527312106958561344e+00,-2.908784324517910136e+00,3.606004383982508443e+01,1.207922650792170316e+01,2.686739719592878828e+00,2.680078381445127889e+01,1.294520046712932526e+01,4.773959005888173124e+00,-2.051692149507289287e+01,-8.038014698286133353e+00,-1.731947918415244203e+00,9.325873406851314940e+00,1.620925615952728549e+00,-5.995204332975845318e-01

I use SymPy, specifically Matrix.diagonalize, in order to obtain the eigenvalues and eigenvectors:

import numpy as np
from sympy import Matrix

# Gradient of f at the reference point (36 components, listed above)
gradient = np.array([-4.629630012686902774e+01,-1.625366508051229175e+01,-3.641531520770513453e+00,3.641531520770513453e+01,1.354472090042690979e+01,2.953193245502916398e+00,2.629008122312370688e+01,1.239008895481674699e+01,5.195843755245732609e+00,-2.098321516541545861e+01,-8.260059303211164661e+00,-1.776356839400250465e+00,1.050270981295398087e+01,1.243449787580175325e+00,-6.439293542825907934e-01,4.269917752708352054e+01,1.769695501252499525e+01,5.773159728050814010e+00,-2.420286193682841258e+00,1.931788062847772380e+00,1.509903313490212895e+00,-1.521005543736464460e+01,-7.527312106958561344e+00,-2.908784324517910136e+00,3.606004383982508443e+01,1.207922650792170316e+01,2.686739719592878828e+00,2.680078381445127889e+01,1.294520046712932526e+01,4.773959005888173124e+00,-2.051692149507289287e+01,-8.038014698286133353e+00,-1.731947918415244203e+00,9.325873406851314940e+00,1.620925615952728549e+00,-5.995204332975845318e-01])

# A is the outer product of the gradient with itself
A = Matrix(np.multiply.outer(gradient, gradient))

# Eigenvalues (diagonal of D) and eigenvectors (columns of P)
P, D = A.diagonalize(reals_only=True, sort=True, normalize=True)

print('----- Eigenvalues -----')
print(np.diag(D), end='\n\n')

which gives the following output:

----- Eigenvalues -----
[-2.12609948489760e-13 -1.52827730730385e-13 -1.00673222408694e-13
 -7.83376902220698e-14 -5.00864285330806e-14 -4.16036346697355e-14
 -3.07234097627031e-14 -2.42707863964688e-14 -2.00785824570343e-14
 -1.15863543324252e-14 -1.00286213791179e-14 -4.01745072568617e-15
 -1.65034450852433e-15 -8.84471783248045e-16 -7.34601713232325e-16
 -3.49931683094853e-16 -1.87968744502435e-16 -3.39521826408545e-17
 2.39349396454803e-78 9.63838283004546e-17 4.83310171616047e-16
 5.70723316062587e-16 1.21622134384129e-15 1.67127258435284e-15
 2.52716079698100e-15 3.56914743743705e-15 6.33591850166460e-15
 9.44400674233662e-15 2.04608434272058e-14 2.93396570040894e-14
 4.82240672003627e-14 5.21039117277875e-14 7.82644482744230e-14
 1.11237991654078e-13 2.17480897644168e-13 10853.3563960944]

The first 35 eigenvalues have very small magnitudes (compared to the 36th one); some of them are negative and others positive.
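As a cross-check, NumPy's symmetric eigensolver `numpy.linalg.eigh` (designed for real symmetric matrices, unlike general diagonalization) shows the same pattern: one dominant eigenvalue and the rest zero up to round-off. A sketch with a short illustrative gradient rather than the 36-element one above:

```python
import numpy as np

grad = np.array([3.0, -1.0, 2.0, 0.5, -4.0])
A = np.outer(grad, grad)

# eigh returns eigenvalues of a symmetric matrix in ascending order
eigvals, eigvecs = np.linalg.eigh(A)

# All but the largest eigenvalue are zero up to round-off;
# the largest equals ||grad||^2
assert np.allclose(eigvals[:-1], 0.0, atol=1e-12)
assert np.isclose(eigvals[-1], grad @ grad)
```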

Answer

Your matrix $A$ is the outer product of the gradient vector with itself, and therefore a rank-1 positive-semidefinite matrix. As shown in this answer - https://math.stackexchange.com/a/55166/443030 - $A$ has the eigenvalue $0$ with multiplicity $n-1$ and a single nonzero eigenvalue equal to $\lVert \nabla f(x) \rVert^2$. The negative values you observe are in fact numerical inaccuracies and can be taken to be zero, given the magnitude of the largest eigenvalue.
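This eigenstructure is easy to verify directly: since $A = (\nabla f)(\nabla f)^T$, we have $A\,\nabla f = \lVert\nabla f\rVert^2\,\nabla f$, and $Av = 0$ for any $v \perp \nabla f$. A quick check with an illustrative gradient:

```python
import numpy as np

grad = np.array([3.0, -1.0, 2.0])
A = np.outer(grad, grad)

# grad itself is an eigenvector with eigenvalue ||grad||^2
assert np.allclose(A @ grad, (grad @ grad) * grad)

# any vector orthogonal to grad lies in the nullspace of A
v = np.array([1.0, 1.0, -1.0])  # grad . v = 3 - 1 - 2 = 0
assert np.allclose(A @ v, 0.0)
```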

As stated in the paper, a minimizer $v$ is an eigenvector corresponding to the smallest eigenvalue of $A$, which in this case is $0$. Note also that the eigenvector corresponding to the only nonzero eigenvalue is $\nabla f(x)$, which is the direction of maximum change (indeed, any scalar multiple of the gradient points along the direction of most change). This is consistent with the expectation that the least change occurs along every direction orthogonal to the gradient: that $(n-1)$-dimensional subspace is exactly the nullspace of $A$, spanned by all the eigenvectors with eigenvalue $0$. In particular, the minimizer is not unique, which also resolves your second question.