Let $\Sigma \in M_q$ be a symmetric, positive definite matrix. If vectors $a_1,\dots, a_q \in \mathbb R^q$ satisfy:
$(i)$ $$i \not = j \implies a_i^T a_j = 0$$ $(ii)$ $$\frac{a_1^T a_1}{a_1^T \Sigma^{-1} a_1} =\sup_{a \not = 0} \frac{a^T a}{a^T \Sigma^{-1} a} $$ $(iii)$ $$i \gt 1 \implies \frac{a_i^T a_i}{a_i^T \Sigma^{-1} a_i} = \sup\left\{ \frac{a^T a}{a^T \Sigma^{-1} a} : a \not = 0, a^T a_j = 0, j=1,\dots, i-1 \right\}$$ then $a_1,\dots, a_q$ are eigenvectors of $\Sigma$.
I am stuck trying to prove this claim. The text gives the hint to use induction, but I am not sure how to proceed. The claim is part of the proof that $a_1^T Y, \dots, a_q^T Y$ are the principal components of $Y$. Given this result, the rest of the proof is simple.
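As a quick numerical sanity check of the claim (not a proof), one can verify with numpy that the eigenvectors of a random positive definite $\Sigma$, taken in order of decreasing eigenvalue, satisfy $(i)$ and $(ii)$, and that the quotient at $a_i$ equals the $i$-th largest eigenvalue; the helper names below (`F`, `rng`, `V`) are just for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
q = 5
B = rng.standard_normal((q, q))
Sigma = B @ B.T + q * np.eye(q)            # a random symmetric positive definite matrix
Sigma_inv = np.linalg.inv(Sigma)

def F(a):
    """The quotient a^T a / (a^T Sigma^{-1} a)."""
    return (a @ a) / (a @ Sigma_inv @ a)

eigvals, V = np.linalg.eigh(Sigma)         # ascending eigenvalues, orthonormal columns
order = np.argsort(eigvals)[::-1]
phis, V = eigvals[order], V[:, order]      # phi_1 >= ... >= phi_q, candidate a_i = V[:, i]

# (i): the candidates are mutually orthogonal
print(np.allclose(V.T @ V, np.eye(q)))

# The quotient at a_i equals the i-th largest eigenvalue of Sigma
print(np.allclose([F(V[:, i]) for i in range(q)], phis))

# (ii): no random direction beats a_1 (up to numerical tolerance)
samples = rng.standard_normal((10000, q))
print(max(F(a) for a in samples) <= F(V[:, 0]) + 1e-9)
```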
Edit:
Attempt: Let $A, \Lambda$ be the square matrices such that $$\Sigma A = A\Lambda, \quad A^T \Sigma^{-1} A = I$$ where $\Lambda = \text{diag}(\phi_1,\dots, \phi_q)$ has the eigenvalues of $\Sigma$ in descending order. Let: $$F(a) = \frac{a^T a}{a^T \Sigma^{-1} a}.$$ Then $$R(c) := F(Ac) = \frac{c^T A^T A c}{c^T A^T \Sigma^{-1} A c} = \frac{c^T A^T\Sigma^{-1} \Sigma A c}{c^T c} = \frac{c^T A^T\Sigma^{-1} A \Lambda c}{c^T c} = \frac{c^T \Lambda c}{c^T c}.$$
It is clear that $a$ attains the supremum of $F$ if and only if $A^{-1}a$ attains the supremum of $R$. Let $c_i = A^{-1} a_i$, $i = 1,\dots, q$. Then: $$a_i^T a_j = c_i^T A^T A c_j = c_i^T A^T \Sigma^{-1}\Sigma A c_j =c_i^T A^T \Sigma^{-1} A \Lambda c_j = c_i^T \Lambda c_j.$$
Therefore $c_1,\dots, c_q$ satisfy:
$(i')$ $$i \not = j \implies c_i^T \Lambda c_j = 0$$ $(ii')$ $$\frac{c_1^T \Lambda c_1}{c_1^T c_1} =\sup_{c \not = 0} \frac{c^T \Lambda c}{c^T c} $$ $(iii')$ $$i \gt 1 \implies \frac{c_i^T \Lambda c_i}{c_i^T c_i} = \sup\left\{ \frac{c^T \Lambda c}{c^T c} : c \not = 0,\ c^T \Lambda c_j = 0,\ j=1,\dots, i-1 \right\}$$
which seem somewhat simpler since $\Lambda$ is diagonal, but I am still unsure how to proceed.
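For what it's worth, the change of variables can also be checked numerically. Below is a minimal sketch (assuming numpy; the explicit choice $A = V\,\text{diag}(\sqrt{\phi_1},\dots,\sqrt{\phi_q})$, with $V$ an orthonormal eigenvector matrix of $\Sigma$, is one matrix satisfying the two defining relations, and all names are illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)
q = 5
B = rng.standard_normal((q, q))
Sigma = B @ B.T + q * np.eye(q)            # random symmetric positive definite matrix
Sigma_inv = np.linalg.inv(Sigma)

eigvals, V = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
phis, V = eigvals[order], V[:, order]      # descending eigenvalues and matching eigenvectors

Lam = np.diag(phis)
A = V @ np.diag(np.sqrt(phis))             # one concrete A with Sigma A = A Lam and A^T Sigma^{-1} A = I

print(np.allclose(Sigma @ A, A @ Lam))
print(np.allclose(A.T @ Sigma_inv @ A, np.eye(q)))

# R(c) = F(Ac) = c^T Lambda c / c^T c
F = lambda a: (a @ a) / (a @ Sigma_inv @ a)
c = rng.standard_normal(q)
print(np.isclose(F(A @ c), (c @ Lam @ c) / (c @ c)))
```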
Attempt 2. Thanks to Mason's answer I feel I am closer to the solution: $$\frac{\partial F}{\partial a}(a) = \frac{2a \cdot (a^T \Sigma^{-1} a)-2\Sigma^{-1} a (a^T a)}{(a^T \Sigma^{-1}a)^2} = 0 \implies \Sigma^{-1}a = \frac{a^T\Sigma^{-1}a}{a^Ta} a,$$ which shows that every critical point of $F$ is an eigenvector of $\Sigma^{-1}$ (hence of $\Sigma$), the corresponding eigenvalue of $\Sigma$ being the critical value $F(a)$; in particular $a_1$, which maximizes $F$ on the open set $\mathbb R^q \setminus \{0\}$, is such an eigenvector.
Now that this is done, my idea is to look at the restriction $$F:\{a_1\}^\perp \setminus \{0\} \to \mathbb R,$$ and the same procedure shows that its critical points must be eigenvectors of $\Sigma^{-1}$. However, I am not sure I can conclude that the minimum and maximum of $F$ on the set $\{a_1\}^\perp \setminus \{0\}$ are critical points of $F$, since $\{a_1\}^\perp$ is not open.
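For whatever reassurance it is worth, a finite-difference check (a sketch assuming numpy; `grad_F` and `num_grad` are hypothetical helpers) agrees with the gradient formula above and confirms that the gradient of $F$ vanishes at every eigenvector of $\Sigma$, consistent with critical points being eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(2)
q = 5
B = rng.standard_normal((q, q))
Sigma = B @ B.T + q * np.eye(q)            # random symmetric positive definite matrix
Sigma_inv = np.linalg.inv(Sigma)

def F(a):
    return (a @ a) / (a @ Sigma_inv @ a)

def grad_F(a):
    # The gradient formula from Attempt 2
    num = 2 * a * (a @ Sigma_inv @ a) - 2 * (Sigma_inv @ a) * (a @ a)
    return num / (a @ Sigma_inv @ a) ** 2

def num_grad(f, a, h=1e-6):
    # Central finite differences, coordinate by coordinate
    g = np.zeros_like(a)
    for i in range(len(a)):
        e = np.zeros_like(a)
        e[i] = h
        g[i] = (f(a + e) - f(a - e)) / (2 * h)
    return g

a = rng.standard_normal(q)
print(np.allclose(grad_F(a), num_grad(F, a), atol=1e-5))   # formula matches finite differences

_, V = np.linalg.eigh(Sigma)                               # eigenvectors of Sigma
print(all(np.allclose(grad_F(V[:, i]), 0, atol=1e-8) for i in range(q)))
```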
Because I don't like $\Sigma$ as a variable, nor inverses if I can avoid them, I'll write $A=\Sigma^{-1}$; this is (also) a positive definite symmetric matrix. Of course the (strictly positive) eigenvalues of $A$ are the inverses of those of $\Sigma$.
This becomes easier if you eliminate coordinates, or rather, adapt them to the problem. So we view $\Bbb R^q$ as a Euclidean vector space with the standard inner product that I shall denote $(\cdot\mid\cdot )$, and $A$ is the matrix of a positive definite symmetric bilinear form (indeed, another inner product) $(x,y)\mapsto (x\mid Ay)=(Ax\mid y)$. So the expression that the supremum is taken over (among all nonzero vectors in some subspace) can be written as $$\frac{(a\mid a)}{(a\mid Aa)}.$$
The spectral theorem says that there is an orthogonal change of basis (namely to an orthonormal eigenbasis of $A$) that changes $A$ into a diagonal matrix. Orthogonality of the change of basis means the inner product is still the standard inner product after the base change (formally: the change of basis matrix $\Omega$ satisfies $(\Omega x\mid \Omega y)=(x\mid y)$ for all vectors $x,y$). So we have effectively reduced the problem to one where $A$ is a diagonal matrix (with strictly positive diagonal entries). While we are at the game of base change, we might as well throw in a permutation of the basis (always an orthogonal change) that makes the diagonal weakly increasing.
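Concretely, this reduction is nothing more than an eigendecomposition; here is a small numpy sketch (with illustrative names, and with $A$ generated directly as a random positive definite matrix standing in for $\Sigma^{-1}$).

```python
import numpy as np

rng = np.random.default_rng(3)
q = 5
B = rng.standard_normal((q, q))
A = B @ B.T + q * np.eye(q)                # positive definite symmetric, standing in for Sigma^{-1}

d, Omega = np.linalg.eigh(A)               # eigh returns the eigenvalues already in ascending order
D = np.diag(d)

print(np.allclose(Omega.T @ A @ Omega, D)) # A is diagonal in the new basis
print(np.all(np.diff(d) >= 0))             # with a weakly increasing diagonal

x, y = rng.standard_normal(q), rng.standard_normal(q)
print(np.isclose((Omega @ x) @ (Omega @ y), x @ y))   # the inner product is unchanged
```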
If $q=0$ there is nothing to prove, so I will now assume $q>0$; this may seem frivolous, but it is the terminating condition for the induction on the dimension that will implicitly be set up below.
Now it is not hard to see that if $D$ is a diagonal matrix with weakly increasing strictly positive diagonal entries, the value of $\frac{(a\mid a)}{(a\mid Da)}$ is maximized among nonzero vectors for those vectors whose nonzero entries (at least one) occur only in the initial position(s) where $D$ has its lowest eigenvalue, and the value attained is the inverse of that eigenvalue. In particular, any vector $v$ where the maximum is attained is an eigenvector of $D$, for its minimal eigenvalue (and applying the change of basis matrix $\Omega$ to it, we get an eigenvector of $A$ for its minimal eigenvalue, and of $\Sigma$ for its maximal eigenvalue, which in fact coincides with that maximal value). Now the problem restricts the other vectors to be found to be orthogonal to $v$, so we restrict to the subspace $v^\perp$, with restrictions of our standard inner product and of the (other) positive definite symmetric bilinear form, and upon reflection the remainder is the exact same problem but in lower dimension, so induction on the dimension will do the trick.
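Spelled out (writing $d_1\le\dots\le d_q$ for the diagonal entries of $D$): for any $a\ne0$, $$(a\mid Da)=\sum_i d_i a_i^2 \;\ge\; d_1\sum_i a_i^2 = d_1(a\mid a), \qquad\text{hence}\qquad \frac{(a\mid a)}{(a\mid Da)}\le\frac1{d_1},$$ with equality exactly when $a_i=0$ for every index $i$ with $d_i>d_1$.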
It is admittedly not very beautiful to start with a matrix problem and then upon doing induction decide that it was better to do the equivalent abstractly formulated problem instead. Those who prefer to stick with matrices throughout can apply, after the choice of the first maximizing vector, yet another orthogonal change of basis, this time only involving the basis of the eigenspace for the smallest eigenvalue of $D$, which makes the normalisation $\frac1{\sqrt{(v\mid v)}}v$ of $v$ into the first basis vector; then the restriction operation basically means chopping off the first row and column from $D$ and continuing. There is clearly no obstruction, and one clearly gets a sequence of (mutually orthogonal) eigenvectors for $D$ in all cases, ordered by weak increase of their associated eigenvalue.
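For those who would like to see this run, here is a minimal numpy sketch of the procedure (all names are illustrative; the per-step maximization over the orthogonal complement is carried out via the eigendecomposition of the restricted form, which is exactly what the diagonal-case argument above identifies it as): repeatedly maximize $(a\mid a)/(a\mid Aa)$ over directions orthogonal to the vectors already found, and check that each maximizer is an eigenvector of $A$, with weakly increasing eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(4)
q = 5
B = rng.standard_normal((q, q))
A = B @ B.T + q * np.eye(q)                      # positive definite symmetric, as in the answer

found = []                                       # the successive maximizers
for step in range(q):
    if found:
        M = np.column_stack(found)
        Qfull, _ = np.linalg.qr(M, mode="complete")
        W = Qfull[:, len(found):]                # orthonormal basis of the orthogonal complement
    else:
        W = np.eye(q)
    # With a = Wz the quotient (a|a)/(a|Aa) becomes z^T z / (z^T (W^T A W) z),
    # maximized by an eigenvector of W^T A W for its smallest eigenvalue.
    _, vecs = np.linalg.eigh(W.T @ A @ W)
    found.append(W @ vecs[:, 0])

eigenvalues = []
for a in found:
    lam = (a @ A @ a) / (a @ a)
    print(np.allclose(A @ a, lam * a))           # each maximizer is an eigenvector of A
    eigenvalues.append(lam)
print(np.all(np.diff(eigenvalues) >= -1e-10))    # eigenvalues come out weakly increasing
```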