How to solve a matrix equation for a scalar?


Given matrices $Q, P \succeq 0$, a vector $q$, and a real number $\gamma$, how can one solve the equation

$ q^T (Q+\lambda P)^{-T}P(Q+\lambda P)^{-1} q = \gamma$

for the scalar $\lambda$ in an efficient way?

As far as I know, we need an iterative algorithm for that. A quick look at the literature suggests that the so-called safeguarded Newton method can be used. I did not understand how exactly it works, but it seems that we need to compute a Cholesky factor of $(Q+\lambda P)$ in every iteration. This is quite expensive in terms of computation if the matrices are large.


EDIT: One can also consider the simpler case where $P = I$ is the identity and $Q$ is a diagonal matrix with $i$th diagonal entry $Q_{ii}$. Then the problem boils down to solving the following equation for $\lambda$:

$\frac{q_1^2}{(\lambda+Q_{11})^2} + \frac{q_2^2}{(\lambda+Q_{22})^2} + \dots + \frac{q_n^2} {(\lambda+Q_{nn})^2} = \gamma$
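For this diagonal case, the safeguarded Newton method mentioned above reduces to a scalar root find on a function that is strictly decreasing for $\lambda > -\min_i Q_{ii}$. A minimal sketch (the data `d`, `q`, `gamma` below are made up for illustration):

```python
import numpy as np

# made-up data for the diagonal case P = I, Q = diag(d)
d = np.array([1.0, 2.0, 5.0])
q = np.array([1.0, -2.0, 0.5])
gamma = 0.1

def f(lam):
    # left-hand side: sum of q_i^2 / (lam + Q_ii)^2
    return np.sum(q**2 / (lam + d)**2)

def fprime(lam):
    return -2.0 * np.sum(q**2 / (lam + d)**3)

# bracket: f decreases from +infinity at lam = -min(d) to 0 at +infinity
lo, hi = -d.min() + 1e-8, 1.0
while f(hi) > gamma:
    hi *= 2.0

lam = 0.5 * (lo + hi)
for _ in range(100):
    flam = f(lam)
    if flam > gamma:          # solution lies to the right of lam
        lo = lam
    else:
        hi = lam
    step = lam - (flam - gamma) / fprime(lam)              # plain Newton step
    lam_new = step if lo < step < hi else 0.5 * (lo + hi)  # bisection safeguard
    if abs(lam_new - lam) < 1e-13 * max(1.0, abs(lam)):
        lam = lam_new
        break
    lam = lam_new
```

The safeguard only rejects Newton steps that leave the current bracket, so the quadratic convergence of Newton's method is retained near the solution.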


There are 2 best solutions below


Algebraic fundamentals

If the matrix $A$ has characteristic polynomial $c(x)=c_0+c_1x+\dots+c_nx^n$, then by Cayley–Hamilton, $c(A)=c_0I+c_1A+\dots+c_nA^n=0$.

Divide $λ^nc(x)$ by $(1+λx)$ as polynomials in $x$: $λ^nc(x)=(1+λx)q(λ,x)+r(λ)$, where $q$ is a polynomial in $x$ and $λ$ and $r$ is a polynomial in $λ$ alone. The coefficients of $q$ in $x$ can be computed via the Horner–Ruffini scheme, with arithmetic in the polynomial ring in $λ$. Again inserting $A$ for $x$ gives $0=(I+λA)q(λ,A)+r(λ)I$ and thus $$ (I+λA)^{-1}=-\frac{q(λ,A)}{r(λ)}. $$ Now, assuming that $Q$ is invertible, set $A=Q^{-1}P$ to turn the equation into a polynomial equation in $λ$ with fixed and reasonably easily computable coefficients.
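The inverse formula can be sanity-checked numerically. The sketch below uses a random $A$ (scaled so that $I+λA$ is well conditioned) and a fixed numeric $λ$; the synthetic division is the Horner–Ruffini recursion specialized to the divisor $(1+λx)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)) / n    # scaled so I + lam*A is well conditioned
lam = 0.3

# characteristic polynomial c(x): np.poly returns [1, c_{n-1}, ..., c_0]
c = np.poly(A)
p = (lam**n) * c[::-1]                 # p[k] = lam^n * (coefficient of x^k)

# synthetic division: lam^n c(x) = (1 + lam*x) q(x) + r, a Horner-type recursion
qcoef = np.zeros(n)
qcoef[n - 1] = p[n] / lam
for k in range(n - 1, 0, -1):
    qcoef[k - 1] = (p[k] - qcoef[k]) / lam
r = p[0] - qcoef[0]

# evaluate q(A) by Horner's scheme in the matrix argument
QA = np.zeros((n, n))
for coef in qcoef[::-1]:
    QA = QA @ A + coef * np.eye(n)

# the claimed identity: (I + lam*A)^{-1} = -q(lam, A) / r(lam)
inv_direct = np.linalg.inv(np.eye(n) + lam * A)
```

Here $λ$ is a fixed number, so the division works with scalars; in the algorithm below the same recursion is carried out with polynomial coefficients in $λ$.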


For a related problem and inspiration, look up the Leverrier-Faddeev method and especially its derivation/proof.


Algorithmic idea

Thus read your original expression as $$ q^T(I+λPQ^{-1})^{−T}Q^{-T}PQ^{-1}(I+λPQ^{-1})^{−1}q=γ, $$ resulting in $A=PQ^{-1}$, or, using a Cholesky decomposition $Q=CC^T$, the more symmetric expression $$ q^TC^{-T}(I+λC^{-1}PC^{-T})^{−T}\,C^{-1}PC^{-T}\,(I+λC^{-1}PC^{-T})^{−1}C^{-1}q=γ $$ with $A=C^{-1}PC^{-T}$.

In both cases, the expression to evaluate now has the form $$ v^T(I+λA)^{-T}B(I+λA)^{-1}v=γ, $$ with $v=q$, $B=Q^{-T}PQ^{-1}$ in the first case and $v=C^{-1}q$, $B=A$ in the second, symmetric case.

Now compute $\chi_A(x)=\det(xI-A)=x^n+c_{n-1}x^{n-1}+\dots+c_1x+c_0$ using your favorite algorithm: via Hessenberg form, Leverrier–Faddeev, Berkowitz, ...

Perform the polynomial division $$ λ^n\chi_A(x):(1+λx) $$ with the polynomial ring in $λ$ as ground ring (for instance using multipoint evaluation and interpolation) to obtain $r(λ)$ and $q(λ,x)$, the latter as a coefficient sequence for the powers of $x$ whose entries are polynomials in $λ$.

Using the vector sequence $v,Av,\dots,A^{n-1}v$, evaluate $q(λ,A)v$, which is now a vector-valued polynomial in $λ$.

And finally establish the polynomial equation $$ (q(λ,A)v)^T·B·(q(λ,A)v)=r(λ)^2·γ $$ which amounts to a convolution of vector sequences.
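The steps above can be sketched end to end for the symmetric case $B=A$. The sketch represents every polynomial in $λ$ as a coefficient array (lowest power first), performs the division by $(1+λx)$ with those arrays as coefficients, and chooses $γ$ so that a known $λ_0$ solves the equation, which makes $λ_0$ a root of the final polynomial:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
A = (M + M.T) / (2 * n)        # symmetric, scaled; take B = A (symmetric case)
B = A
v = rng.standard_normal(n)

c = np.poly(A)[::-1]           # c[k] = coefficient of x^k in chi_A, c[n] = 1

def shift_down(a):
    # divide a polynomial in lam by lam; its constant term is structurally zero
    return np.append(a[1:], 0.0)

# p_k(lam) = lam^n * c[k]; divide lam^n chi_A(x) by (1 + lam*x) in x,
# with coefficients in the polynomial ring in lam
Pc = np.zeros((n + 1, n + 1))
for k in range(n + 1):
    Pc[k, n] = c[k]
Qc = np.zeros((n, n + 1))      # Qc[k] = lam-coefficients of q_k(lam)
Qc[n - 1] = shift_down(Pc[n])
for k in range(n - 1, 0, -1):
    Qc[k - 1] = shift_down(Pc[k] - Qc[k])
rpoly = Pc[0] - Qc[0]          # r(lam)

# w(lam) = q(lam, A) v as a vector-valued polynomial, using v, Av, ..., A^{n-1}v
Akv = v.copy()
W = np.zeros((n + 1, n))
for k in range(n):
    W += np.outer(Qc[k], Akv)
    Akv = A @ Akv

# (q(lam,A)v)^T B (q(lam,A)v): convolutions of the lam-coefficient sequences
WB = W @ B.T
lhs = sum(np.convolve(W[:, m], WB[:, m]) for m in range(n))

# choose gamma so that lam0 is a solution, then check it is a root
lam0 = 0.5
Rv = np.linalg.solve(np.eye(n) + lam0 * A, v)
gamma = Rv @ (B @ Rv)
poly = lhs - gamma * np.convolve(rpoly, rpoly)
val = np.polyval(poly[::-1], lam0)
```

The real roots of `poly` in the region where $Q+λP$ stays positive definite are the candidate solutions $λ$.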


Performance

One may now compare this procedure to the alternative of computing the eigenvector decomposition $A=UΣU^T$ in the second symmetric case resulting in the complete diagonalization $$ (U^{-1}v)^T(I+λΣ)^{-1}Σ(I+λΣ)^{-1}(U^{-1}v)=\sum_{k=1}^n\tilde{v}_k^2·\frac{σ_k}{(1+λσ_k)^2}=γ $$ as proposed by Michael Grant. Essentially, this compares the most modern eigenvector decomposition, something like the implicit multi-shifted QR method, with the algorithm to determine the characteristic polynomial. The winner might depend on the required precision.


More radical interpolation

With the knowledge that the original expression is rational in $λ$ and that its denominator is the square of a determinant, one can avoid almost all matrix decompositions except those needed to solve $(Q+λP)w=q$ (or compute the matrix inverse) for scalar values of $λ$.

Compute, for a sequence $λ_1,λ_2,…,λ_{2n}$ of $2n$ (or more) distinct sample points (real or complex), the values $$ r_k=\det(Q+λ_k P),\quad s_k=r_k^2·q^T (Q+λ_k P)^{-T}P(Q+λ_k P)^{-1} q. $$ Using one LU decomposition (or QR, or ...) $Q+λ_k P=LU$ per sample point, both the determinant and the solution of the linear system $(Q+λ_kP)w=q$ can be obtained from the factors $L$ and $U$. (Mantra: the $L$ in $A=LU$ and $A=LDL^T$ is the same for symmetric $A$.)

Using interpolation (if the $λ_k=ω^{k-1}$ with $ω=e^{i\,2\pi/N}$, $k=1,\dots,N$, are the $N$th roots of unity on the complex unit circle, this can be done via the DFT or FFT) one obtains polynomials $R(λ)$ and $S(λ)$, in terms of which the original equation reads $$ S(λ)=γ·R(λ)^2 $$
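A sketch of this sampling-and-interpolation scheme, with real sample points and `np.polyfit` instead of the FFT variant. Since $s(λ)$ has degree at most $2n-2$ and $r(λ)$ degree $n$, $2n+1$ points suffice; in practice one LU factorization per sample point yields both the determinant (product of pivots) and the solve, while the sketch calls `det` and `solve` separately:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M1 = rng.standard_normal((n, n))
Q = M1 @ M1.T + n * np.eye(n)      # symmetric positive definite
M2 = rng.standard_normal((n, n))
P = M2 @ M2.T                      # symmetric positive semidefinite
q = rng.standard_normal(n)

# sample 2n+1 distinct real points; r has degree n, s degree at most 2n-2
lams = np.linspace(0.1, 2.0, 2 * n + 1)
r = np.empty_like(lams)
s = np.empty_like(lams)
for k, lam in enumerate(lams):
    K = Q + lam * P
    r[k] = np.linalg.det(K)        # in practice: reuse one LU for det and solve
    w = np.linalg.solve(K, q)
    s[k] = r[k]**2 * (w @ (P @ w))

R = np.polyfit(lams, r, n)         # polynomial coefficients, highest power first
S = np.polyfit(lams, s, 2 * n - 2)

# verify at a fresh point: S(lam)/R(lam)^2 reproduces the original expression
lam = 0.77
w = np.linalg.solve(Q + lam * P, q)
direct = w @ (P @ w)
ratio = np.polyval(S, lam) / np.polyval(R, lam)**2
```

Solving $S(λ)=γR(λ)^2$ is then again a scalar polynomial root-finding problem.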


I am not convinced you can escape the curse of $O(n^3)$ here; you are likely stuck with it. However, you can collect that work in a precalculation and get the Newton iterations down to $O(n)$.

I am assuming that $Q+\lambda P\succ 0$ for at least one $\lambda$, a reasonable assumption given that you're assuming $Q+\lambda P$ is at least sometimes invertible. Then $P$ and $Q$ can be simultaneously diagonalized; that is, there exists an invertible matrix $U$ such that $$U^TQU = \Sigma_1, \quad U^TPU = \Sigma_2.$$ Determining the value of $U$ will cost the equivalent of a few Choleskys. Since $Q+\lambda P = U^{-T}(\Sigma_1+\lambda \Sigma_2)U^{-1}$, the equation becomes $$ q^T U(\Sigma_1+\lambda \Sigma_2)^{-1}U^T\,U^{-T}\Sigma_2U^{-1}\,U(\Sigma_1+\lambda \Sigma_2)^{-1}U^T q = \gamma.$$ This simplifies to $$ q^T U (\Sigma_1+\lambda \Sigma_2)^{-1}\Sigma_2(\Sigma_1+\lambda \Sigma_2)^{-1} U^{T} q = \gamma$$ and then to $$\sum_{i=1}^n \bar{q}_i^2\, \sigma_{2i} / ( \sigma_{1i} + \sigma_{2i} \lambda )^2=\gamma,$$ where $\bar{q}=U^{T}q$. This doesn't get you away from some initial $O(n^3)$ calculations. But once you have done that, you can get the key iteration down to $O(n)$.
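A sketch of the precomputation and the resulting $O(n)$ iteration. Here $P$ is taken strictly positive definite (an assumption beyond the original $P\succeq 0$) so that every $\sigma_{2i}>0$ and plain Newton from $\lambda=0$ converges; the simultaneous diagonalization is done via a Cholesky factor of $Q$ and one symmetric eigendecomposition, which normalizes $\Sigma_1=I$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
M1 = rng.standard_normal((n, n))
Q = M1 @ M1.T + np.eye(n)          # symmetric positive definite
M2 = rng.standard_normal((n, n))
P = M2 @ M2.T + np.eye(n)          # taken SPD here so that all sigma_{2i} > 0
q = rng.standard_normal(n)

# O(n^3) precomputation: with L = chol(Q) and L^{-1} P L^{-T} = V D V^T,
# the matrix U = L^{-T} V satisfies U^T Q U = I and U^T P U = D
L = np.linalg.cholesky(Q)
S = np.linalg.solve(L, np.linalg.solve(L, P).T).T   # L^{-1} P L^{-T}
d_eig, V = np.linalg.eigh(S)
U = np.linalg.solve(L.T, V)
qbar = U.T @ q

def f(lam):                         # O(n) per evaluation
    return np.sum(qbar**2 * d_eig / (1.0 + lam * d_eig)**2)

def fp(lam):
    return -2.0 * np.sum(qbar**2 * d_eig**2 / (1.0 + lam * d_eig)**3)

# sanity check against the direct O(n^3) evaluation
lam_test = 0.3
w = np.linalg.solve(Q + lam_test * P, q)
assert np.isclose(f(lam_test), w @ (P @ w))

# f is convex and decreasing for lam >= 0, so plain Newton from 0 converges
gamma = 0.5 * f(0.0)                # pick a gamma for which a root exists
lam = 0.0
for _ in range(50):
    lam -= (f(lam) - gamma) / fp(lam)
```

With $d_i>0$ each term of $f$ is convex and decreasing on $\lambda\ge 0$, so the Newton iterates increase monotonically to the root; for merely semidefinite $P$ a safeguard (bracketing) would be needed.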

I will be honest: I am not sure this is necessarily better than using a Cholesky in each iteration. The cost of the precalculation step is $O(n^3)$, but its $n^3$ coefficient is several times larger than the $1/3$ for a Cholesky.