Potential error in paper asserting commutativity of symmetric matrix with Moore-Penrose projection operator

127 Views Asked by At

EDIT: The question now includes typeset equations rather than screenshots of the paper and further details. I have also rephrased the question as since I originally asked it I have done more investigation which leads me to believe there is a (potentially very small) error in the publication.

I am attempting to implement the method for finding the Moore-Penrose inverse of the sum of two symmetric matrices described in: On the pseudoinverse of a sum of symmetric matrices with applications to estimation, but my implementation is giving incorrect answers. Investigating the proof of the result, I do not agree that it holds in general.

Let $A^+$ denote the pseudoinverse of a matrix. The theorem extends a version of the Sherman-Morrison-Woodbury formula to the pseudoinverse of a sum symmetric matrices when the correction term is not contained within the column space of the original matrix, that is to say with $V$ the original matrix and $X$ a correction term, $(I-VV^+)X \neq \mathbf{0}$. The theorem is given by:

Theorem. If $V$ is an $n \times n$ real symmetric matrix and $X$ an arbitrary $n \times q$ real matrix, then: $$ (V + XX^T)^+ = V^+ - V^+X(I + X^TV^+X)^{-1}X^TV^+ + (X_\bot^+)^TX_\bot^+ $$ with $$ X_\bot = (I - VV^+)X $$ This equation does not hold on matrices which I have generated, so I investigated the proof and found much of it held, except for one relation given in Equation (15):

$$ VV^+XX^TVV^+ = XX^TVV^+ = VV^+XX^T \tag{15}$$

I see how this holds in the case that $X$ is contained in the column space of $V$, as in the theorem which the paper generalizes, but I do not see how this holds in general, and indeed does not seem to hold on any (non-column space) matrices encountered in my implementation, even though I believe I have satisfied every precondition required for the theorem. After some thought, I noticed that the following (similar looking) relations hold (and verified by computation): $$ VV^+(XX^TVV^+)^+ = (XX^TVV^+)^+ \;\;\;\;\;\; (VV^+XX^T)^+VV^+ = (VV^+XX^T)^+$$ But notably this does not give the same ability to rearrange equations that the original equation (15) does and I do not see how it can be utilized in the proof. I also see how this holds when imposing certain conditions (impossible in my use case) on the commutativity of $X$ with $V$, but not in the arbitrary case. Equation (15) is applied to yield equation (16), which asserts: $$ V + XX^T = (V + VV^+XX^TVV^+) + ((I - VV^+)XX^T(I - VV^+)) \tag{16} $$ This also does not seem to hold on the matrices I have generated.

An MWE using scipy/numpy mimicking the setup I have used but with synthetic data to show the relations failing to hold:

import numpy as np
# function calculating pseudoinverse of Hermitian matrix
from scipy.linalg import pinvh

N = 100
M = 10

def test_fast_update(V, V_plus, X):
    XM = X @ X.T
    VP = V @ V_plus
    D  = np.identity(N) - VP

    # Verify symmetric
    print ("\nV symmetric", abs(V - V.T).sum())
    print ("XX^T symmetric", abs(XM - XM.T).sum())

    # Verify pseudo-inverse
    print ("\nEqn (4)", abs((V @ V_plus @ V) - V).sum())
    print ("Eqn (5)", abs((V_plus @ V @ V_plus) - V_plus).sum())
    print ("Eqn (6)", abs((V @ V_plus).T - V @ V_plus).sum())
    print ("Eqn (7)", abs((V_plus @ V).T - V_plus @ V).sum())

    # Verify relations
    print ("Eqn (15)", abs((XM @ VP) - (VP @ XM)).sum())
    print ("Eqn (16)", abs((V + VP @ XM @ VP + D @ XM @ D) - (V + XM)).sum())

# note that here we are creating a matrix V as the product of an N x M matrix with its
# transpose, hence V will likely have rank M < N and X_2 will be unlikely to land in its
# column space
X_1 = (np.random.rand(N, M) - 0.5) * 5
V = X_1 @ X_1.T
V_plus = pinvh(V)

X_2 = (np.random.rand(N, M) - 0.5) * 5
test_fast_update(V, V_plus, X_2)

# for completeness, test with an X contained in the column space of V
X_3 = (V.T[0] + 8.9 * V.T[11] + 9.8 * V.T[80]).reshape(100, 1)
test_fast_update(V, V_plus, X_3)

Gives the output

V symmetric 0.0
XX^T symmetric 0.0

Eqn (4) 8.81369825077782e-11
Eqn (5) 4.273039132911153e-15
Eqn (6) 2.0872400470728315e-13
Eqn (7) 2.0908947909523265e-13
Eqn (15) 20776.262477009932
Eqn (16) 20645.682244203177

V symmetric 0.0
XX^T symmetric 0.0

Eqn (4) 8.81369825077782e-11
Eqn (5) 4.273039132911153e-15
Eqn (6) 2.0872400470728315e-13
Eqn (7) 2.0908947909523265e-13
Eqn (15) 8.811693197330328e-08
Eqn (16) 8.837308525244669e-08

Where you can see that the preconditions appear to be satisfied to within machine precision, and despite eqns 15/16 holding to around machine precision on the column space sample, they do not even approximately hold on the arbitrary sample.

At this point I believe that this may be an error in the publication. If anyone could provide a short proof of these relations holding in general and point out any preconditions I may have missed - or if there is an issue with my code/floating point errors dramatically affecting the results (although I appreciate this is not the primary focus of this site and my central question is about the maths) - this would be exceptionally helpful. Otherwise, I would appreciate any input on whether you believe this is an error and potential ways the proof could be amended to fix it. Thank you!