Calculate $(M^T.\left(M.M^T\right)^{-1}.M{})_{i,j}$ for large sparse binary matrix $M$

124 Views Asked by At

CONTEXT: I am a musician working on a algorithmic composition system. I need to find efficient algorithms to solve some matrix equations. I am not a mathematician or a student and direct answer is appreciated.

I have a very large sparse binary matrix M, where $M\in \{0,1\}^{n\times m}$. I am trying to calculate $N_{i,j}=(M^T.\left(M.M^T\right)^{-1}.M{})_{i,j}$.

EDIT: This is the matrix product of $M^T$ and the Moore-Penrose inverse of $M^T$.

$N_{i,j}$ is an $m\times m$ matrix and is too large to calculate completely -- I just need to find one element at a time.

I suspect there is a simple form for the elements of N. Can anyone show what $N_{i,j}$ looks like?

Thanks again.

Bonus points

I'll take what I can get, but it would be easier if the answer was in terms of these functions. Each row in M can be thought of a set of integers, e.g., the ith set is $\{x : M_{i,x} = 1\}$. I know many of the properties of these sets and can calculate numbers like these easily.

j belongs to set i: $M_{i,j}$

Size of a set (sum of a row): $J_{m}.M_{i}$

Number of sets j belongs to (sum of column): $J_{n}.M^T_{j}$

Size of the intersection of two sets (dot product of rows): $M_{i_{1}}.M_{i_{2}}$

etc.

2

There are 2 best solutions below

0
On

Let $A=M^T$ and $A^+$ be its Moore-Penrose inverse; we assume that $rank(A)=n\leq m$; then, according to @J. M. is not a mathematician , we consider $Z=AA^+$ and we want to calculate $Z[p,q]$.

$Z$ is the orthogonal projection onto $im(A)$.

Step 1. We seek a basis $\{f_1,\cdots,f_n\}$ of the row space of $M$ and we orthogonalize it.

Step 2. $Z[p,q]=\sum_{k=1}^n<e_p,f_k><f_k,e_q>$, where $(e_i)$ is the canonical basis of $\mathbb{R}^m$.

That follows is a toy maple program with $n=15,m=100,p=14,q=35$. The rational $Z[p,q]$ is in $s$.

restart;
with(LinearAlgebra);

roll := rand(0 .. 1);
M := Matrix(15, 100, proc (i, j) options operator, arrow; roll() end proc); Rank(M);
                               15

Practically, you don't calculate the first line of the following:

Z := Transpose(M) . (1/(M . Transpose(M))) . M:


R := RowSpace(M);
G := GramSchmidt(R, normalized);

p := 14; q := 35;
s := 0; for i to 15 do s := s+G[i][p]*G[i][q] end do; 

s-Z[p, q];  0


 s;  -14424929318699075225/1190467204823982047177
0
On

As Rodrigo de Azevedo notes you are dealing with a projection matrix.

Your use of the notation $(MM^T)^{-1}$ implies that $M \in \mathbb R^{n \times m}$ has full rank, hence $n \leq m$.

Compute the $QR$ decomposition of $M^T \in \mathbb R^{m \times n}$, i.e. $M^T = QR$. Here $Q \in \mathbb R^{m \times n}$ and $R \in \mathbb R^{n \times n}$. $Q$ is orthonormal and $R$ is upper triangular with strictly positive diagonal entries. The cost is $O(mn^2)$ arithmetic operations and $O(mn+n^2)$ words of storage. Then \begin{align} N = M^T(MM^T)^{-1} M &= QR (R^T Q^T Q R)^{-1}) R^T Q^T \\&= QR (R^T R)^{-1} R^T Q^T \\&= Q R R^{-1} R^{-T} R^T Q^T \\&= QQ^T \end{align} This approach is attractive if $n$ is small relative to $m$. You obviously have $O(m)$ words of storage available, but I can not tell if you have $O(mn)$ words available.

In any case, you will be able to compute $N_{ij}$ as $$N_{ij} = (Q^T e_i)^T (Q^T e_j),$$ i.e. by taking the inner product between the $i$th and the $j$th row of $Q$.

There are sparse codes available for computing a QR factorization. As far as I know there is no way to exploit that your entries are zeros and ones.