Summing rows and columns for a rank-1 matrix approximation


Let $M$ be a matrix. Form a column vector $R$ by summing the rows of $M$, and a row vector $C$ by summing the columns of $M$. We think of the matrix $RC / \Sigma M$ as a rank-1 approximation to $M$, where $\Sigma M$ denotes the sum of all the elements of $M$ (assume it is nonzero).

If $M$ has rank 1, then $RC / \Sigma M = M$.
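A quick numerical sanity check of this claim (a sketch using NumPy; the particular matrix is an arbitrary rank-1 example):

```python
import numpy as np

# An arbitrary rank-1 matrix M = u v^T
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 0.5, 1.0, 2.0])
M = np.outer(u, v)

R = M.sum(axis=1, keepdims=True)   # column vector of row sums
C = M.sum(axis=0, keepdims=True)   # row vector of column sums
approx = R @ C / M.sum()           # the rank-1 approximation RC / ΣM

assert np.allclose(approx, M)      # exact recovery when M has rank 1
```

This works because for $M = uv^T$ we get $R = (\Sigma v)\,u$, $C = (\Sigma u)\,v^T$, and $\Sigma M = (\Sigma u)(\Sigma v)$, so the scalars cancel.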

Is this approximation optimal among rank-1 matrices with respect to some matrix norm? What exactly is the matrix $RC / \Sigma M$?

Statistical motivation:

If $M$ represents a joint probability distribution $P(X, Y)$, then $R$ represents the marginal distribution $P(X)$, $C$ represents the marginal distribution $P(Y)$, and $RC$ represents a new probability distribution on the same space where now the coordinates are independent.
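Concretely, with $M$ a joint distribution (so $\Sigma M = 1$), the row and column sums are the marginals and $RC$ is the product distribution. A small sketch with arbitrary illustrative numbers:

```python
import numpy as np

# A joint distribution P(X, Y): nonnegative entries summing to 1
M = np.array([[0.10, 0.20],
              [0.25, 0.45]])

R = M.sum(axis=1, keepdims=True)   # marginal P(X), as a column vector
C = M.sum(axis=0, keepdims=True)   # marginal P(Y), as a row vector
RC = R @ C                         # product distribution: X and Y independent

assert np.isclose(M.sum(), 1.0)
assert np.isclose(RC.sum(), 1.0)   # RC is again a probability distribution
```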

Best Answer

The Kullback-Leibler divergence is not a norm, but $RC / \Sigma M$ is the optimal rank-1 approximation with respect to it.

Let $M = P(X, Y)$ represent a joint probability distribution, and let $RC$ be the product of its marginals. Then one characterization of mutual information is

$$D_{KL}\bigl(P(X, Y) \,\|\, RC\bigr) = I(X;Y)$$

In fact, this is optimal among rank-1 matrices: for any rank-1 matrix $r \otimes c$ where both $r$ and $c$ are probability distributions,

\begin{align*}
D_{KL}\bigl(P(X, Y) \,\|\, r \otimes c\bigr) &= \sum_{x,y} P(X = x, Y = y) \log \frac{P(X = x, Y = y)}{r_x c_y}\\
&= \sum_{x,y} P(X = x, Y = y) \log P(X = x, Y = y) - \sum_{x,y} P(X = x, Y = y) \log r_x - \sum_{x,y} P(X = x, Y = y) \log c_y\\
&= -H(X, Y) - \sum_{x} P(X = x) \log r_x - \sum_{y} P(Y = y) \log c_y
\end{align*}

By Gibbs' inequality (the inequality showing the KL divergence is nonnegative), $-\sum_x P(X = x) \log r_x \geq H(X)$, and likewise for $c$, so the whole expression is at least

$$H(X) + H(Y) - H(X, Y) = I(X;Y)$$

with equality exactly when $r$ and $c$ are the marginals.
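The identity $D_{KL}\bigl(P(X,Y) \,\|\, RC\bigr) = I(X;Y)$ can be checked numerically (a sketch with an arbitrary joint distribution; entropies are in nats):

```python
import numpy as np

M = np.array([[0.10, 0.20],
              [0.25, 0.45]])        # joint P(X, Y), arbitrary illustration
R = M.sum(axis=1, keepdims=True)    # marginal P(X)
C = M.sum(axis=0, keepdims=True)    # marginal P(Y)
RC = R @ C                          # product of marginals

def H(p):
    """Shannon entropy in nats, ignoring zero entries."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

kl = np.sum(M * np.log(M / RC))     # D_KL(P(X,Y) || RC)
mi = H(R) + H(C) - H(M)             # I(X;Y) = H(X) + H(Y) - H(X,Y)
assert np.isclose(kl, mi)
```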

If you don't restrict $r$ and $c$ to probability distributions, you can send $D_{KL}$ to $-\infty$ by making the entries of $r$ and $c$ huge.

If $M$ has all entries nonnegative (with at least one positive), then $M / \Sigma M$ is a probability distribution, and the above applies.
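A sketch of the optimality claim: normalize a nonnegative matrix, then compare the KL divergence to the product of its marginals against randomly drawn rank-1 alternatives (the matrix and the Dirichlet sampling are illustrative assumptions):

```python
import numpy as np

M = np.array([[2.0, 1.0, 0.5],
              [1.0, 4.0, 1.5]])     # nonnegative, arbitrary illustration
P = M / M.sum()                     # normalize to a probability distribution

R = P.sum(axis=1, keepdims=True)    # marginal of X
C = P.sum(axis=0, keepdims=True)    # marginal of Y

def kl(p, q):
    """D_KL(p || q) for distributions with positive entries."""
    return np.sum(p * np.log(p / q))

best = kl(P, R @ C)                 # divergence to the product of marginals

# Any other rank-1 distribution r ⊗ c should do at least as badly.
rng = np.random.default_rng(0)
for _ in range(100):
    r = rng.dirichlet(np.ones(2)).reshape(-1, 1)
    c = rng.dirichlet(np.ones(3)).reshape(1, -1)
    assert kl(P, r @ c) >= best - 1e-12
```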