prove that sum of quadric form less than the frobenius norm

206 Views Asked by At

I was reading this famous paper (Clustering Large Graphs via the Singular Value DecompositionClustering Large Graphs via the Singular Value Decomposition) and find difficulty in understanding a step of the proof here.

page 15

I do not know why the quadratic term is equal to the (t, t) entry of AA^T - CC^T and thus the inequality holds. Could someone give a walk through or calculation about this?

1

There are 1 best solutions below

0
On BEST ANSWER

Here's the full answer. All scalars are in $\mathbb R$ and all matrices are $\text{m x m}$.

We have orthogonal matrix
$H := \bigg[\begin{array}{c|c|c|c} \mathbf h_1 & \mathbf h_2 & \cdots & \mathbf h_m \end{array}\bigg] $
this means $\mathbf h_r^T \mathbf h_j =1$ iff $r=j$ and $0$ otherwise

(note this is a square matrix $H$ which differs slightly from that in the original post -- the OP considers a truncated $H$ that only has $k$ columns. But square, and indeed orthogonal, matrices are more pleasant to work with so I use the full square $H$.)

$Z$ is any real symmetric matrix. At the end of either proof we may select $Z:= (AA^T - CC^T)$, to get the desired result.

first proof
This is a short and simple argument. For any square matrix we have
$\sigma_\max^2 \geq \big\vert\lambda\big\vert_{max}^2$, or equivalently for any $j$ we have the point-wise bound
$\Big\Vert Z \mathbf h_j \mathbf h_j^T \Big\Vert_F^2 \geq \Big( \text{trace}\big(Z \mathbf h_j \mathbf h_j^T\big) \Big)^2$

then summing over the bound and making use of real non-negativity gives the result

main argument
$\big \Vert Z \big \Vert_F^2 $
$=\big \Vert Z H\big \Vert_F^2 $
$=\text{trace}\Big(H^T Z Z H\Big)$
$=\text{trace}\Big(Z Z HH^T\Big)$
$=\text{trace}\Big(Z Z \big(\sum_{j=1}^m \mathbf h_j \mathbf h_j^T \big)\Big)$
$=\sum_{j=1}^m \text{trace}\Big(Z Z \mathbf h_j \mathbf h_j^T \Big)$
$=\sum_{j=1}^m \text{trace}\Big(Z Z \mathbf h_j \mathbf h_j^T\mathbf h_j \mathbf h_j^T \Big)$
$=\sum_{j=1}^m \text{trace}\Big(\mathbf h_j \mathbf h_j^T Z Z \mathbf h_j \mathbf h_j^T \Big)$
$=\sum_{j=1}^m \Big\Vert Z \mathbf h_j \mathbf h_j^T \Big\Vert_F^2 $
$\geq \sum_{j=1}^m \Big( \text{trace}\big(Z \mathbf h_j \mathbf h_j^T\big) \Big)^2$
$\geq \sum_{j=1}^k\Big(\text{trace}\big( Z\mathbf h_t \mathbf h_t^T\big)\Big)^2$

second proof
this is conceptually very close to viewing the problem as a generalization of Bessel's Inequality-- but needs majorization to make the approach stick

In general we have $Z = Q\Lambda Q^T$. While $Q$ is orthogonal, it may not be equal to $H$.
with eigenvalues in the usual ordering of $\lambda_1 \geq \lambda_2 \geq ... \geq \lambda_m$

main argument
$\big \Vert Z \big \Vert_F^2 $
$=\sum_{j=1}^m \lambda_j^2 $
$=\sum_{j=1}^m \Big(\text{trace} \big( Z \mathbf q_j \mathbf q_j^T\big) \Big)^2$
$\geq \sum_{j=1}^m \Big(\text{trace} \big( Z \mathbf h_j \mathbf h_j^T\big) \Big)^2$
$\geq \sum_{j=1}^k\Big(\text{trace}\big( Z\mathbf h_t \mathbf h_t^T\big)\Big)^2$

note: If $H = Q$ then the first inequality becomes equality and the above is Bessel's Inequality. For the most part the first inequality is not met with equality and the relation is justified by Schur Convexity because

$f: x \mapsto x^2$ is (strictly) convex for any $x \in \mathbb R$
and
$\begin{bmatrix} \text{trace} \big( Z \mathbf q_1 \mathbf q_1^T\big)\\ \text{trace} \big( Z \mathbf q_2 \mathbf q_2^T\big)\\ \vdots \\ \text{trace} \big( Z \mathbf q_m \mathbf q_m^T\big) \end{bmatrix} \succeq \begin{bmatrix} \text{trace} \big( Z \mathbf h_1 \mathbf h_1^T\big)\\ \text{trace} \big( Z \mathbf h_2 \mathbf h_2^T\big)\\ \vdots \\ \text{trace} \big( Z \mathbf h_m \mathbf h_m^T\big)\\ \end{bmatrix}$

where $\succeq$ denotes majorization.

This is a standard result, but for an easy proof that majorization holds
$\sum_{j=1}^m \text{trace} \big( Z \mathbf q_j \mathbf q_j^T\big) $
$= \text{trace} \big( Z QQ^T\big) $
$= \text{trace} \big( ZHH^T \big) $
$= \sum_{j=1}^m\text{trace} \big( Z \mathbf h_j \mathbf h_j^T\big) $

and for $r\in\{1,2,...,m-1\}$ selecting positive $\delta$ large enough such that $(\delta I + Z )$ is positive definite, we have

$ \delta \cdot r + \sum_{j=1}^r \text{trace} \big( Z \mathbf q_j \mathbf q_j^T\big)$
$= \text{trace} \big( (\delta I + Z)\sum_{j=1}^r \mathbf q_j \mathbf q_j^T\big)$
$\geq \text{trace} \big( (\delta I+Z)\sum_{j=1}^r \mathbf h_j \mathbf h_j^T\big)$
$= \delta \cdot r +\sum_{j=1}^r \text{trace} \big( Z \mathbf h_j \mathbf h_j^T\big)$
by the von-Neuman trace inequality