How is effective rank (a ratio of nuclear norm and operator norm) defined in terms of eigenvalues?

57 Views Asked by At

I am reading this paper on implicit regularisation in gradient descent and I am having difficulty with the provided definition of effective rank. In the paper it is given as

$r(W) = \frac{||W||_*}{||W||}$

Where $||W||_*$ is the nuclear norm, $||W||$ is the operator norm and $W \in \mathbb{R}^{n \times n}$ is symmetric. Assuming $W$ has eigenvalues $\lambda_i$, $i \in \{1,...,n\}$, is the below formula for the effective rank correct?

$\sum_{\ell=1}^{n}\frac{\left|\lambda_{\ell}\right|}{\left|\lambda_{1}\right|}$

Additionally, is anyone able to provide an intuition as to why one may use the effective rank rather than the true rank? I see there is already a question on here regarding effective rank, but the definition given appears slightly different as to the one I am working with. (Unless this indeed the entropy of the notional distribution obtained by normalising the singular values as answered on that question).

1

There are 1 best solutions below

0
On BEST ANSWER

If $W$ is square and symmetric, then $W^TW = W^2$. The eigenvalues of $W^2$ are precisely the squares of the eigenvalues of $W$, since $Wx=λx⟹W^2x = λ^2 x$. Thus, the singular are $σ_k = \sqrt{λ_k(W^TW)}= \sqrt{λ_k^2} = |\lambda_k|$, assuming $λ_k$ are ordered by magnitude.

So in this special case, $r(W) = \frac{||W||_*}{||W||} = ∑_{k=1}^n \frac{σ_k}{σ_1} = ∑_{k=1}^n \frac{|λ_k|}{|λ_1|}$.

Remark: In the literature, effective rank has been defined differently than in this publication. The effective rank: A measure of effective dimensionality (IEEE 2007) define it as $e^{H(p(A))}$, where $H$ is the Shannon entropy and $p(A)$ is a categorical distribution with probabilities according to the singular values.