Upper bound on trace formula

106 Views Asked by At

I am trying to prove that this quantity: $f(W) = trace((C^T\circ(W^T - W'^T))(W'-W)(XX^T))$ is smooth. $\circ$ is the Hadamard product. Smoothness is satisfied if $ f(W)\le H\|W - W' \|^2$. $H$ is a constant. $\| .\|$ is any norm.

I am very close to do it, however, I am struggling in dealing with Hadamard product inside a trace.

1

There are 1 best solutions below

7
On BEST ANSWER

One approach is to use vectorization. If $\|\cdot\|$ denotes the Frobenius norm, then we have $\|W\| = \|\operatorname{vec}(W)\|$, and we can apply the following facts:

  1. $\operatorname{vec}(A \circ B) = \operatorname{diag}(\operatorname{vec}(A)) B$,
  2. $\operatorname{tr}(A^T B) = \operatorname{vec}(A)^T\operatorname{vec}(B)$,
  3. $\operatorname{vec}(AB) = (B^T \otimes I)\operatorname{vec}(A)$.

With that, we can rewrite $$ \operatorname{tr}((C^T\circ(W^T - W'^T))(W'-W)(XX^T)) =\\ \operatorname{vec}(W' - W)^T \operatorname{diag}(\operatorname{vec}(C))((XX^T) \otimes I) \operatorname{vec}(W' - W). $$ With the inequality $x^TAx \leq \sigma_{\max}(A) \|x\|^2$, we can therefore conclude that $$ \operatorname{tr}((C^T\circ(W^T - W'^T))(W'-W)(XX^T)) \leq \\ \sigma_{\max}(\operatorname{diag}(\operatorname{vec}(C))((XX^T) \otimes I)) \cdot \|W - W'\|^2. $$


Another less computation-intensive approach: for $W \neq W'$, note that $\frac{f(W)}{\|W - W'\|^2} = f(Y)$, where $Y = W' + \frac{W - W'}{\|W - W'\|}$. Note that every matrix of the from $A = \frac{W - W'}{\|W - W'\|}$ satisfies $\|A\| = 1$.

With that in mind, let $S$ denote the set $$ S = \{Y: Y = W' + A \text{ and } \|A\| = 1\}. $$ Note that $S$ is compact (closed and bounded). Thus, the continuous function $f$ must attain a maximum over $S$. In other words, there exists a $k > 0$ such that for all $W \neq W'$, we have $$ \frac{f(W)}{\|W - W'\|^2} \leq k \implies f(W) \leq k \cdot \|W - W'\|^2. $$ Trivially, this second inequality holds for $W = W'$ as well.


Regarding the explicit computation of an upper bound: note that $$ \sigma_{\max}(\operatorname{diag}(\operatorname{vec}(C))((XX^T) \otimes I)) \leq \\ \sigma_{\max}(\operatorname{diag}(\operatorname{vec}(C))) \cdot \sigma_{\max}(((XX^T) \otimes I)) = \\ \left[\max_{i,j} |C_{ij}| \right]\cdot \sigma_{\max}(X)^2. $$ Most computational software will have a reasonably efficient method for computing $\sigma_{\max}(X)$.