Relationship between $L^2$-distance and cosine similarity


Given an inner-product space $X$, the cosine similarity can be defined as $$c(x,y)= \frac{\langle x, y\rangle}{\| x \| \| y \|}, $$ and the induced distance is $$d(x,y) = \| x - y\|. $$

First, I want to estimate some "similarity" between linear operators $A, B: X\to X$. The most common choice is the operator norm induced by the vector norm: $$ \| A \| = \sup_{\|x\|=1} \|Ax\|, $$ which gives $$D(A, B) = \| A - B\| < \delta \implies d(Ax, Bx) = \| Ax - Bx\| < \delta \| x \|. $$
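A quick numerical sanity check of this implication (a sketch using NumPy, with matrices chosen arbitrarily for illustration; `np.linalg.norm(M, 2)` computes the operator/spectral norm):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = A + 0.01 * rng.standard_normal((3, 3))  # a small perturbation of A

delta = np.linalg.norm(A - B, 2)            # operator (spectral) norm of A - B

# For every x, ||Ax - Bx|| <= ||A - B|| * ||x|| by definition of the operator norm.
for _ in range(1000):
    x = rng.standard_normal(3)
    assert np.linalg.norm(A @ x - B @ x) <= delta * np.linalg.norm(x) + 1e-12
print("inequality holds on all samples")
```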

Similarly, I can also define the cosine similarity between two operators by:

$$ C(A,B) = \inf_x c(Ax, Bx) $$

When $C(A,B) = 1$, it is easy to assert that $\exists k>0,$ s.t. $A = k B$.
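The infimum in $C(A,B)$ can be roughly approximated by sampling directions (a Monte Carlo sketch with illustrative matrices of my own choosing; $B$ is taken invertible so the cosine is defined for every $x \neq 0$, and the infimum is only estimated, not computed exactly):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity c(u, v) = <u, v> / (||u|| ||v||)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def C(A, B, n_samples=20000, seed=0):
    """Monte Carlo estimate of inf_x c(Ax, Bx) over random directions x."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, A.shape[1]))
    return min(cosine(A @ x, B @ x) for x in X)

B = np.array([[2.0, 1.0], [0.0, 1.0]])   # invertible, so Bx != 0 for x != 0
A = 3.0 * B                               # A = kB with k = 3
print(C(A, B))                            # estimate should be (numerically) 1
```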

Now, I want to use $C$ to estimate $D$, i.e.

$$ \min_k D(A, kB) = \textrm{some-function}(C(A, B)) $$

However, I just cannot get a good result.

Thanks for your comment.

There are 2 answers below.


You say that "When $C(A,B)=1$, it is easy to assert that $\exists k>0$ s.t. $A=kB$."

Fix $a>1$ and let $$A=\begin{pmatrix} a & 0 \\ 0 & 1\end{pmatrix}$$ and $$B=\begin{pmatrix} 0 & 0 \\ 0 & 1\end{pmatrix}.$$ Let $$x=\begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$ Then $$Ax=Bx=x,$$ and $$c(Ax,Bx)=1.$$ Therefore $$C(A,B)=\sup_{\|y\|=1}c(Ay,By)=1.$$ But clearly neither matrix is a constant multiple of the other.

The condition that $\sup_{\|y\|=1}c(Ay,By)=1$ just means that for at least one non-zero $y$, $Ay,By$ point in the same direction. If $A=kB$ for some $k>0$, then $Ay,By$ will point in the same direction for every $y$ (where we agree that the zero vector points in the same direction as itself).

But with $A,B$ as above, for any $k$ we have $\|A-kB\|\geqslant a$, which can be as big as we want.
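This counterexample is easy to verify numerically (a sketch with $a = 5$; `np.linalg.norm(·, 2)` is the operator norm, and $\|A - kB\| = \max(a, |1-k|)$ since $A - kB$ is diagonal):

```python
import numpy as np

a = 5.0
A = np.array([[a, 0.0], [0.0, 1.0]])
B = np.array([[0.0, 0.0], [0.0, 1.0]])

# On x = (0, 1), Ax = Bx = x, so the cosine is exactly 1.
x = np.array([0.0, 1.0])
cos = (A @ x) @ (B @ x) / (np.linalg.norm(A @ x) * np.linalg.norm(B @ x))
assert abs(cos - 1.0) < 1e-12

# Yet ||A - kB|| = max(a, |1 - k|) >= a for every k: scan a grid of k values.
ks = np.linspace(-10.0, 10.0, 2001)
norms = [np.linalg.norm(A - k * B, 2) for k in ks]
print(min(norms))   # never drops below a = 5.0
```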

So a large $C(A,B)$ as you have defined it, even equal to $1$, does not yield any estimate on $\min_k \|A-kB\|$.

For the other direction, if $\|A-kB\|$ is small for some $k>0$ (as a fraction of $\|A\|$), you can pick $x$ with $\|x\|=1$ such that $\|Ax\|=\|A\|$ and show that $c(Ax,Bx)$ must be close to $1$. More precisely, there exists a function $\theta:(0,1)\to (0,1)$ with $\lim_{\delta\to 0^+}\theta(\delta)=0$ such that if $\|A-C\|\leqslant \delta \|A\|$, then there exists $x$ with $c(Ax,Cx)\geqslant 1-\theta(\delta)$.

Cosines have to do with Hilbert norms, and the operator norm $\sup_{\|x\|=1}\|Ax\|$ is not a Hilbert norm. There is, however, a Hilbert norm on the space of matrices: the Frobenius norm (or Hilbert-Schmidt norm), whose inner product is $\langle A, B\rangle = \text{trace}(A^TB)$ for matrices $A,B$ of the same dimensions. If $a_{i,j}$ is the $i,j$ entry of $A$ and $b_{i,j}$ is the $i,j$ entry of $B$, then $$\text{trace}(A^TB)=\sum_{i,j}a_{i,j}b_{i,j},$$ which looks exactly like the dot product (except it runs over a rectangle instead of a row/column). The Frobenius norm is the one that relates distances and cosines exactly the way the usual norm does on $\mathbb{R}^n$.
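These Frobenius identities are easy to check numerically (a sketch with random matrices; `'fro'` selects the Frobenius norm in NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))

# Frobenius inner product: trace(A^T B) equals the elementwise sum of products.
ip_trace = np.trace(A.T @ B)
ip_sum = np.sum(A * B)
assert np.isclose(ip_trace, ip_sum)

# The Frobenius norm is the norm induced by this inner product.
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.trace(A.T @ A)))

# With it, the usual law-of-cosines relation between distance and inner
# product holds: ||A - B||^2 = ||A||^2 + ||B||^2 - 2<A, B>.
lhs = np.linalg.norm(A - B, 'fro')**2
rhs = np.linalg.norm(A, 'fro')**2 + np.linalg.norm(B, 'fro')**2 - 2 * ip_trace
assert np.isclose(lhs, rhs)
print("Frobenius identities verified")
```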

Note that the operator norm is kind of a hybrid norm between the $\ell_2$ and $\ell_\infty$ norms. For example, if $A$ is a diagonal matrix with entries $a_1,\ldots, a_n$, then the operator norm of $A$ is $\max_i |a_i|$. This is very non-Hilbertian.
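A small illustration of the contrast on a diagonal matrix (example entries are my own):

```python
import numpy as np

d = np.array([3.0, -7.0, 0.5])
A = np.diag(d)

# Operator (spectral) norm of a diagonal matrix is max_i |a_i| ...
assert np.isclose(np.linalg.norm(A, 2), np.max(np.abs(d)))
# ... while the Frobenius (Hilbert) norm is the l2 norm of the diagonal.
assert np.isclose(np.linalg.norm(A, 'fro'), np.linalg.norm(d))
print(np.linalg.norm(A, 2), np.linalg.norm(A, 'fro'))
```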


Here is a very naive attempt. Assume $C(A, B) > 1-\delta$ and consider the set $\Omega=\{x:\|Ax\| = 1\}$. On $\Omega$, the distance satisfies

$$ \| Ax - kBx \|^2 = 1 - 2k\langle Ax, Bx \rangle +k^2 \| Bx \|^2 < 1-2k(1-\delta)\|Bx\| +k^2 \|Bx\|^2 $$

We cannot estimate $D(A,B)$ directly, but we can estimate something like a mean-squared error:

$$ E(k) =\int_{\Omega} \| Ax-kBx\|^2 \,\mathrm dx \le \int_\Omega 1\,\mathrm d x-2(1-\delta)k\left(\int_\Omega \|Bx\|\,\mathrm dx\right) + k^2 \left(\int_\Omega \|Bx\|^2\,\mathrm dx\right) $$

We can choose $k$ to minimize this upper bound; the minimum is:

$$E^*=\int_\Omega 1\mathrm d x - (1-\delta)^2\frac{ \left(\int_\Omega \|Bx\| \mathrm dx\right)^2 }{\int_\Omega \|Bx\|^2 \mathrm dx }$$

From Hölder's inequality:

$$\left(\int_\Omega \| Bx\|\mathrm dx \right) ^2 \le \int_\Omega 1\mathrm d x \int_\Omega \|Bx\|^2 \mathrm dx $$

Then $E^*$ is bounded below by

$$ E^* \ge (1-(1-\delta)^2) \int_\Omega 1\mathrm d x $$
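This lower bound can be checked numerically in a discretized 2-D setting (a sketch: the matrices are illustrative choices of my own, $\Omega$ is an ellipse since $A$ is invertible, and the integrals over $\Omega$ are approximated by arc-length sums; the bound itself follows from Hölder/Cauchy-Schwarz regardless of $\delta$):

```python
import numpy as np

# Two invertible operators on R^2 (hypothetical example choices).
A = np.array([[2.0, 0.3], [0.0, 1.0]])
B = np.array([[1.8, 0.2], [0.1, 1.1]])

# Discretize Omega = {x : ||Ax|| = 1} by rescaling unit-circle directions.
t = np.linspace(0.0, 2.0 * np.pi, 4001)
U = np.stack([np.cos(t), np.sin(t)])       # shape (2, N), unit directions
X = U / np.linalg.norm(A @ U, axis=0)      # each column x satisfies ||Ax|| = 1

# Arc-length element ds = ||dx/dt|| dt via finite differences.
dX = np.gradient(X, t, axis=1)
ds = np.linalg.norm(dX, axis=0)
dt = t[1] - t[0]

AX, BX = A @ X, B @ X
nB = np.linalg.norm(BX, axis=0)
cos = np.sum(AX * BX, axis=0) / nB         # ||Ax|| = 1, so no extra factor
delta = 1.0 - cos.min()                    # so c(Ax, Bx) >= 1 - delta on Omega

w = (ds * dt)[:-1]                         # drop duplicated endpoint t = 2*pi
nBc = nB[:-1]
I0 = w.sum()                               # integral of 1 over Omega
I1 = (nBc * w).sum()                       # integral of ||Bx||
I2 = (nBc**2 * w).sum()                    # integral of ||Bx||^2

E_star = I0 - (1.0 - delta)**2 * I1**2 / I2
bound = (1.0 - (1.0 - delta)**2) * I0
print(E_star >= bound - 1e-9)              # the Hölder lower bound holds
```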

However, I need an upper bound on $E^*$, which I also cannot figure out.

In other words, what I cannot figure out is a relationship of the following form:

$$ \frac{ \left(\int_\Omega \|Bx\| \mathrm dx\right)^2 }{\int_\Omega \|Bx\|^2 \mathrm dx } \ge \mathrm{something}(\delta) $$

Feel free to add more constraints on the space, the linear operators, or even the spectrum of the operators; I just need some basic result, even if it is non-trivial.