Making Vectors as Different As Possible


I am given a function $f:\mathbb{R}^M\times\mathbb{R}^N\rightarrow\mathbb{R}^K$. Let us denote the inputs to $f$ as $(a,b)$, where $a\in\mathbb{R}^M$ and $b\in\mathbb{R}^N$. The set of possible values of $b$ is finite (that is, there are only finitely many vectors $b$ that I care about).

Consider a function $g_a:\mathbb{R}^N\rightarrow\mathbb{R}^K$ where $g_a(b)=f(a,b)$ for a specific $a$. My goal is to find the value of $a$ so that the vectors that are output from $g_a$ are as different as possible from each other in the sense that the angle between any two vectors is as large as possible.

Consider the matrix $\Gamma(a)$ whose columns are the outputs $g_a(b)$ for the values of $b$ I care about. I think that the singular values of $\Gamma(a)$ are relevant. For example, I could choose the value of $a$ that maximizes the volume of the ellipsoid that is the image of the unit ball under $\Gamma(a)$. Or, I could choose the value of $a$ that maximizes the smallest singular value of $\Gamma(a)$.
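These two candidate metrics are easy to compute numerically. A minimal sketch, assuming a toy stand-in $f(a,b)=\tanh(Ab+a)$ with a fixed random matrix $A$ (so $M=K$ here); both $A$ and this choice of $f$ are purely illustrative:

```python
import numpy as np

# Hypothetical example: f(a, b) = tanh(A b + a) with a fixed random A,
# so that a acts as a bias shifting every output vector (here M = K).
rng = np.random.default_rng(0)
K, N = 3, 4
A = rng.standard_normal((K, N))
B = [rng.standard_normal(N) for _ in range(5)]  # the finite set of b's

def gamma(a):
    """Stack the outputs g_a(b) = f(a, b) as the columns of Gamma(a)."""
    return np.column_stack([np.tanh(A @ b + a) for b in B])

a = np.zeros(K)
s = np.linalg.svd(gamma(a), compute_uv=False)  # singular values, descending
sigma_min = s.min()           # metric 1: smallest singular value
log_volume = np.log(s).sum()  # metric 2: log-volume of the image ellipsoid
```

Working with the log of the singular-value product avoids underflow when $K$ is large.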

What is the best metric to use in this situation? I would guess that there already exists some theory that's relevant to my problem. Could someone provide me with a specific reference? My own searches have not been fruitful.



I'm sure you thought of this, but an obvious idea is to literally use the sum of pairwise angles between the vectors: $$ E(a) = \sum_{b\in B} \sum_{\beta\in B} \arccos\left( \frac{g_a(b)^Tg_a(\beta)}{||g_a(b)||_2\;||g_a(\beta)||_2} \right) $$ where $B\subset \mathbb{R}^N$ is your finite set of inputs. (The diagonal terms $b=\beta$ contribute zero, and each pair is counted twice, which merely scales $E$.) Then use e.g. gradient ascent on $E$ to maximize it.
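A minimal sketch of this objective, again using the hypothetical $f(a,b)=\tanh(Ab+a)$ as a stand-in (so $M=K$) and `scipy.optimize.minimize` on $-E$ in place of hand-rolled gradient ascent:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
K, N = 3, 4
A = rng.standard_normal((K, N))          # fixed, part of the toy f
B = [rng.standard_normal(N) for _ in range(4)]

def g(a, b):
    return np.tanh(A @ b + a)            # hypothetical f(a, b)

def E(a):
    """Sum of angles over unordered pairs of outputs g_a(b), b in B."""
    vecs = [g(a, b) for b in B]
    total = 0.0
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            c = vecs[i] @ vecs[j] / (
                np.linalg.norm(vecs[i]) * np.linalg.norm(vecs[j]))
            total += np.arccos(np.clip(c, -1.0, 1.0))  # clip guards rounding
    return total

a0 = np.zeros(K)
res = minimize(lambda a: -E(a), a0)      # maximize E by minimizing -E
```

The `np.clip` matters in practice: floating-point error can push the cosine slightly outside $[-1,1]$, where `arccos` returns NaN.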

As for your $\Gamma$ idea, one idea is to use the covariance matrix of the data; i.e. consider the set of output vectors as being drawn from a multivariate probability distribution and try to maximize the (estimated) variance of this distribution. This will try to "spread out" the points as much as possible, in a certain sense.

However, there are many ways to do this. Read this question, this one, and this one, for instance. So for some variance measure $V_j$, we will want $a^* = \arg\max_{a\in \mathbb{R}^M} V_j(a)$.

Let $M_a=\Gamma(a)^T - \mu_{a}$, where $\mu_a$ is the matrix whose rows are all equal to the vector of column means of $\Gamma(a)^T$. Then let: $$ C_a = \frac{1}{|B|-1}M_a^TM_a $$ be the centered covariance matrix. Note that I'm assuming you're working over all of $B$ (your finite set), but you obviously don't have to :).
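In code this is a two-liner, and it agrees with NumPy's built-in estimator. A sketch, with a random matrix standing in for $\Gamma(a)$:

```python
import numpy as np

# Stand-in for Gamma(a): columns are the output vectors in R^K.
rng = np.random.default_rng(2)
K, nB = 3, 6                         # nB = |B|
Gamma = rng.standard_normal((K, nB))

X = Gamma.T                          # rows are the output vectors
M = X - X.mean(axis=0)               # M_a: subtract the column means
C = M.T @ M / (nB - 1)               # C_a: centered covariance, K x K
```

As a sanity check, `C` matches `np.cov(Gamma)` (which treats rows as variables and columns as observations, with the same $1/(|B|-1)$ normalization).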

Then one measure, the total component-wise variance, is given by: $$ V_1 = \text{tr}(C_a) = \sum_i \lambda_i(C_a) $$ where $\lambda_i(A)$ is the $i$th eigenvalue of $A$. Another measure is: $$ V_2 = \sqrt{\det(C_a)} = \sqrt{\prod_i \lambda_i(C_a)} $$ which is the square root of the generalized variance (the product of the eigenvalues of $C_a$).
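Both measures follow directly from the eigenvalues of $C_a$. A sketch, again with a random stand-in for $C_a$ (any symmetric positive-definite covariance works the same way):

```python
import numpy as np

rng = np.random.default_rng(3)
K, nB = 3, 8
C = np.cov(rng.standard_normal((K, nB)))  # stand-in for C_a, K x K

lam = np.linalg.eigvalsh(C)               # eigenvalues of symmetric C
V1 = np.trace(C)                          # total component-wise variance
V2 = np.sqrt(np.linalg.det(C))            # sqrt of the generalized variance
```

Note that $V_2$ requires $C_a$ to be full rank ($|B| > K$ in general), otherwise the determinant vanishes.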

However, you might want to have a little more isotropy; i.e. reduce the covariances, while still increasing the variances.

A simple method is just to use: $$ V_3 = \text{tr}({C_a}) - \frac{1}{K-1}\sum_i (\lambda_i(C_a) - \mu_\lambda)^2 $$ where the extra term is the estimate of $\mathbb{V}[\lambda]$ with $\mu_\lambda = \frac{1}{K}\sum_j \lambda_j(C_a)$.
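This is again a few lines given the eigenvalues; a sketch with a random stand-in for $C_a$:

```python
import numpy as np

rng = np.random.default_rng(3)
K, nB = 3, 8
C = np.cov(rng.standard_normal((K, nB)))   # stand-in for C_a

lam = np.linalg.eigvalsh(C)
mu_lam = lam.mean()                        # mu_lambda = (1/K) sum_j lambda_j
V3 = np.trace(C) - np.sum((lam - mu_lam) ** 2) / (K - 1)
```

The penalty term is zero exactly when all eigenvalues are equal, i.e. when $C_a$ is a multiple of the identity, so $V_3$ trades total variance against anisotropy.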

For a more physics flavour, use the notion of a deviator (deviatoric) tensor, which measures the "non-isotropic" component of a tensor. Let: $$ \Psi = C_a - \frac{\text{tr}({C_a})}{K}I $$ Then consider: $$ V_4 = \text{tr}({C_a}) - ||\Psi|| $$ where $||\Psi||$ could be the spectral 2-norm or the Frobenius norm, for instance.
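A sketch of $V_4$ under both norm choices, with a random stand-in for $C_a$; by construction the deviator $\Psi$ is traceless:

```python
import numpy as np

rng = np.random.default_rng(4)
K, nB = 3, 8
C = np.cov(rng.standard_normal((K, nB)))   # stand-in for C_a

Psi = C - (np.trace(C) / K) * np.eye(K)    # deviator: tr(Psi) = 0
V4_spec = np.trace(C) - np.linalg.norm(Psi, 2)      # spectral-norm penalty
V4_fro = np.trace(C) - np.linalg.norm(Psi, 'fro')   # Frobenius-norm penalty
```

Since $||\Psi||_2 \le ||\Psi||_F$, the Frobenius choice penalizes anisotropy more aggressively; which is preferable depends on whether you care about the worst direction or the aggregate deviation.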