Distance between unequal-dimension vectors (and data)?


It is easy to find simple distance measures for equal-dimension vectors, such as Euclidean distance or correlation. What about unequal-dimension vectors, for instance $(a,b,c)$ and $(d,e)$? Are there any known approaches in math for that?

For example, one approach would be to treat the smaller-dimension vector as a projection and pad it with a zero: compare $(a,b,c)$ with $(d,e,0)$. But then I have assumed a particular coordinate plane, and I potentially miss $(d,0,e)$. So there is ambiguity.

Are there any other measures used in practice, especially non-ambiguous ones?

Of course, the generalization to unequal-dimension, even ragged, data arrays is also interesting, so please elaborate if you can.

But some simple computation with $(a,b,c)$ and $(d,e)$ would be very instructive.
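For concreteness, with made-up numbers $(a,b,c)=(1,2,3)$ and $(d,e)=(4,5)$, the two paddings already give different Euclidean distances:
$$ \lVert(1,2,3)-(4,5,0)\rVert=\sqrt{27}\approx 5.20, \qquad \lVert(1,2,3)-(4,0,5)\rVert=\sqrt{17}\approx 4.12. $$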


There are 2 solutions below.


I will just point out the obvious mathematical approach, which is probably not really useful though. Assuming your vectors are in $\mathbb{R}^n$ and $\mathbb{R}^m$ for $m<n$, you can always take some injection from $\mathbb{R}^m$ to $\mathbb{R}^n$. The obvious one just pads with $0$'s.
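As an illustration, here is a minimal Python sketch of that zero-padding injection combined with the ordinary Euclidean distance (the function name and the sample values are mine, chosen for illustration):

```python
import numpy as np

def padded_distance(u, v):
    """Euclidean distance after zero-padding the shorter vector
    up to the dimension of the longer one (trailing zeros)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    n = max(u.size, v.size)
    u = np.pad(u, (0, n - u.size))  # appends zeros by default
    v = np.pad(v, (0, n - v.size))
    return np.linalg.norm(u - v)

print(padded_distance([1, 2, 3], [4, 5]))  # treats (4,5) as (4,5,0): ~5.196
```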

Depending on what kind of analysis you are performing, you might have a more natural subspace to which you want to map the smaller vector space. For example, you might consider some sort of partial projection. To visualize this, consider the case where you have two-dimensional and one-dimensional vectors. To make it even more concrete, take $(1,0)$, $(0,1)$ as your two-dimensional sample and, say, $(2)$ as your one-dimensional vector.

Depending on the type of data, you could map the one-dimensional vector so that its direction is the average of the directions of the two-dimensional ones (in this case the direction of $(1,1)$). The length could either be preserved (so you would actually get $(\sqrt{2},\sqrt{2})$), or you could preserve the first coordinate(s), getting $(2,2)$.
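A short Python sketch of the length-preserving variant, under the assumption that the average is taken over the normalized directions of the sample vectors (the setup mirrors the toy example above):

```python
import numpy as np

def embed_along_mean_direction(x, sample):
    """Embed scalar x into the sample's space along the average
    direction of the sample vectors, preserving its length |x|."""
    sample = np.asarray(sample, dtype=float)
    directions = sample / np.linalg.norm(sample, axis=1, keepdims=True)
    mean_dir = directions.mean(axis=0)
    mean_dir /= np.linalg.norm(mean_dir)  # unit average direction
    return x * mean_dir                   # result has length |x|

# (1,0) and (0,1) average to the direction of (1,1); length 2 is kept:
print(embed_along_mean_direction(2.0, [[1, 0], [0, 1]]))  # ~[1.414 1.414]
```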

In more dimensions the distortion becomes smaller, and for larger datasets you might consider using, for example, averages for the undefined dimensions (possibly weighting them by the similarity of the defined dimensions). Altogether, though, this will hugely depend on what each dimension represents: if you expect a correlation between the dimensions, weighting makes more sense than if you don't, etc.
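As a sketch of the averaging idea, assuming the shared dimensions come first and using plain unweighted column means (both are simplifying assumptions of mine):

```python
import numpy as np

def impute_missing_dims(v, dataset):
    """Extend v to the dataset's dimension, filling the dimensions
    v lacks with the dataset's per-column averages."""
    dataset = np.asarray(dataset, dtype=float)
    full = dataset.mean(axis=0)  # column means as defaults
    full[: len(v)] = v           # keep the coordinates we do have
    return full

data = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
print(impute_missing_dims([4.0, 5.0], data))  # -> [4. 5. 2.]
```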

Another approach is, instead of expanding the dimension of the shorter vector, to project the longer one; in particular, just throw away the extra dimensions. Again, a lot of this depends on what those extra dimensions represent and how they tie in to the ones you always have.
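The projection-by-truncation version is nearly a one-liner; the assumption that the extra dimensions are the trailing ones is again mine:

```python
import numpy as np

def truncated_distance(u, v):
    """Euclidean distance after discarding the extra trailing
    dimensions of the longer vector."""
    m = min(len(u), len(v))
    return np.linalg.norm(np.asarray(u[:m], dtype=float)
                          - np.asarray(v[:m], dtype=float))

print(truncated_distance([1, 2, 3], [4, 5]))  # compares (1,2) with (4,5)
```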


If the projection is orthogonal, so that $\textbf{q} - \text{proj}(\textbf{q})$ is perpendicular to the projection plane, you can just use Pythagoras' theorem. Recall that any point can be identified with a vector. Let $\textbf{q}$ be the higher-dimensional point, $\text{proj}(\textbf{q})$ be the projection of $\textbf{q}$ onto the lower-dimensional plane, and $d$ be the distance between these two points (i.e., vectors). Then: $$ d = \sqrt{ \left\lVert \textbf{q} \right\rVert^2 - \left\lVert \text{proj}(\textbf{q}) \right\rVert^2 } $$ where $\left\lVert \textbf{x} \right\rVert$ is the norm of $\textbf{x}$, i.e., the square root of the sum of the squares of its components.
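A quick numeric check of this formula in Python, using an orthogonal projection onto the $xy$-plane and an arbitrary point:

```python
import numpy as np

q = np.array([1.0, 2.0, 3.0])
proj_q = np.array([q[0], q[1], 0.0])  # orthogonal projection onto xy-plane

d_pythagoras = np.sqrt(np.linalg.norm(q)**2 - np.linalg.norm(proj_q)**2)
d_direct = np.linalg.norm(q - proj_q)
print(d_pythagoras, d_direct)  # both 3.0
```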

$\text{proj}(\textbf{q})$ could be found, for example, using a projection matrix $P$ whose rows are any number of eigenvectors you choose from the covariance matrix $\Sigma$ of your data. To find $P$ this way, I refer you to instructions on how to perform Principal Component Analysis (PCA).
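A sketch of that construction with NumPy, on made-up data. Since the eigenvectors of a symmetric covariance matrix are orthonormal, the Pythagorean formula above applies to this projection:

```python
import numpy as np

def pca_projection_matrix(data, k):
    """Rows are the top-k eigenvectors of the data's covariance
    matrix, as in PCA; P @ q gives proj(q) in subspace coordinates."""
    cov = np.cov(np.asarray(data, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues ascending
    return eigvecs[:, ::-1][:, :k].T        # top-k eigenvectors as rows

data = np.random.default_rng(0).normal(size=(100, 3))
P = pca_projection_matrix(data, 2)
q = np.array([1.0, 2.0, 3.0])
proj_q = P @ q  # coordinates of the projection in the principal plane
d = np.sqrt(np.linalg.norm(q)**2 - np.linalg.norm(proj_q)**2)
print(d)
```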