Can a metric be calculated by another metric?

47 Views Asked by At

Let $d: X \times X \rightarrow [0, 1]$ be a metric on $X$ which is computationally heavy to evaluate. I'm interested in the $k$ nearest neighbors of $\alpha \in Y \subsetneq X$ in respect to $d$, where $Y$ is finite but big. How can I get it more efficiently than brute force with $d$?

Lets say $n := |Y|$, then brute force is in $\mathcal{O}(n)$.

Idea 1: Clustering

  1. Get a subset $Y' \subsetneq Y$ with $m := |Y'|$ "landmarks" ($m \ll n$), e.g. with $k$-means.
  2. Cluster all points to their cluster centers.
  3. Compare $\alpha$ to the cluster centers. Compute only the distances of $\alpha$ to the cluster members

The problem of this approach is that it might be way off finding the nearest neighbors, e.g. when you think of points in $[0, 1]^2$, having cluster centers in the four corners and points all over the square. If $\alpha = (0.5, 0.5)$, then it will miss the closest $k$ ones completely.

Idea 2: Mapping to Euclidean Space

  1. Get a subset $Y' \subsetneq Y$ with $m := |Y'|$ "landmarks" ($m \ll n$).
  2. For those landmark points, all distances $d(y_1, y_2) \forall y_1, y_2 \in Y'$ are calculated.
  3. Given the distances $d_1, \dots, d_m$, I thought it might be possible to compute a point in $\mathbb{R}^p$ for any point in $Y$. So essentially find $f: [0, 1]^m \rightarrow \mathbb{R}^p$, such that $d(y_a, y_b) = d_E(f(d(y_a, y_1), \dots, d(y_a, y_m)), f(d(y_b, y_1), \dots, d(y_b, y_m))) \forall y_a, y_b \in Y$ with $d_E$ being the Euclidean metric.

If that worked, then (1) I could use a $p$d-tree and (2) the heavy computation would be done exactly $m$ times always instead of $n$ times.

Where I'm not sure about right now:

  1. How do I find $Y' \subsetneq Y$?
  2. How do I find $p \in \mathbb{N}$? Maybe $p = |Y'| - 1$?
  3. How do I find $f$? (I guess this?)
  4. Is this always possible?