Why should data points which are “close” have a “similar” label?


In the paper Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering, the basic idea is that data points which are “close” (i.e., $w_{ij}$ is large) should have a “similar” label (i.e., $f_i ≈ f_j$). Why?

By the same logic, data points which are “far” (i.e., $w_{ij}$ is small) should have a “dissimilar” label. However, the algorithm minimizes the objective function

$$\sum_{i,j}(y_i-y_j)^{2}w_{ij},$$ where $w_{ij}$ is given by the heat kernel.

What I am confused about is this: while the objective function is minimized, the weights $w_{ij}$ do not change, so it seems every difference $(y_i-y_j)^{2}$ should become small, including those between far-apart points. But that is inconsistent with the idea of the LE algorithm. Why?
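To make the objective concrete, here is a minimal numpy sketch (with made-up 1-D data and an arbitrary bandwidth $t=1$, purely for illustration) showing that a labeling which keeps close points similar scores much better than one which splits a tight cluster:

```python
import numpy as np

# Made-up 1-D data: two tight clusters far apart.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
t = 1.0  # heat-kernel bandwidth (an arbitrary choice)
W = np.exp(-(x[:, None] - x[None, :]) ** 2 / t)  # w_ij = exp(-|x_i - x_j|^2 / t)

def objective(y):
    # sum_{i,j} w_ij (y_i - y_j)^2
    return float(np.sum(W * (y[:, None] - y[None, :]) ** 2))

# A labeling that gives close points similar values ...
y_good = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
# ... versus one that splits a tight cluster:
y_bad = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
print(objective(y_good) < objective(y_bad))  # True
```

The cross-cluster weights are around $e^{-25}$, so making those pairs dissimilar costs almost nothing; splitting a cluster with weights near $1$ is expensive.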

I know this is an assumption. But why does the author make it? The paper is too obscure for me to understand (I lack the relevant mathematical background).

Thanks...


BEST ANSWER

There simply is no requirement that vertices far apart will get different labels. For example, here is an embedding of the cube graph using the eigenvectors of two of the smallest nonzero eigenvalues:

[Figure: the cube graph embedded in the plane using these two eigenvectors; adjacent vertices end up close, and one point sits in the middle of the drawing.]

Vertices that are connected by edges end up relatively close. However, the middle vertex in this diagram is actually two vertices: opposite vertices of the cube that have ended up on top of each other. Even though they are far apart in the graph, they end up as close as possible in the 2D embedding.
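This can be checked directly. Below is a numpy sketch that builds the Laplacian of the cube graph $Q_3$ and embeds it with two eigenvectors of the smallest nonzero eigenvalue ($\lambda = 2$). The eigenspace is degenerate, so the basis below is one deliberate choice that reproduces the drawing where opposite corners coincide; a generic eigensolver may return a different basis:

```python
import numpy as np

# Cube graph Q3: vertices are 3-bit strings, edges join strings differing in one bit.
n = 8
A = np.zeros((n, n))
for v in range(n):
    for b in range(3):
        A[v, v ^ (1 << b)] = 1
L = np.diag(A.sum(axis=1)) - A  # graph Laplacian

# Characters x_s(v) = (-1)^{popcount(s & v)} are Laplacian eigenvectors;
# for |s| = 1 the eigenvalue is 2 (the smallest nonzero one).
def char(s):
    return np.array([(-1) ** bin(s & v).count("1") for v in range(n)], dtype=float)

# Two orthogonal vectors inside the eigenvalue-2 eigenspace (a deliberate
# basis choice that reproduces the hexagonal drawing of the cube):
y1 = char(0b100) - char(0b010)
y2 = char(0b100) + char(0b010) - 2 * char(0b001)
assert np.allclose(L @ y1, 2 * y1) and np.allclose(L @ y2, 2 * y2)

emb = np.stack([y1, y2], axis=1)  # one 2-D point per vertex
# The antipodal vertices 000 and 111 land on the same point (the center):
print(emb[0], emb[7])  # both [0. 0.]
```

The objective value is unchanged by the basis choice within the eigenspace; only the picture changes.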

The price we pay for finding a low-dimensional embedding of a graph that (in the case of the cube) has a natural higher-dimensional embedding or (in other cases) may not come from any geometric object at all, is that sometimes vertices that are far apart will end up very close to each other. If you think about reasonable ways to draw a cube on a piece of paper, there is just no way to avoid this. (Though we usually try to not put two vertices literally on top of each other.)

In practice, most of the time, most vertices that are far apart in the graph will end up relatively far apart in the embedding even though the objective function $$\sum_{i,j} w_{ij}(y_i - y_j)^2$$ seems like it wants to minimize every distance $|y_i - y_j|$.

This is because $\mathbf y_0$, the first eigenvector, which minimizes this without constraints, is the all-$1$ vector, up to scaling. Past that, the eigenvectors $\mathbf y_1, \mathbf y_2, \dots$ which we actually use to embed must all be orthogonal to $\mathbf y_0$ (and to each other). Since the entries of $\mathbf y_1$ must therefore sum to $0$, not all of them can be close together: some are positive, and some are negative to balance them out.
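This is easy to verify numerically. A small sketch using a path graph (an arbitrary example) and numpy's `eigh`, which returns eigenvalues in ascending order:

```python
import numpy as np

# Path graph on 5 vertices (an arbitrary small example).
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1
L = np.diag(A.sum(axis=1)) - A  # graph Laplacian

vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order

# Smallest eigenvalue is 0, with the constant (all-equal) eigenvector:
assert np.isclose(vals[0], 0)
assert np.allclose(vecs[:, 0], vecs[0, 0] * np.ones(n))

# Every later eigenvector is orthogonal to it, so its entries sum to 0:
assert all(abs(vecs[:, k].sum()) < 1e-9 for k in range(1, n))
```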

Any difference $|y_i - y_j|$ getting smaller helps to minimize the objective function, but it helps much more when this happens to a difference with a large weight $w_{ij}$. So those "get prioritized" in a sense: the vector $\mathbf y_1$ will probably do a good job of making most differences $|y_i-y_j|$ with large $w_{ij}$ small. But not all differences can be small, and so many pairs with small $w_{ij}$ will end up far apart, just because we didn't care about them as much.
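You can see this prioritization in a tiny experiment: take two tight clusters joined only by near-zero heat-kernel weights (made-up data, bandwidth $1$). The first nontrivial eigenvector $\mathbf y_1$ keeps heavily weighted within-cluster pairs nearly identical, while the lightly weighted cross-cluster pairs are pushed far apart:

```python
import numpy as np

# Made-up 1-D data: two tight clusters, linked only by tiny heat-kernel weights.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
W = np.exp(-(x[:, None] - x[None, :]) ** 2)  # heat kernel, bandwidth 1
L = np.diag(W.sum(axis=1)) - W               # weighted graph Laplacian

vals, vecs = np.linalg.eigh(L)
y1 = vecs[:, 1]  # first nontrivial eigenvector (a 1-D embedding)

# Heavy pairs (within a cluster) end up close; light pairs (across) end up far:
within = abs(y1[0] - y1[1])  # weight ~ exp(-0.01), large
across = abs(y1[0] - y1[3])  # weight ~ exp(-25), negligible
print(within < 0.01 < across)  # True
```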

But - again - it's not that we want such pairs to be far apart, it's that we don't care about them. Sometimes, when vertices are far apart and get similar values, it's because there is a hidden symmetry or near-symmetry in the data, which makes those vertices look very similar.

Taking more eigenvectors generally helps deal with this, because they are all orthogonal to each other. If $\mathbf y_1$ gave several far-apart vertices similar values, a cheap way for $\mathbf y_2$ to become orthogonal to $\mathbf y_1$ is to give those vertices very different values (some positive, some negative). Of course, taking more eigenvectors also makes the dimension of the result go up...
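On the cube graph this is easy to see: adding a third eigenvector from the same eigenvalue-$2$ eigenspace separates the two opposite corners that coincide in the planar drawing. A numpy sketch, using one explicit choice of orthogonal eigenvectors in that eigenspace:

```python
import numpy as np

# Cube graph Q3: vertices are 3-bit strings, edges join strings differing in one bit.
n = 8
A = np.zeros((n, n))
for v in range(n):
    for b in range(3):
        A[v, v ^ (1 << b)] = 1
L = np.diag(A.sum(axis=1)) - A  # graph Laplacian

# Characters x_s(v) = (-1)^{popcount(s & v)}; for |s| = 1 the eigenvalue is 2.
def char(s):
    return np.array([(-1) ** bin(s & v).count("1") for v in range(n)], dtype=float)

# Three mutually orthogonal eigenvectors of eigenvalue 2 (an explicit choice):
y1 = char(0b100) - char(0b010)
y2 = char(0b100) + char(0b010) - 2 * char(0b001)
y3 = char(0b100) + char(0b010) + char(0b001)
for y in (y1, y2, y3):
    assert np.allclose(L @ y, 2 * y)

# In 2-D the opposite corners 000 and 111 coincide; the third coordinate splits them:
emb2 = np.stack([y1, y2], axis=1)
emb3 = np.stack([y1, y2, y3], axis=1)
print(np.array_equal(emb2[0], emb2[7]))   # True
print(np.linalg.norm(emb3[0] - emb3[7]))  # 6.0
```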