Question about the constraint in Laplacian eigenmaps


When computing Laplacian eigenmaps, the original paper describes the constraint $$y^TDy=1$$ as one that "removes an arbitrary scaling factor in the embedding". My understanding is that it prevents $y$ from growing arbitrarily large.

However, why can't you just use $y^Ty=1$ to achieve the same goal? In other words, why is $D$ necessary here? Thanks!


BEST ANSWER

Here's the answer from my advisor, along with my own elaboration. I found it easier to understand, so I'm posting it here in the hope that it helps:

The reason Laplacian eigenmaps use $y^TDy = 1$ as the constraint is that if you normalize $y$ in the usual way, i.e. $y^Ty = 1$, the eigenvector will concentrate on the vertices with high degree. Using $y^TDy = 1$ mitigates this problem and makes $y$ smoother.

"Concentrate" here means that the eigenvector will have extremely large entries at the positions corresponding to high-degree vertices.
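To make the concentration claim concrete, here is a small sketch (the graph and its size are made up for illustration) comparing the two normalizations on a star graph. The embedding itself uses the smallest non-trivial eigenvectors, but the concentration effect is clearest at the top of the spectrum, which is also where the intuition below lives:

```python
import numpy as np
from scipy.linalg import eigh

# Small made-up example: a star graph with one hub (index 0) and four leaves.
n = 5
W = np.zeros((n, n))
W[0, 1:] = W[1:, 0] = 1.0           # hub adjacent to every leaf
D = np.diag(W.sum(axis=1))          # degree matrix
L = D - W                           # graph Laplacian

# Standard problem L y = lambda y, eigenvectors normalized so y^T y = 1.
_, vecs = eigh(L)
u = vecs[:, -1]                     # eigenvector of the largest eigenvalue
print(np.round(np.abs(u), 3))       # hub entry ~0.894 dwarfs the leaves' ~0.224

# Generalized problem L y = lambda D y; eigh(L, D) normalizes so y^T D y = 1.
_, gvecs = eigh(L, D)
v = gvecs[:, -1]
print(np.round(np.abs(v), 3))       # all entries ~0.354: no concentration
```

Under $y^Ty=1$ the top eigenvector piles its mass on the hub, while the $y^TDy=1$ normalization spreads it evenly across the vertices.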

Below is not a complete proof but some intuition:

Consider a non-trivial eigenvector of $L$; we start with the unweighted case. Suppose vertex $i$ has a higher degree than the other vertices. For an eigenvector $u$, we have $$(Lu)_i = \deg_i u_i − \sum_{j\in \mathrm{adj}(i)}u_j = \lambda u_i$$ This means

$$\deg_i u_i − \deg_i\,\mathbb{E}[u_j] = \lambda u_i$$ where $j$ ranges over the indices adjacent to $i$, so $$u_i−\mathbb{E}[u_j]= \frac{\lambda u_i}{\deg_i}$$ Recalling the constraint $u^T \boldsymbol 1 = 0$, consider the extreme case where node $i$ connects to all other nodes (a star graph); there the largest eigenvalue is $\lambda = \deg_i + 1$. Then for any $k$ with $\deg_k < \deg_i$, the same identity gives $$\mathbb{E}[u_j \mid j \in \mathrm{adj}(k)] = \left(1 - \frac{\deg_i + 1}{\deg_k}\right)u_k$$ Since $\deg_k$ is small, the neighbor average of $u_k$ (which, in the star graph, is exactly $u_i$) must be much farther from 0 than $u_k$ itself.
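A quick numeric sanity check of the star-graph intuition above (graph sizes chosen arbitrarily): the largest Laplacian eigenvalue equals $\deg_i + 1$, and for each leaf $k$ (with $\deg_k = 1$) the neighbor average satisfies the identity derived above.

```python
import numpy as np

# For a star on n vertices the hub (index 0) has degree n - 1, and the claim
# is that the largest Laplacian eigenvalue is deg_hub + 1 = n.
for n in (4, 6, 10):
    W = np.zeros((n, n))
    W[0, 1:] = W[1:, 0] = 1.0
    L = np.diag(W.sum(axis=1)) - W
    lam, U = np.linalg.eigh(L)
    assert np.isclose(lam[-1], n)            # lambda_max = deg_hub + 1

    # For a leaf k, the neighbor average is just the hub value u_0, and it
    # satisfies u_0 = (1 - lambda / deg_k) * u_k with deg_k = 1.
    u = U[:, -1]
    assert np.allclose(u[0], (1 - lam[-1]) * u[1:])
```

The top eigenvector here is proportional to $(\deg_i, -1, \dots, -1)$, so the hub value is $\deg_i$ times larger in magnitude than each leaf value, matching the "concentration" picture.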

ANSWER

There are several reasons why a positive definite $D$ is useful. For one thing, greater generality is attractive, isn't it?

But in most situations, $D$ is essential. The best way to think about it is this. The optimization problem comes from some natural "physical" problem where a minimum or maximum is sought subject to the constraint that the vector $v$ (representing the independent variation) has unit length; yes, to limit its magnitude. At this point, there may not even be matrices in the question.

However, then comes the question of expressing the problem in the component space. To translate the problem to the component space, one must choose a basis. In the component space, the inner product $(u,v)$ is expressed by the matrix product $x^TDy$, where $x$ and $y$ are the coordinate vectors of $u$ and $v$, and $D$ is the symmetric positive definite matrix that represents the inner product with respect to the chosen basis. The condition $(v,v)=1$ becomes $y^TDy=1$. And unless the chosen basis is orthonormal with respect to the inner product at hand, $D$ is not the identity matrix.
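This change-of-basis point can be sketched in a few lines (the basis below is made up for the example). The columns of $B$ are a non-orthonormal basis of $\mathbb{R}^3$; the Gram matrix $D = B^TB$ then represents the standard inner product in $B$-coordinates:

```python
import numpy as np

# Made-up non-orthonormal basis: columns of B are the basis vectors.
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
D = B.T @ B                # Gram matrix: the inner product in B-coordinates

# A vector with coordinate vector y in basis B is v = B @ y.
y = np.array([1.0, 2.0, 3.0])
v = B @ y

# The basis-free quantity (v, v) equals the coordinate expression y^T D y ...
print(v @ v, y @ D @ y)    # 43.0 43.0

# ... but not y^T y, because B is not orthonormal.
print(y @ y)               # 14.0
```

So the constraint "$v$ has unit length" reads $y^TDy=1$ in coordinates, and $D$ collapses to the identity only for an orthonormal basis.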

As a general rule, most things in linear algebra begin to make sense when you reconstruct the physical problem they came from.