When calculating Laplacian Eigenmaps, the original paper mentions about the constraint $$y^TDy=1$$ as "removes an arbitrary scaling factor in the embedding". My understanding is that it prevents $y$ from going arbitrary large.
However, why cannot you just use $y^Ty=1$ to achieve the same goal? In other words, why is $D$ necessary here? Thanks!
Here's the answer from my advisor as well as my elaboration. I think it's easier to understand and I post it here, hoping it can be of help:
The reason why Laplacian eigenmaps use $y^TDy = 1$ as constraints is that if using normal way to normalize y such as $y^Ty = 1$, the eigenvector will concentrate on the vertices with high degrees. Using $y^TDy = 1$ can mitigate this problem and making $y$ more smooth.
The “concentrate” here means that the eigenvector will have extreme large number on the position corresponding to the vertex with high degree.
Below is not a complete proof but some intuition:
Thinking of the non-trivial eigenvector of $L$. We start with the unweighted case. Consider that $v_1$ has a higher degree than other vertices. For some eigenvector $u$, we have $$(Lu)_i = deg_i u_i − \sum_{j\in adj(i)}u_j = \lambda u_i$$ This means
$$deg_iu_i − deg_i\mathbb{E}[u_j] = \lambda u_i$$ where $j$ are the adjacent indices of $i$. $$u_i−\mathbb{E}[u_j]= \frac{\lambda u_i}{deg_i}$$ Noticing the constraint that $u^T \boldsymbol 1 = 0$, consider the extreme case that the node i connects to all other nodes ( a star graph), we have $\lambda = deg_i + 1$. In this sense, for some $k$ that $d_k < d_i$, we have $$\mathbb{E}(adj(u_k)) = -\frac{deg_i + 1}{deg_k}u_k$$ This means $\mathbb{E}(adj(u_k))$ which is close to $u_i$ should be a lot far away from 0 than $u_k$.