I am learning about the Laplacian eigenmap algorithm for finding a lower-dimensional representation of a dataset. An important step in this algorithm is assigning weights to the adjacency graph for the data using the heat kernel, $$w_{ij} = \begin{cases} e^{-\frac{\|x_i-x_j\|^2}{t}} & \text{if $i$ is adjacent to $j$,} \\ 0 & \text{otherwise.} \end{cases}$$ I'm trying to understand the motivation for this weight choice.
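To make sure I have the step right, here is a minimal sketch of how I understand the weighting (my own code, not from the paper; the $k$-nearest-neighbour rule for adjacency and the defaults `k=5`, `t=1.0` are assumptions on my part):

```python
import numpy as np

def heat_kernel_weights(X, k=5, t=1.0):
    """Build a symmetric k-NN adjacency graph on the rows of X
    and weight each edge by the heat kernel
    w_ij = exp(-||x_i - x_j||^2 / t)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances ||x_i - x_j||^2
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        # Indices of the k nearest neighbours of x_i (index 0 is x_i itself)
        nbrs = np.argsort(sq[i])[1:k + 1]
        W[i, nbrs] = np.exp(-sq[i, nbrs] / t)
    # Symmetrise: i ~ j if either point is among the other's neighbours
    W = np.maximum(W, W.T)
    return W
```

Non-adjacent pairs keep weight $0$, and each edge weight lies in $(0, 1]$, decaying with distance.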
My question here is very general: is there any value in developing intuition for this problem through the lens of information theory? In learning a lower-dimensional representation of the dataset, is there a notion of entropy/ignorance that one wants to maximise subject to some constraint? A Gibbs measure would be the maximally entropic measure to use as weights on the graph, and indeed the heat kernel $w_{ij}$ has the same form as a Gibbs measure at temperature $t$, with energy given by the squared distance between points, $\|x_i-x_j\|^2$. Larger distances between points would then correspond to larger energies, and hence to greater entropy/ignorance, which makes sense from the perspective that it is harder to find a good projection between points that are further apart.
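To spell out the analogy (this normalisation is my own framing, not part of the algorithm as stated): if one normalises the weights over the neighbours of vertex $i$, the result is exactly a Gibbs/Boltzmann distribution with energy $E_{ij} = \|x_i - x_j\|^2$ and temperature $t$,

$$p_{ij} = \frac{w_{ij}}{\sum_{k \sim i} w_{ik}} = \frac{e^{-\|x_i - x_j\|^2 / t}}{\sum_{k \sim i} e^{-\|x_i - x_k\|^2 / t}},$$

where $k \sim i$ ranges over the vertices adjacent to $i$. This is the distribution that maximises entropy among all distributions on the neighbours of $i$ with a fixed expected energy $\sum_{k \sim i} p_{ik} E_{ik}$.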
If this has any validity, I'd appreciate answers expanding on this train of thought with more rigour, or pointers to references.