I have $N$ vectors of dimension $D$, that is, $X \in \mathbb{R}^{N \times D}$, where the $i$-th row of $X$ is denoted $x_i$.
These vectors are L2-normalized.
Thus every vector has length $1$, and the difference between two vectors is fully described by the angle between them.
For each $x_i$, I then compute a softmax-weighted average of the vectors in $X$ as follows:
$$ \hat{x}_i = \sum_{j=1}^{N} \left( \frac{e^{x_i \cdot x_j^{T}}}{\sum_{k=1}^{N} e^{x_i \cdot x_k^{T}}} \right) \cdot x_j = \sum_{j=1}^{N} \left( \frac{e^{\cos\theta_{ij}}}{\sum_{k=1}^{N} e^{\cos\theta_{ik}}} \right) \cdot x_j $$
Clearly, the weights sum to $1$. I then take the inner product of $x_i$ with $\hat{x}_i$:
$$ d_i = x_i \cdot \hat{x}_i^{T} = x_i \cdot \sum_{j=1}^{N} \left( \frac{e^{\cos\theta_{ij}}}{\sum_{k=1}^{N} e^{\cos\theta_{ik}}} \right) \cdot x_j^{T} = \sum_{j=1}^{N} \left( \frac{e^{\cos\theta_{ij}}}{\sum_{k=1}^{N} e^{\cos\theta_{ik}}} \right) \cdot x_i \cdot x_j^{T} = \sum_{j=1}^{N} \left( \frac{e^{\cos\theta_{ij}}}{\sum_{k=1}^{N} e^{\cos\theta_{ik}}} \right) \cdot \cos\theta_{ij} $$
I found empirically that $d_i$ can serve as a measure of the density around $x_i$ in $X$. In other words, if $d_i$ is high, then $x_i$ lies in a dense region (there are many vectors near $x_i$), and if $d_i$ is low, it lies in a sparse one.
I have an intuition for why this happens (perhaps because the softmax is applied to a bounded function, namely $\cos\theta$?), but I cannot find a mathematical proof.
I would like to know why this works, or under which conditions $d_i$ can be used as a density estimator.
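To make the quantity concrete, here is a minimal NumPy sketch of $d_i$ as defined above. The function name `softmax_density` and the toy data (a tight cluster plus one far-away point) are hypothetical choices for illustration, not part of the original setup.

```python
import numpy as np

def softmax_density(X):
    """Return d_i = sum_j softmax_j(x_i . x_j) * (x_i . x_j) for each row of X."""
    # L2-normalize the rows so that dot products equal cosines.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    C = X @ X.T  # C[i, j] = cos(theta_ij)
    # Row-wise softmax; subtracting the row max only improves numerical stability.
    W = np.exp(C - C.max(axis=1, keepdims=True))
    W /= W.sum(axis=1, keepdims=True)
    return (W * C).sum(axis=1)

# Hypothetical toy data: 50 points near (1, 0, 0) and one point at (-1, 0, 0).
rng = np.random.default_rng(0)
cluster = np.array([1.0, 0.0, 0.0]) + 0.05 * rng.normal(size=(50, 3))
outlier = np.array([[-1.0, 0.0, 0.0]])
d = softmax_density(np.vstack([cluster, outlier]))
print(d[:3], d[-1])  # cluster members vs. the isolated point
```

On data like this, the isolated point should receive a clearly lower $d_i$ than the cluster members, which matches the empirical observation.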
\begin{align*} d_i &= x_i \cdot \hat{x}_i^{T} \\ &= x_i \cdot \sum_{j=1}^{N} \left( \frac{e^{\cos\theta_{ij}}}{\sum_{k=1}^{N} e^{\cos\theta_{ik}}} \right) \cdot x_j^{T} \\ &= \sum_{j=1}^{N} \left( \frac{e^{\cos\theta_{ij}}}{\sum_{k=1}^{N} e^{\cos\theta_{ik}}} \right) \cdot x_i \cdot x_j^{T} \\ &= \sum_{j=1}^{N} \left( \frac{e^{\cos\theta_{ij}}}{\sum_{k=1}^{N} e^{\cos\theta_{ik}}} \right) \cdot \cos\theta_{ij} \end{align*}
You say this represents density. But consider another function, in which the softmax weights are dropped and $\hat{x}_i$ is simply the sum of all vectors:
\begin{align*} d_i &= x_i \cdot \hat{x}_i^{T} \\ &= x_i \cdot \sum_{j=1}^{N} x_j^{T} \\ &= \sum_{j=1}^{N} x_i \cdot x_j^{T} \\ &= \sum_{j=1}^{N} \cos\theta_{ij} \end{align*}
This would also represent the density of vectors. For example, consider the vectors to be points on the surface of the unit sphere, with the origin at the center of the sphere. If there are many vectors on one part of the sphere, then for a vector in that region the cosines to its neighbors would be $\sim 1$, and the sum of many such cosines would lead to what you call a high density number.
There is a problem with such a density estimate, though: vectors on the opposite side of the sphere from the vector under consideration contribute negative terms, which are subtracted from the sum.
For example, consider a case with 2 vectors $\{v_1, v_2\}$ pointing in one direction and 2 more $\{v_3, v_4\}$ pointing in the exactly opposite direction. Summing the cosines from $v_1$ to all the vectors (itself included) gives $1 + 1 - 1 - 1 = 0$, but this does not mean the density is low. (The argument extends to $n$ vectors on one side and $n$ vectors pointing to the opposite side.)
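The cancellation can be checked numerically. This is a small sketch that assumes the two vectors in each pair coincide exactly and that the two pairs are antipodal; the array name `V` is just for illustration.

```python
import numpy as np

# v1 = v2 = (1, 0) and v3 = v4 = (-1, 0): two antipodal pairs on the unit circle.
V = np.array([[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]])

C = V @ V.T            # C[i, j] = cos(theta_ij), since all rows have unit length
d_sum = C.sum(axis=1)  # unweighted sum of cosines for each vector
print(d_sum)           # each entry is 1 + 1 - 1 - 1 = 0
```

Every vector has an identical neighbor, yet the plain sum of cosines is zero for all of them.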
One way of dealing with this issue is to define $d_i = \sum_{j=1}^{N} \max(0, x_i \cdot x_j^{T})$, so that vectors on the opposite side contribute nothing instead of a negative amount. Of course, you could use the softmax-weighted version here as well.
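Here is a rough sketch of the clipped estimate, applied to two exactly antipodal pairs of vectors. The function name `clipped_density` and the `use_softmax` flag are hypothetical, and the softmax branch is only one possible reading of "softmax weighted" (the weights are computed from the unclipped cosines and applied to the clipped ones).

```python
import numpy as np

def clipped_density(X, use_softmax=False):
    """Sum of max(0, cos(theta_ij)) over j, optionally softmax-weighted."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    C = X @ X.T               # pairwise cosines
    P = np.maximum(0.0, C)    # negative cosines no longer subtract
    if not use_softmax:
        return P.sum(axis=1)
    # Assumed variant: softmax weights from the raw cosines, applied to P.
    W = np.exp(C - C.max(axis=1, keepdims=True))
    W /= W.sum(axis=1, keepdims=True)
    return (W * P).sum(axis=1)

# Two antipodal pairs: (1, 0) twice and (-1, 0) twice.
V = np.array([[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]])
d_clip = clipped_density(V)                    # 1 + 1 + 0 + 0 = 2 for every vector
d_soft = clipped_density(V, use_softmax=True)  # softmax-weighted variant
print(d_clip, d_soft)
```

With clipping, the opposite pair no longer cancels the contribution of the nearby vectors.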