Calculate the distortion measure $d(x,\hat{x})$

24 Views Asked by At

I want to to implement the Blahut-Arimoto algorithm to evaluate the clustering membership probability, $p(\hat{x}|\mathbf{x})$, with fixed number of clusters, $N_c$, and compression-distortion tradeoff parameter, $\beta$. However, I need help on how to calculate the distortion measure $d(x,\hat{x})$.

I have a data file that contains 200 data points to be clustered. The first and second columns are the x- and y-coordinates of the data points. Moreover, the first and second 100 points are independently sampled from the normal distributions,

$$N\left(\mu=(0 \ 1), \Lambda = \begin{pmatrix} 0.09 & 0 \\ 0 & 0.09\end{pmatrix}\right) \text{ and } N\left(\mu=(1 \ 0), \Lambda = \begin{pmatrix} 0.09 & 0 \\ 0 & 0.09\end{pmatrix}\right),$$

respectively.

In the problem statement they tell me to prepare the element-to-element distance matrix $d(\mathbf{x}_i, \mathbf{x}_j)=\sqrt{(x_i-x_j)^2+(y_i-y_j)^2}$ where $(i,j=1,2,...,200)$, that (quote) will be the input for the clustering algorithm. Now to the problem.

In the Blahut-Aritmoto algoritm we are supposed to estimate the conditional probability via the following equation, that minimizes the mutual information subject to the distortion constraint via

$$p(\hat{x}|\mathbf{x})=\frac{p(\hat{x})e^{-\beta d(x,\hat{x})}}{\sum_x p(\hat{x})e^{-\beta d(x,\hat{x})}}.$$

I am told to start with random initial conditions so I have already estimated $p(\hat{x})$, but how do I estimate $d(x,\hat{x})$? My initial thought was to use the distance matrix as the distortion matrix, but after going with this for a while it does not seem correct. This occurred to me since I later could not calculate

$$\langle d(x,\hat{x}) \rangle_{p(x|\hat{x})}=\sum\sum p(x,\hat{x})d(x,\hat{x})$$

because of the incorrect dimensions between the conditional probability and the distance matrix. So, now my only idea is to use the squared distortion measure. But what is confusing to me is that they state that the distance matrix will be the input for the clustering algorithm. Does that mean that $x$ in the squared distortion measure, if used, represents the elements in the distance matrix?

All tips on helping me understand how to estimate the initial distortion measure/matrix in my case is appreciated! And for the interested, this is a part of my data (10/200) and how I calculated the distance matrix using the given euclidean distance:

dist_m <- as.matrix(dist(data, method = "euclidean"))

> data
             V1          V2
1    0.35018000  1.01496000
2   -0.49182600  1.06325000
3   -0.04244780  0.81060200
4   -0.04859040  0.42951900
5   -0.31314900  1.27697000
6   -0.07452240  1.25301000
7   -0.24639000  1.13452000
8    0.48474000  0.64529200
9   -0.19241100  0.72243400
10  -0.48283400  1.09381000