Plot information curve, Blahut-Arimoto algorithm


I have a data file with x- and y-coordinates and have written code implementing the Blahut-Arimoto algorithm to evaluate the clustering membership probabilities. Now I want to run this code for 2 clusters ($Nc=2$) and construct its information curve; however, I do not know how to construct an information curve or which metrics to use (a more detailed problem description is given at the very end). All help is appreciated!

Description: I have a data file that contains 200 data points to be clustered. In the file, the first and second columns are the x- and y-coordinates of the data points, respectively. The first 100 and the last 100 points are independently sampled from the normal distributions,

$$N\left(\mu=(0 \ 1), \Lambda = \begin{pmatrix} 0.09 & 0 \\ 0 & 0.09\end{pmatrix}\right) \text{ and } N\left(\mu=(1 \ 0), \Lambda = \begin{pmatrix} 0.09 & 0 \\ 0 & 0.09\end{pmatrix}\right),$$

respectively. I want to write code that implements the Blahut-Arimoto algorithm to evaluate the clustering membership probability $p(\tilde{x}|x)$ for a fixed number of clusters, $Nc$, and compression-distortion tradeoff parameter, $\beta$. The code should perform multiple runs, each starting from random initial conditions. The element-to-element distance matrix uses the Euclidean distance, and each data point carries the same weight, i.e. $p(\mathbf{x}_i)=1/200$.
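For anyone who wants to try this without my data file, a short sketch that samples 200 comparable points from the two Gaussians above (the seed is arbitrary, my own choice):

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed, for reproducibility
cov = np.array([[0.09, 0.0],
                [0.0, 0.09]])             # shared covariance matrix Lambda

cluster1 = rng.multivariate_normal([0.0, 1.0], cov, size=100)
cluster2 = rng.multivariate_normal([1.0, 0.0], cov, size=100)
points = np.vstack([cluster1, cluster2])  # shape (200, 2)

weights = np.full(200, 1 / 200)           # uniform weights p(x_i) = 1/200
```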

Blahut-Arimoto algorithm: According to Elements of Information Theory by Thomas M. Cover and Joy A. Thomas, the Blahut-Arimoto algorithm works as follows: We begin with a choice of $\beta$ and an initial output distribution $r(\hat{x})$, and calculate the conditional distribution $q(\hat{x}|x)$ that minimizes the mutual information subject to the distortion constraint via

$$q(\hat{x}|x)=\frac{r(\hat{x})e^{-\beta d(x,\hat{x})}}{\sum_{\hat{x}} r(\hat{x})e^{-\beta d(x,\hat{x})}}.$$

For this conditional distribution, $q(\hat{x}|x)$, we calculate the output distribution, $r(\hat{x})$, that minimizes the mutual information via

$$r(\hat{x})=\sum_x p(x)q(\hat{x}|x).$$

We use this output distribution as the starting point of the next iteration. Each step in the iteration, minimizing over $q(\hat{x}|x)$ and then minimizing over $r(\hat{x})$, reduces the value of the objective, which converges to a point on the rate-distortion curve $R(D)$.
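To make the two update equations concrete for myself, I first tried them on a toy binary source with Hamming distortion (the toy numbers are my own, not from the book):

```python
import numpy as np

# Toy setup: binary source, Hamming distortion d(x, x_hat) = [x != x_hat]
d = np.array([[0.0, 1.0],
              [1.0, 0.0]])
p = np.array([0.5, 0.5])   # source distribution p(x)
r = np.array([0.5, 0.5])   # initial output distribution r(x_hat)
beta = 2.0

for _ in range(100):
    # q(x_hat|x) proportional to r(x_hat) * exp(-beta * d(x, x_hat)),
    # normalized over x_hat (rows sum to 1)
    numerator = r[None, :] * np.exp(-beta * d)
    q = numerator / numerator.sum(axis=1, keepdims=True)
    # r(x_hat) = sum_x p(x) q(x_hat|x)
    r = p @ q
```

By symmetry the output distribution stays uniform here, and $q(\hat{x}|x)$ becomes the softmax $1/(1+e^{-\beta})$ on the matching symbol, which was a useful sanity check before moving to the clustering problem.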

Code: Below you can see the code, which I wrote in Python.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist, squareform

# Read the data from the provided text file
data = pd.read_table("Data-Exercise2-2024.txt", header=None, sep=r"\s+")

# Plot the data points on the x-y plane
plt.scatter(data[0], data[1])
plt.xlabel("X")
plt.ylabel("Y")
plt.title("Data Points")
plt.show()

# Calculate the element-to-element distance matrix using Euclidean distance
distance_matrix = squareform(pdist(data, metric='euclidean'))

# Calculate the distortion matrix using squared distortion
distortion_matrix = distance_matrix ** 2

# Weights for all data points (assuming equal weight)
p = np.full(len(distance_matrix), 1/len(distance_matrix))

# Function to calculate the conditional probabilities q(x_tilde|x);
# r has shape (Nc,), d is the point-to-cluster distortion matrix of shape (N, Nc)
def calculate_q(d, r, beta):
    numerator = r[None, :] * np.exp(-beta * d)
    return numerator / numerator.sum(axis=1, keepdims=True)

# Function to calculate the output distribution r(x_tilde) = sum_x p(x) q(x_tilde|x)
def calculate_r(p, q):
    return p @ q

# Point-to-cluster distortion d(x, x_tilde) = sum_x' p(x'|x_tilde) * ||x - x'||^2,
# with p(x'|x_tilde) = q(x_tilde|x') p(x') / r(x_tilde); since there are no
# explicit representatives x_tilde, each cluster is represented by its soft members
def calculate_d(distortion_matrix, p, q, r):
    p_x_given_c = q * p[:, None] / r[None, :]
    return distortion_matrix @ p_x_given_c

# Blahut-Arimoto algorithm with squared distortion; keeps the best of several
# random restarts, judged by the objective F = I(X; X_tilde) + beta * <d>
def blahut_arimoto_impl(Nc, beta, max_iter, tolerance, distortion_matrix, p, n_init=10):
    N = len(p)
    best = None

    for _ in range(n_init):  # random initializations
        q = np.random.rand(N, Nc)
        q /= q.sum(axis=1, keepdims=True)  # random soft cluster assignments
        r = calculate_r(p, q)
        d = calculate_d(distortion_matrix, p, q, r)

        for _ in range(max_iter):
            q_old = q
            q = calculate_q(d, r, beta)
            r = calculate_r(p, q)
            d = calculate_d(distortion_matrix, p, q, r)

            # Check for convergence
            if np.abs(q - q_old).sum() < tolerance:
                break

        # Mutual information I(X; X_tilde) in nats and expected distortion <d>
        I = np.sum(np.where(q > 0, p[:, None] * q * np.log(q / r[None, :]), 0.0))
        D = np.sum(p[:, None] * q * d)
        if best is None or I + beta * D < best[0]:
            best = (I + beta * D, q, r, I, D)

    _, q, r, I, D = best
    return q, r, I, D

# Parameters
Nc = 2  # Number of clusters
beta = 5
max_iter = 100
tolerance = 1e-6

q, r, I, D = blahut_arimoto_impl(Nc, beta, max_iter, tolerance, distortion_matrix, p)
print(f"I(X; X_tilde) = {I:.4f} nats, expected distortion <d> = {D:.4f}")

Problem: I now want to run this code for $Nc=2$ (2 clusters) and construct its information curve for different values of $\beta$ between 1 and 40. The problem I am having is that I do not know how to construct the information curve or which metric I should use on the y-axis of the plot.

I read somewhere that one can use either the distortion or the mutual information as a measure, but I cannot get my head around how to do this. Maybe the answer is simply that I should update the distortion matrix in each iteration (right now I only use the distance matrix raised to the power 2, since initially I do not have any $\hat{x}$), but then I would need the reconstructed data, or there is another approach that I am not aware of.
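For reference, here is my current (possibly wrong) understanding of the two quantities one might plot against each other. The helper name `rate_and_distortion` is my own invention; `q` is the $(N, Nc)$ membership matrix, `r` the $(Nc,)$ cluster marginal, `d` an $(N, Nc)$ point-to-cluster distortion, and `p` the $(N,)$ point weights:

```python
import numpy as np

# Hypothetical helper: compute the mutual information I(X; X_tilde) and the
# expected distortion <d>, which I believe are the two axes of the curve
def rate_and_distortion(p, q, r, d):
    # I(X; X_tilde) = sum_x p(x) sum_c q(c|x) log(q(c|x) / r(c))   (in nats)
    I = np.sum(np.where(q > 0, p[:, None] * q * np.log(q / r[None, :]), 0.0))
    # <d> = sum_x p(x) sum_c q(c|x) d(x, c)
    D = np.sum(p[:, None] * q * d)
    return I, D

# Toy check: two points assigned deterministically to two equally likely
# clusters at zero distortion should give I = log 2 nats and <d> = 0
p = np.array([0.5, 0.5])
q = np.eye(2)
r = p @ q
d = np.zeros((2, 2))
I, D = rate_and_distortion(p, q, r, d)
```

If this is right, I suppose the curve would come from sweeping `beta` over `np.linspace(1, 40, ...)`, collecting an $(I, \langle d\rangle)$ pair at each value, and plotting one against the other, but I would appreciate confirmation that these are the correct quantities.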

Data: This is some of the data in my txt file if someone is interested to try the code:

0.35018    1.01496
-0.491826    1.06325
-0.0424478    0.810602
-0.0485904    0.429519
-0.313149    1.27697
-0.0745224    1.25301
-0.24639    1.13452
0.48474    0.645292
-0.192411    0.722434
-0.482834    1.09381
0.138134    1.11462
-0.249814    1.47742
0.348229    1.01375
0.264055    1.24531
-0.0231769    1.18124
0.166838    0.89533
-0.239649    1.17173
-0.273667    0.532211
0.338615    0.835743
-0.411282    1.02389
-0.398042    1.16161
0.37481    1.51544
-0.00191126    1.23194
-0.273601    1.02413
0.0866693    1.49634
-0.319453    1.35711
-0.432486    1.38478
-0.139173    0.61804
0.222087    1.14062