How do I compute the updates to the EM algorithm in Quantum Clustering?

In their paper "Quantum Clustering and Gaussian Mixtures," Mahajabin Rahman and Davi Geiger set out a rather interesting way to cluster data that apparently works better than the traditional mixture of Gaussians. Rather than running EM on a mixture of Gaussians, they run EM on a "mixture of waves," borrowing from quantum mechanics the idea of a complex wave function whose squared modulus is a probability density.

In the paper, they only deal with clustering into 2 classes, because that allows them to use several neat tricks which greatly simplify the computations. I want to extend this to $k$ clusters so that I can code it up and use it in my work.

Link to paper: https://arxiv.org/pdf/1612.09199.pdf

I have been trying to figure out what the update steps should be for the weights and the parameters (in the M step of EM), but for the life of me I can't seem to find a nice form of the derivatives that allows me to easily solve for them. I'm beginning to suspect that I'd be better off numerically trying to compute the parameter updates at each iteration of EM, but perhaps there's some EM-related or quantum-related math technique that I don't know about (or maybe I'm just missing something very simple).

Details:

I'll try to sum up the details as best as I can (correcting some typos I found in the paper along the way ;P ) -- let me know if anything is unclear or if I have made a mistake.

The core idea is as follows:

Instead of using Gaussian distributions, they use a "wave function" whose squared modulus yields the Gaussian distribution:

For a class $k$ with associated mean $\mu_k$, covariance matrix $\mathbf{C}_k$, and phase angle $\phi_k$, and a $d$-dimensional data point $p_i$:

$$ \psi_k(p_i| \mu_k, \mathbf{C}_k, \phi_k) = \frac{1}{ \sqrt{(2\pi)^{\frac{d}{2}} |\mathbf{C}_k|^{\frac{1}{2}} }} e^{-\frac{1}{4}(p_i - \mu_k)^T \mathbf{C}_k^{-1}(p_i - \mu_k)} e^{-i\phi_k} $$

They define the normalization constant (the paper writes the determinant's exponent as $\frac{d}{2}$, but it should be $\frac{1}{2}$ to match the expression above): $$ Z_k = (2\pi)^{\frac{d}{2}}|\mathbf{C}_k|^{\frac{1}{2}} $$

So that

$$ \psi_k(p_i| \mu_k, \mathbf{C}_k, \phi_k) = \frac{1}{ \sqrt{Z_k}} e^{-\frac{1}{4}(p_i - \mu_k)^T \mathbf{C}_k^{-1}(p_i - \mu_k)} e^{-i\phi_k} $$

You'll notice that $|\psi_k(p_i|\mu_k, \mathbf{C}_k, \phi_k)|^2$ yields a Gaussian pdf at point $p_i$ with mean $\mu_k$ and covariance $\mathbf{C}_k$. Therein lies the "quantum" part of this idea -- just as in quantum mechanics you first add amplitudes of wave functions and then square the modulus at the end to get a probability.
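Here is a minimal numerical sketch of this wave function (the helper name `psi_k` and the `scipy` check are mine, not from the paper):

```python
import numpy as np
from scipy.stats import multivariate_normal

def psi_k(p, mu, C, phi):
    """Complex wave function; |psi_k|^2 is the Gaussian pdf N(mu, C) at p."""
    d = len(mu)
    Z = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(C))  # Z_k as defined above
    quad = (p - mu) @ np.linalg.inv(C) @ (p - mu)
    return np.exp(-0.25 * quad) * np.exp(-1j * phi) / np.sqrt(Z)

# Sanity check: the squared modulus matches the Gaussian density.
mu, C, phi = np.array([0.0, 1.0]), np.array([[2.0, 0.3], [0.3, 1.0]]), 0.7
p = np.array([0.5, 0.5])
assert np.isclose(abs(psi_k(p, mu, C, phi)) ** 2,
                  multivariate_normal(mu, C).pdf(p))
```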

So instead of mixing Gaussians, you first mix the waves to form a "mixture of waves" function, using weights $(\alpha_1, ... , \alpha_k)$:

$$ \psi(p_i|\alpha, \mu, \mathbf{C}, \phi) = \sum_{j=1}^k \alpha_j \psi_j(p_i| \mu_j, \mathbf{C}_j, \phi_j) $$ To simplify the notation, I've written: $$ \alpha = (\alpha_1,...,\alpha_k) \\ \mu = (\mu_1, ..., \mu_k) \\ \mathbf{C} = (\mathbf{C}_1, ..., \mathbf{C}_k) \\ \phi = (\phi_1,...,\phi_k) $$

Then once you have mixed the waves, take the square of the modulus of this function to obtain the probability

$$ P(p_i|\alpha, \mu, \mathbf{C}, \phi) = |\psi(p_i|\alpha, \mu, \mathbf{C}, \phi)|^2 \\ = \sum_{j=1}^k \left[\frac{\alpha_j}{\sqrt{Z_j}} e^{-\frac{1}{4}(p_i - \mu_j)^T \mathbf{C}_j^{-1}(p_i - \mu_j)} \sum_{l=1}^k \frac{\alpha_l^*}{\sqrt{Z_l}} e^{-\frac{1}{4}(p_i - \mu_l)^T \mathbf{C}_l^{-1}(p_i - \mu_l)} \cos(\phi_j - \phi_l)\right] $$ where $x^*$ denotes the complex conjugate of $x$ (for real weights $\alpha_l^* = \alpha_l$; any complex phase in a weight can be absorbed into the corresponding $\phi_l$).

And we can also group the parameters of the waves separately from the weights $$ \theta_k = (\mu_k, \mathbf{C}_k, \phi_k) \\ \theta = (\theta_1,...,\theta_k) $$ So now I can write $$ P(p_i|\alpha, \theta) = |\psi(p_i|\alpha, \theta)|^2 $$
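Continuing the sketch (helper names are again mine), the mixture probability can be computed either directly as $|\sum_j \alpha_j \psi_j|^2$ or via the double-sum expansion above, and the two agree numerically for real weights:

```python
def P_mix(p, alphas, mus, Cs, phis):
    """|sum_j alpha_j psi_j(p)|^2, computed directly from the wave functions."""
    psi = sum(a * psi_k(p, m, C, ph)
              for a, m, C, ph in zip(alphas, mus, Cs, phis))
    return abs(psi) ** 2

def P_mix_expanded(p, alphas, mus, Cs, phis):
    """The same quantity via the double sum with cos(phi_j - phi_l) terms."""
    k = len(alphas)
    G = [abs(psi_k(p, m, C, 0.0)) for m, C in zip(mus, Cs)]  # moduli, phases dropped
    return sum(alphas[j] * G[j] * alphas[l] * G[l] * np.cos(phis[j] - phis[l])
               for j in range(k) for l in range(k))

alphas = np.array([0.6, 0.4])
mus = [np.zeros(2), np.array([2.0, 2.0])]
Cs = [np.eye(2), 2.0 * np.eye(2)]
phis = [0.0, np.pi / 3]
x = np.array([1.0, 1.0])
assert np.isclose(P_mix(x, alphas, mus, Cs, phis),
                  P_mix_expanded(x, alphas, mus, Cs, phis))
```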

Marginalizing over the class label, $$ P(p_i|\alpha, \theta) = \sum_{j=1}^k P(p_i, j| \alpha, \theta) $$

Matching terms with the expansion of $|\psi|^2$ above, this suggests taking $$ P(p_i, j| \alpha, \theta) = \frac{\alpha_j}{\sqrt{Z_j}} e^{-\frac{1}{4}(p_i - \mu_j)^T \mathbf{C}_j^{-1}(p_i - \mu_j)} \sum_{l=1}^k \frac{\alpha_l^*}{\sqrt{Z_l}} e^{-\frac{1}{4}(p_i - \mu_l)^T \mathbf{C}_l^{-1}(p_i - \mu_l)} \cos(\phi_j - \phi_l) $$

And abbreviating the modulus of the $j$-th wave function at $p_i$ (the square root of the Gaussian pdf) as $$ G_{i,j} = \frac{1}{\sqrt{Z_j}} e^{-\frac{1}{4}(p_i - \mu_j)^T \mathbf{C}_j^{-1}(p_i - \mu_j)} $$

We can write

$$ P(p_i, j| \alpha, \theta) = \alpha_j G_{i,j} \sum_{l=1}^k \alpha_l^* \cos(\phi_j - \phi_l) G_{i,l} $$
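In code (continuing my sketch, with my own helper names):

```python
def G(p, mu, C):
    """G_{i,j} from above: the modulus of the wave function at p."""
    d = len(mu)
    Z = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(C))
    return np.exp(-0.25 * (p - mu) @ np.linalg.inv(C) @ (p - mu)) / np.sqrt(Z)

def joint(p, j, alphas, mus, Cs, phis):
    """P(p, j) = alpha_j G_{i,j} sum_l alpha_l cos(phi_j - phi_l) G_{i,l}.
    Note the cos interference factor can make this negative for some parameters."""
    return alphas[j] * G(p, mus[j], Cs[j]) * sum(
        alphas[l] * np.cos(phis[j] - phis[l]) * G(p, mus[l], Cs[l])
        for l in range(len(alphas)))
```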

Now finally we can start the EM method: we use $\alpha^t, \theta^t$ to denote the parameters at time $t$.

First the E-step: compute the $Q$ function $$ Q_i^t(j) = P(j | p_i, \alpha^t, \theta^t) \\ = \frac{P(p_i, j|\alpha^t, \theta^t)}{P(p_i|\alpha^t, \theta^t)} \\ = \frac{P(p_i, j|\alpha^t, \theta^t)}{\sum_{l=1}^k P(p_i, l|\alpha^t, \theta^t)} \\ = \frac{\alpha_j G_{i,j} \sum_{l=1}^k \alpha_l^* \cos(\phi_j - \phi_l) G_{i,l}}{\sum_{m=1}^k \alpha_m G_{i,m} \sum_{l=1}^k \alpha_l^* \cos(\phi_m - \phi_l) G_{i,l}} $$
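In code, the E-step is then just a per-point normalization of the joint terms (sketch, reusing `joint` from above):

```python
def e_step(points, alphas, mus, Cs, phis):
    """Responsibilities Q[i, j] = P(j | p_i) for an (N, d) array of points."""
    k = len(alphas)
    J = np.array([[joint(p, j, alphas, mus, Cs, phis) for j in range(k)]
                  for p in points])            # (N, k) joint terms
    return J / J.sum(axis=1, keepdims=True)    # normalize each row
```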

Then the M-step, assuming there are $N$ data points: $$ (\alpha^{t+1}, \theta^{t+1}) = \arg\max_{\alpha, \theta} \sum_{i=1}^N \sum_{j=1}^k Q_i^{t}(j) \log \frac{P(p_i, j| \alpha, \theta)}{Q_i^{t}(j)} \\ = \arg\max_{\alpha, \theta} \sum_{i=1}^N \sum_{j=1}^k Q_i^{t}(j) \log P(p_i, j| \alpha, \theta) $$

Now I thought all I would have to do is take derivatives with respect to $\alpha_j$, $\mu_j$, $\mathbf{C}_j^{-1}$ and $\phi_j$ and set them to zero, but none of the resulting equations seem to admit a closed-form solution. Consequently I have no idea how to get the update equations.
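For what it's worth, here is where I get stuck on the $\mu_j$ derivative. Write $S_{i,m} = \sum_{l=1}^k \alpha_l \cos(\phi_m - \phi_l) G_{i,l}$, so that $P(p_i, m|\alpha, \theta) = \alpha_m G_{i,m} S_{i,m}$. Since $\mu_j$ enters every $S_{i,m}$ through the $l = j$ term, and $\nabla_{\mu_j} \log G_{i,j} = \frac{1}{2}\mathbf{C}_j^{-1}(p_i - \mu_j)$, the gradient of the M-step objective comes out (unless I've slipped somewhere) as

$$ \frac{1}{2}\mathbf{C}_j^{-1} \sum_{i=1}^N (p_i - \mu_j)\left[ Q_i^t(j) + \alpha_j G_{i,j} \sum_{m=1}^k \frac{Q_i^t(m)\cos(\phi_m - \phi_j)}{S_{i,m}} \right] $$

Setting this to zero gives a weighted-mean equation for $\mu_j$, but the bracketed weights themselves depend on $\mu_j$ through $G_{i,j}$ and $S_{i,m}$, so nothing decouples the way the standard Gaussian-mixture update does.

And here is what I mean by "numerically computing the parameter updates": a generic optimizer applied to the M-step objective. This is only a sketch under simplifying assumptions I chose myself (spherical covariances $\mathbf{C}_j = \sigma_j^2 I$, no normalization constraint on the weights, and a crude guard against non-positive joint terms), not a claim that this is what the paper does:

```python
from scipy.optimize import minimize

def m_step_numeric(points, Q, k, d):
    """Numerically maximize sum_i sum_j Q[i, j] * log P(p_i, j | alpha, theta)."""
    def unpack(x):
        alphas = x[:k]
        mus = [x[k + j * d: k + (j + 1) * d] for j in range(k)]
        sigmas = x[k + k * d: k + k * d + k]     # spherical: C_j = sigma_j^2 I
        phis = x[k + k * d + k:]
        Cs = [s ** 2 * np.eye(d) for s in sigmas]
        return alphas, mus, Cs, phis

    def neg_objective(x):
        alphas, mus, Cs, phis = unpack(x)
        total = 0.0
        for i, p in enumerate(points):
            for j in range(k):
                pij = joint(p, j, alphas, mus, Cs, phis)
                total += Q[i, j] * np.log(max(pij, 1e-300))  # guard: pij can be <= 0
        return -total

    x0 = np.concatenate([np.full(k, 1.0 / k),    # initial weights
                         np.random.randn(k * d), # initial means
                         np.ones(k),             # initial sigmas
                         np.zeros(k)])           # initial phases
    res = minimize(neg_objective, x0, method="Nelder-Mead")
    return unpack(res.x)
```

But this is slow and clearly not as satisfying as a closed-form update, hence the question.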