Trajectories, Paths Clustering Using Expectation Maximization

48 Views Asked by At

[tl;dr]: Go to the question and the part below the question directly. The question should be easy. I'm a newbie on probability.

Trajectories are a series of paths, each of them consisting of multiple positions $(x, y)$.

I am trying to implement this paper(need paid access) which in the first chapter, it discusses how to cluster trajectories data of human walking using Expectation Maximization to determine motion patterns.

The author's explanation of trajectories is

Accordingly, we assume that the input to our algorithm is a collection of trajectories $s = {s_{1},...,s_{i} }$ between resting places. The output is a number of different types of motion patterns $θ = {θ_{1},...,θ_{M} }$ a person might exhibit in their natural environment. Each trajectory $s_{i}$ consists of a sequence $s_{i} = \{ s_{1}^{i}, s_{2}^{i}, ...,s_{i}^{i} \}$ Accordingly.

The Expectation Maximization algorithm takes two steps, E-step and M-step. According to the author:

The optimization involves two steps: calculating the expectations $E[c_{i m} \mid \theta^{[j]}, d] $ given the current model $θ^{[j]}$, and finding the new model $θ^{[j+1]}$ that has the maximum expected likelihood under these expectations. The first of these two steps are typically referred to as the E-step (short for expectation step), and the latter as the M-step (short for maximization step).

The equation for E-step is as follows:

$$ E\left[C_{i m} \mid \theta^{[j]}, d\right]=\eta^{\prime} \prod_{t=1}^{T} \exp \left(-\frac{1}{2 \sigma^{2}}\left\|x_{i}^{t}-\mu_{m}^{\lceil t / \beta \rceil [j]}\right\|^{2}\right) $$

Where

  • $d, i$: Trajectories, index of the trajectory.
  • $M, m$: Number of different types of motion patterns.
  • $C_{i m}$: Binary value 0 or 1. "correspondence variables" defined by the author. It is 1 if the $i$th trajectory corresponds to the $m$th pattern.
  • $j$: Model iteration number.
  • $\eta^{\prime}$: Normalization constants.
  • $T, t$: Trajectory length(number of positions inside a trajectory), index of positions in the trajectory.
  • $\sigma$: Gaussian distributions standard deviation.
  • $x, x_{i}^{t}$:Position on trajectories. The $t$th position on the $i$th trajectory.
  • $\mu$: EM means.
  • $K$: Gaussian distributions??? $K$is not in the formula but is needed to calculate $\beta$.
  • $\beta$: subsequent positions on the trajectories??? $\beta = \lceil T / K \rceil$???

Question: What is the meaning of these two variables exactly? $K$ and $\beta$.

The only time the author mentioned about $K$:

We begin with the description of our model of motion patterns, which is subsequently estimated from data using EM. A motion pattern denoted as $θ_{m}$ with $1 ≤ m ≤ M$ is represented by $K$ probability distributions $P (x \vert θ_{m}^{k})$ where $M$ is the number of different types of motion patterns in which a person is engaged.

The only time the author mentioned about $\beta$:

For each $θ_{m}^{k}$ the probability distribution $P(x | θ_{m}^{k})$ is computed based on $β = \lceil T/K \rceil$ subsequent positions on the trajectories. Accordingly, $P(x | θ_{m}^{k})$ specifies the probability that the person is at location $x$ after $[(k − 1) · β + 1; k · β]$ steps given that it is engaged in motion pattern m. Thus, we calculate the likelihood of a trajectory $d_{i}$ under the $m$th motion pattern $θ_{m}$ as $$P\left(d_{i} \mid \theta_{m}\right)=\prod_{t=1}^{T} P\left(x_{i}^{t} \mid \theta_{m}^{\lceil t / \beta\rceil}\right)$$

I need to implement both the E-step and M-step to do clustering on my trajectory data. Just a reference, here's the equation for M-step: $$ \mu_{m}^{k[j+1]}=\frac{1}{\beta} \cdot \sum_{t=(k-1) \cdot \beta+1}^{k \cdot \beta} \frac{\sum_{i=1}^{I} E\left[C_{i m} \mid \theta^{[j]}, d\right] x_{i}^{t}}{\sum_{i=1}^{I} E\left[C_{i m} \mid \theta^{[j]}, d\right]} . $$