I am trying to read this paper on video summarization, and I am having difficulty understanding some of the notation.
Given a sequence of vectors $x_t$, a kernel function is defined with feature space $\mathcal{H}$, and the variance of the segment from $t_i$ to $t_{i+1}$ is computed as $v_{t_{i},t_{i+1}} = \sum_{t=t_{i}}^{t_{i+1}-1}\|\phi(x_{t})-\mu_{i}\|^2_{\mathcal{H}}$.
Later, in the algorithm section, the above variance is computed in a way that implies $v_{t,t+d} = \sum_{i=t}^{t+d-1}K(x_i,x_i) - \frac{1}{d}\cdot\sum_{i,j=t}^{t+d-1}K(x_i,x_j)$. I am having trouble seeing the connection, though I do know that $K(x,y)=\phi(x)^T\phi(y)$.
As you've noted, the authors switched their notation. If you define $t:=t_i$, $d:=t_{i+1}-t_i$, and (for brevity) $w_k:=\phi(x_{t_i+k-1})$ for $k=1,\ldots,d$, then the formula $$\mu_i=\frac{\sum_{t=t_i}^{t_{i+1}-1}\phi(x_t)}{t_{i+1}-t_i}\tag1 $$ becomes $$\mu_i=\frac{\sum_{k=1}^d w_k}{d}\tag2 $$ (so $\mu_i$ is the mean of the $w$'s), while the formula $$ v_{t_i,t_{i+1}} = \sum_{t=t_i}^{t_{i+1}-1}\|\phi(x_t)-\mu_i\|^2\tag3 $$ becomes $$ v_{t,t+d}=\sum_{k=1}^d\|w_k-\mu_i\|^2,\tag4 $$ i.e. (4) is the sum of squared deviations of the $w$'s from their mean ($d$ times their variance). You can expand (4) using $\|a\|^2=a^Ta$ to find $$\|w_k-\mu_i\|^2=w_k^Tw_k-w_k^T\mu_i-\mu_i^Tw_k+\mu_i^T\mu_i,\tag5$$ then sum (5) from $k=1$ to $d$. By (2), $\sum_{k=1}^d w_k=d\,\mu_i$, so each of the two cross terms sums to $d\,\mu_i^T\mu_i$, as does the last term; the net contribution is $-d\,\mu_i^T\mu_i$, and you obtain $$ \sum_{k=1}^d\|w_k-\mu_i\|^2=\sum_{k=1}^dw_k^T w_k -\frac1d\left(\sum_{k=1}^d w_k\right)^T\left(\sum_{k=1}^d w_k\right),\tag6 $$ which is the analog of the familiar formula for the variance of a univariate list. Now undo the new notation: since $K(x,y)=\phi(x)^T\phi(y)$, we have $w_k^Tw_k=K(x_{t+k-1},x_{t+k-1})$ and $\left(\sum_{k=1}^d w_k\right)^T\left(\sum_{l=1}^d w_l\right)=\sum_{k,l=1}^d w_k^Tw_l=\sum_{k,l=1}^d K(x_{t+k-1},x_{t+l-1})$, so re-indexing the sums to run from $t$ to $t+d-1$ gives the final formula from the algorithm section.
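If it helps to see the identity hold numerically, here is a minimal sketch in plain Python. It is not from the paper: it assumes, purely for illustration, the homogeneous quadratic kernel $K(x,y)=(x^Ty)^2$ on 2-D inputs, because its feature map $\phi$ can be written out explicitly. The names `phi`, `kernel`, `scatter_feature_space`, and `scatter_kernel` are hypothetical, and indices are 0-based.

```python
import math
import random

def phi(x):
    """Explicit feature map of the quadratic kernel for a 2-D input (illustrative choice)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def kernel(x, y):
    """K(x, y) = (x . y)^2, which equals phi(x) . phi(y) for the map above."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def scatter_feature_space(xs, t, d):
    """Form (3)/(4): sum over the segment of ||phi(x_i) - mu||^2, with mu the segment mean."""
    feats = [phi(x) for x in xs[t:t + d]]
    dim = len(feats[0])
    mu = [sum(f[j] for f in feats) / d for j in range(dim)]
    return sum(sum((f[j] - mu[j]) ** 2 for j in range(dim)) for f in feats)

def scatter_kernel(xs, t, d):
    """Kernel form: sum_i K(x_i, x_i) - (1/d) * sum_{i,j} K(x_i, x_j) over the segment."""
    seg = xs[t:t + d]
    diag = sum(kernel(x, x) for x in seg)
    full = sum(kernel(x, y) for x in seg for y in seg)
    return diag - full / d

# Arbitrary synthetic data, only for the check.
random.seed(0)
xs = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(20)]

lhs = scatter_feature_space(xs, t=3, d=7)
rhs = scatter_kernel(xs, t=3, d=7)
print(lhs, rhs)  # the two values should agree up to floating-point rounding
```

The point of the kernel form is that it never touches $\phi$ directly, so it still applies when the feature space is too large (or infinite-dimensional) to write down.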