PCA: dual formulations and optimization "duality" interpretation

98 Views Asked by At

Bishop, p563 formulates standard PCA for a set of datapoints $\{x_i\}_{i=1}^n$, $x_i \in \mathbb{R}^d$ in two ways:

  1. Maxize variance in PC direction $u$: $$ \max_u \frac{1}{n}\sum_{i=1}^n (u^Tx_i -u^T\bar{x})^2 = u^T\hat{\Sigma}u, \qquad \text{s.t. } \|u\|=1 $$
  2. Minimize the distortion of projected points $\tilde{x}_i$ (onto $p$ subspace of PCs) and original points $x_i$: $$ \min_{u_j, z_{ij}, b_j}J(u_j, z_{ij}, b_j) = \frac{1}{n}\sum_{i=1}^n \|x_i - \tilde{x}_i\|^2, \qquad \text{s.t. } \|u_j\|=1, \, j=1,\ldots,d $$ for $\tilde{x}_i = \sum_{j=1}^p z_{ij} u_j + \sum_{j=p+1}^d b_j u_j$ where $\{u_j\}_{j=1}^d$ is an ON basis of $\mathbb{R}^d$. An important step is that writing each data point $x_i = \sum_{j=1}^d \alpha_{ij} u_j = \sum_{j=1}^d (x_i^Tu_j)\, u_j$ in terms of a new ON basis (our optimization variables).

I'm wondering if these problems are somehow "dual" to eachother as optimization problems. Also, would it be right to say that writing $x_i = \sum_{j=1}^d \alpha_{ij} u_j = \sum_{j=1}^d (x_i^Tu_j)\, u_j$ is a "dual" representation of $x_i$? It feels that each $u_j$ is a linear functional in $(\mathbb{R}^d)^*$, but I'm not sure I'm thinking about this correctly.