Principal component analysis for a set of p-adic numbers?

I have been exploring machine learning algorithms that use p-adic metrics instead of Euclidean ones. Generally, where a Euclidean algorithm operates over $\mathbb{R}^n$, I have found that $\mathbb{Q}_p$ is easily rich and interesting enough, and I haven't needed to explore using $\mathbb{Q}_p^n$ yet.

As I'm exploring unsupervised methods and dimensionality reduction, I have been puzzling over what dimensionality reduction means when in truth I only have one-dimensional data in $\mathbb{Q}_p$. And yet I think I should be able to simplify it somehow.

Is there some intuition about what PCA does in $\mathbb{R}^n$ that can be applied to $\mathbb{Q}_p$?

There is 1 answer below

What are you learning about machine learning in $\mathbf Q_p$ that is not revealed when working over $\mathbf R$, and what is your intuition that suggests PCA should have an analogue in $\mathbf Q_p^n$?

Principal component analysis for linear maps $\mathbf R^m \to \mathbf R^n$ is closely tied to the fact that a real symmetric $n \times n$ matrix has all real eigenvalues (and an orthogonal basis of eigenvectors). The importance of symmetric $n \times n$ matrices $A$ over $\mathbf R$ is their link to the standard dot product on $\mathbf R^n$: $$ \mathbf v \cdot A\mathbf w = A^\top \mathbf v \cdot \mathbf w = A\mathbf v \cdot \mathbf w $$ when $A^\top = A$. The proof that all eigenvalues of a symmetric real $n \times n$ matrix are real comes from the fact that there is a full set of eigenvalues in $\mathbf C$, a quadratic extension of $\mathbf R$, together with the interplay between the Hermitian inner product on $\mathbf C^n$ and conjugate-transposes, which shows that every eigenvalue $\lambda$ of $A$ in $\mathbf C$ satisfies $\overline{\lambda} = \lambda$ and thus $\lambda \in \mathbf R$. The algebraic closure of $\mathbf Q_p$ is infinite-dimensional over $\mathbf Q_p$, so eigenvalues of $p$-adic $n \times n$ matrices need not lie in a quadratic extension of $\mathbf Q_p$ when $n \geq 3$.

The inner product on $\mathbf R^n$ is related to orthogonality and the interpretation of orthogonality in terms of unique nearest approximation in a hyperplane: for a vector $\mathbf v$ in $\mathbf R^n$ and a hyperplane $H$ not containing $\mathbf v$, the orthogonal projection of $\mathbf v$ onto $H$ is the unique element $\mathbf w$ of $H$ such that $\mathbf v - \mathbf w \perp H$, and it is the unique point of $H$ nearest to $\mathbf v$. There aren't useful inner products on $\mathbf Q_p^n$: what could an $\mathbf R$-bilinear map $\mathbf Q_p^n \times \mathbf Q_p^n \to \mathbf R$ be? The term "orthogonal basis" exists in $p$-adic functional analysis, but it has only a rough analogue of the "best approximation" property of orthogonal projections in $\mathbf R^n$ and doesn't correspond to any kind of unique vector in a hyperplane.
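To make the failure of uniqueness concrete, here is a minimal sketch. It assumes the sup norm $\|(x, y)\| = \max(|x|_p, |y|_p)$ on $\mathbf Q_p^2$, takes $p = 3$, $\mathbf v = (0, 1)$ and $H = \{(x, 0)\}$, and only tests a handful of rational points; the helper names `padic_abs` and `sup_norm` are made up for this illustration.

```python
from fractions import Fraction


def padic_abs(x, p):
    """p-adic absolute value |x|_p of a rational number x, returned as a Fraction."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    v = 0  # p-adic valuation of x
    num, den = x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(1, p) ** v


def sup_norm(vec, p):
    """Sup norm on Q_p^n (an assumption of this sketch): the largest |coordinate|_p."""
    return max(padic_abs(c, p) for c in vec)


p = 3
v = (Fraction(0), Fraction(1))

# Candidate points (x, 0) on the line H = {(x, 0)}.
candidates = [Fraction(0), Fraction(1), Fraction(2), Fraction(3),
              Fraction(5, 7), Fraction(1, 3)]

# Distance from v to each candidate point of H.
distances = {x: sup_norm((v[0] - x, v[1]), p) for x in candidates}

for x, d in distances.items():
    print(f"x = {x}: distance = {d}")
```

The second coordinate of $\mathbf v - (x, 0)$ is always $1$, so the distance is never below $1$, and it equals $1$ whenever $|x|_3 \leq 1$. Every candidate above except $x = 1/3$ is therefore a nearest point of $H$ to $\mathbf v$, so there is no single "projection" to single out.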

In contrast to real symmetric matrices, symmetric matrices over $\mathbf Q_p$ need not have eigenvalues in $\mathbf Q_p$. For example, if $p$ is an odd prime then there is always at least one $t \in \mathbf F_p^\times$ such that $t$ is a square and $t+1$ is a nonsquare in $\mathbf F_p^\times$. Let $a \in \mathbf Z_p^\times$ satisfy $t \equiv a^2 \bmod p$ and set $$ M = \begin{pmatrix} a&1/2\\1/2&0 \end{pmatrix}. $$ The characteristic polynomial of $M$ is $x^2 - ax - 1/4$, whose roots, the eigenvalues of $M$, are $(a \pm \sqrt{a^2 + 1})/2$. These eigenvalues are in $\mathbf Q_p$ if and only if $a^2 + 1$ is a square in $\mathbf Q_p$. Since $a^2 + 1 \equiv t+1 \not\equiv 0 \bmod p$, $a^2 + 1 \in \mathbf Z_p^\times$. Moreover, since $t + 1$ is not a square in $\mathbf F_p^\times$, $a^2 + 1$ is not a square in $\mathbf Z_p^\times$ and thus $a^2+1$ is not a square in $\mathbf Q_p$. Thus $M$ is a symmetric $2 \times 2$ matrix over $\mathbf Q_p$ with no eigenvalues in $\mathbf Q_p$.
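The construction for odd $p$ can be checked numerically. The following is a rough sketch: it takes $a$ to be an ordinary integer in $1, \dots, p-1$, tests squareness mod $p$ with Euler's criterion, and relies on the fact that a unit of $\mathbf Z_p$ ($p$ odd) is a square exactly when its reduction mod $p$ is a square. The function names `is_square_mod_p`, `find_t` and `example` are made up for this illustration.

```python
def is_square_mod_p(n, p):
    """Euler's criterion: for an odd prime p and n not divisible by p,
    n is a square mod p iff n^((p-1)/2) is 1 mod p."""
    return pow(n, (p - 1) // 2, p) == 1


def find_t(p):
    """Smallest t in 1..p-2 such that t is a square and t+1 is a nonsquare mod p."""
    for t in range(1, p - 1):
        if is_square_mod_p(t, p) and not is_square_mod_p(t + 1, p):
            return t
    return None


def example(p):
    """Return (t, a, a^2 + 1, whether a^2 + 1 is a square mod p) for an odd prime p."""
    t = find_t(p)
    # Integer representative a with a^2 congruent to t mod p.
    a = next(a for a in range(1, p) if a * a % p == t)
    disc = a * a + 1  # the quantity under the square root in the eigenvalues of M
    return t, a, disc, is_square_mod_p(disc % p, p)


for p in [3, 5, 7, 11, 13]:
    t, a, disc, disc_is_square = example(p)
    print(f"p = {p}: t = {t}, a = {a}, a^2 + 1 = {disc}, square mod p: {disc_is_square}")
```

For $p = 3$ this picks $t = 1$, $a = 1$, so $M = \begin{pmatrix} 1&1/2\\1/2&0 \end{pmatrix}$ and $a^2 + 1 = 2$, which is not a square mod $3$; for $p = 7$ it picks $t = 2$, $a = 3$, and $a^2 + 1 = 10 \equiv 3 \bmod 7$ is again a nonsquare.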

For $p = 2$, the symmetric matrix $$ \begin{pmatrix} 2&1\\1&0 \end{pmatrix} $$ has characteristic polynomial $x^2 - 2x - 1$, and its roots $1 \pm \sqrt{2}$ don't lie in $\mathbf Q_2$, since $2$ has odd $2$-adic valuation and so is not a square in $\mathbf Q_2$.