In the following, I will derive a dimensionality reduction for the real projective space. Since I use the scalar product as a measure of distance, I am unsure whether the dimensionality reduction has a useful meaning. I will begin by defining the real projective space.
For $a, b \in \mathbb{R}^n$, we can define the equivalence relation $a\sim b \Leftrightarrow \exists \lambda \in \mathbb{R}\setminus\{0\}: a = \lambda b$. The real projective space is then defined as the quotient $\mathbb{R}\mathbb{P}^n := (\mathbb{R}^n\setminus\{0\}) / \sim$. I believe that $d: \mathbb{R}\mathbb{P}^n \times \mathbb{R}\mathbb{P}^n \to \mathbb{R}$ defined by $d(a, b) = \arccos\left(\frac{\vert a\cdot b\vert}{\vert a\vert\,\vert b\vert}\right)$, with $a\cdot b$ denoting the scalar product, is a reasonable metric on $\mathbb{R}\mathbb{P}^n$ (the angle between the two lines). As $\arccos$ is a complicated non-linear function, I will instead use the similarity $s(a,b)=\frac{\vert a\cdot b\vert}{\vert a\vert\,\vert b\vert}$ as a measure of distance. Note that $s$ behaves opposite to a metric: $s(a,b) = 0$ means that $a$ and $b$ have maximal distance, while $s(a,b)=1 \Leftrightarrow a=b$.
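As a quick sanity check, the similarity $s$ can be sketched in Python (a minimal sketch using NumPy; the function name `s` simply mirrors the notation above):

```python
import numpy as np

def s(a, b):
    """Similarity |a.b| / (|a||b|) between two representatives of
    points in real projective space; invariant under a -> lambda * a."""
    return abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 2.0])
b = np.array([2.0, 1.0, -2.0])
print(s(a, -3 * a))  # rescaling a representative keeps the point: 1.0
print(s(a, b))       # a.b = 0, so maximal distance: 0.0
```

The absolute value is what makes $s$ well defined on equivalence classes, since $a$ and $-a$ represent the same projective point.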
From now on, let $x^1, \dots, x^m\in \mathbb{R}\mathbb{P}^n$ be the data points whose dimension we want to reduce, and let $x_j^i$ denote the $j$-th component of $x^i$. For arbitrary $p\in \mathbb{R}\mathbb{P}^n$, we define, in analogy to the Fréchet variance, $L(p)=\sum_{i=1}^m s^2(p,x^i)=\sum_{i=1}^m\left(\sum_{j=1}^n p_j x_j^i\right)^2$, assuming $\vert x^i\vert=\vert p\vert = 1$ in the second step. (Since $s$ is a similarity rather than a distance, the Fréchet mean maximizes $L$ instead of minimizing it.) To maintain this assumption, we can normalize the data points $x^i$ to unit length before processing them, as they remain the same points in $\mathbb{R}\mathbb{P}^n$. For $p$, we express the assumption as a constraint, $g(p) = \vert p \vert^2 - 1 = \sum_{j=1}^n p_j^2 - 1 = 0$. With this, we can employ Lagrange multipliers to find the extreme points of $L$.
Before we calculate the gradients of $L$ and $g$, let us define two matrices. $X\in\mathbb{R}^{m\times n}$ is the matrix whose rows are our normalized data points, $X_{i,j}=x_j^i$. We define $W=X^T X\in\mathbb{R}^{n\times n}$, i.e. $W_{k,j}=\sum_i x_k^i x_j^i$.
The partial derivatives of $L$ are $$\frac{\partial L}{\partial p_k}=2\sum_i x_k^i\sum_j p_j x_j^i=2\sum_j\sum_i x_k^i x_j^i p_j = 2\sum_j W_{k,j}p_j ,$$ so the gradient is $\nabla L = 2 Wp$. Similarly, we arrive at $\nabla g = 2 p$. Hence, the Lagrange equation becomes (the factors of $2$ cancel) $$Wp = \lambda p,$$ where $\lambda$ is the Lagrange multiplier.
The Lagrange equation is exactly the eigenvalue equation for $W$. The constraint $g(v) = 0$ is trivially satisfied by the normalized eigenvectors $v$. For a normalized eigenvector $v$ with eigenvalue $\lambda_v$, a direct computation gives $L(v)=v^T W v=\lambda_v\, v^T v=\lambda_v$. Since $L$ is a sum of squares, it follows that all eigenvalues are non-negative. As $W$ is symmetric, the eigenvectors $v$ can be chosen pairwise orthogonal. They are either extreme points or saddle points of $L$. In the following, we will interpret the eigenvectors and their eigenvalues.
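To make the claim $L(v)=\lambda_v$ concrete, here is a small numerical check (only a sketch with NumPy on random data; the sizes `m`, `n` and all variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
m, n = 50, 4
X = rng.normal(size=(m, n))                    # rows are the data points
X /= np.linalg.norm(X, axis=1, keepdims=True)  # normalize representatives

W = X.T @ X                                    # W_{k,j} = sum_i x_k^i x_j^i
eigvals, eigvecs = np.linalg.eigh(W)           # W is symmetric

# L(p) = sum_i (p . x^i)^2 = |X p|^2
L = lambda p: np.sum((X @ p) ** 2)

# each normalized eigenvector v satisfies L(v) = lambda_v
for lam, v in zip(eigvals, eigvecs.T):
    assert np.isclose(L(v), lam)

assert eigvals.min() > -1e-10                  # eigenvalues are non-negative
assert np.allclose(eigvecs.T @ eigvecs, np.eye(n))  # pairwise orthonormal
```

Since $W=X^TX$ is symmetric positive semidefinite, `np.linalg.eigh` returns real, non-negative (up to round-off) eigenvalues and an orthonormal eigenbasis.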
If we assume $x_j^i>0$ for all $i, j$ (or, equivalently, $x_j^i<0$ for all $i, j$), then the eigenvector $v^1$ with the highest eigenvalue can be regarded as the Fréchet mean. Thus, $L(v^1)=\lambda_1$ measures how concentrated the data points are around this mean. We have $\vert x^i\vert^2 =1$ and, hence, $\sum_i\vert x^i\vert^2 =m$. As the eigenvectors $v^j$ form an orthonormal basis, we can write $\vert x^i\vert^2 =\sum_{j=1}^n(v^j\cdot x^i)^2$. Then, we have $$\frac{1}{m}\sum_{j=1}^n L(v^j)=\frac{1}{m}\sum_j\sum_i(v^j\cdot x^i)^2=\frac{1}{m}\sum_i\vert x^i\vert^2=1.$$ We define $V:=1-\frac{1}{m}L(v^1)$, which is a non-negative number. High values of $V$ correspond to high spread in the data, while low values of $V$ correspond to less spread. It follows that $V=\frac{1}{m}\sum_{j=2}^n L(v^j)$. Therefore, for $j\geq 2$, the fractions $\frac{L(v^j)}{mV}$ describe how much of the total spread of the data is captured along axis $j$. This is analogous to the explained variance in PCA. Projecting the data points onto $v^2$ and $v^3$ gives a dimensionality reduction in which the Fréchet mean $v^1$ lies at the origin. There is another analogy to PCA: PCA amounts to finding the eigenvectors of $Y^TY$, where $Y$ is the matrix of the data points $y^i\in\mathbb{R}^n$ translated so that their mean is the zero vector.
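The whole pipeline, including the explained-spread fractions, can be sketched as follows (again only a sketch; the all-positive synthetic data and all names are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
m, n = 200, 5

# hypothetical data: all components positive, clustered around one direction
X = np.abs(rng.normal(loc=1.0, scale=0.3, size=(m, n)))
X /= np.linalg.norm(X, axis=1, keepdims=True)       # unit-length representatives

eigvals, eigvecs = np.linalg.eigh(X.T @ X)          # ascending eigenvalues
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # reorder descending

v1 = eigvecs[:, 0]            # candidate Fréchet mean, L(v1) = eigvals[0]
V = 1 - eigvals[0] / m        # spread; equals (1/m) * sum_{j>=2} lambda_j
frac = eigvals[1:] / (m * V)  # fraction of spread captured along each axis

Y = X @ eigvecs[:, 1:3]       # 2-D reduction: coordinates along v2 and v3
print(f"spread V = {V:.4f}, fractions sum to {frac.sum():.4f}")  # sum is 1
```

The fractions sum to one because $\sum_j \lambda_j = \operatorname{tr}(W) = \sum_i \vert x^i\vert^2 = m$, mirroring the normalization argument above.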
I have not yet tried to prove the above interpretation for arbitrary $x_j^i$, as I have three other, more pressing questions:
Currently, my most important question is whether, or how much, the Fréchet variance $L(p)$ defined above skews the subsequent dimensionality reduction, given that the scalar product is used instead of the more correct cosine distance. Furthermore, I would like to find out whether this scheme has been described before for the real projective space. Lastly, I would like to find real data points that lie in the real projective space. Currently, I can only think of currency exchange rates as having the property of being elements of $\mathbb{R}\mathbb{P}^n$. But for those, we have $x_j^i>0$...
PS: I was unsure whether Data Science Stack Exchange might be a better fit for this question.