Impact of the transformation matrix distribution on linear transformation

Let $X$ be an $m\times n$ ($m$: number of records, $n$: number of attributes) normalized dataset (entries between $0$ and $1$). Denote $Y=XR$, where $R$ is an $n\times p$ matrix and $p<n$. I understand that if $R$ is drawn randomly from a Gaussian distribution, e.g., $N(0,1)$, then the transformation preserves the Euclidean distances between instances (all pairwise distances between the points in the feature space are preserved). But what if $R\sim U(0,1)$: does the transformation still preserve the distances between instances?
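For concreteness, here is a small NumPy experiment comparing the two choices of $R$ (all sizes are illustrative, and the $1/\sqrt{p}$ factor in the Gaussian case is the usual normalization, not something stated above):

```python
import numpy as np

# Compare pairwise distances before and after projection for two choices
# of R: scaled Gaussian entries vs. raw U(0, 1) entries. Sizes m, n, p
# are arbitrary illustrative choices.
rng = np.random.default_rng(42)

m, n, p = 40, 500, 100
X = rng.random((m, n))                    # normalized dataset in [0, 1]^n

def pairwise(Z):
    """All pairwise Euclidean distances between the rows of Z."""
    d = Z[:, None, :] - Z[None, :, :]
    return np.linalg.norm(d, axis=-1)[np.triu_indices(len(Z), 1)]

orig = pairwise(X)
gauss = pairwise(X @ (rng.standard_normal((n, p)) / np.sqrt(p)))
unif = pairwise(X @ rng.random((n, p)))   # raw U(0, 1), no centering/scaling

print((gauss / orig).mean())   # close to 1: distances roughly preserved
print((unif / orig).mean())    # far from 1: distances badly distorted
```

The uniform case fails in this experiment because $U(0,1)$ entries have nonzero mean, so the projection systematically stretches along the all-ones direction.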

Suppose we have $m$ records $(X_i:1\leq i \leq m)$ of normalized $n$-dimensional data (that is, for each $i\leq m$, we can write $X_i=(x^{(i)}_1,\ldots,x^{(i)}_n)\in[0,1]^n$). Then, multiplying any of the $X_i$ by an $n\times p$ matrix $R$ (where $p<n$) can be seen as projecting the vector $X_i$ onto $\mathbb R^p$. Upon reading your question, I understand that you are interested in projections that are somewhat faithful, in the sense that there is little difference between the Euclidean distance $\|X_i-X_j\|_2$ between vectors in $\mathbb R^n$ and the distance $\|(X_i-X_j)R\|_2$ between their projections in $\mathbb R^p$.


First, I believe your claim that the transformation preserves distances exactly when the entries of $R$ are i.i.d. $N(0,1)$ random variables is false. To see this, consider a simple counterexample: suppose we have two instances with two attributes, $X_1=[1,1]$ and $X_2=[0,1]$. Then we have that $$\|X_1-X_2\|_2=\sqrt{1^2+0^2}=1.$$ Let $R=[R_1,R_2]^{\top}$ be our transformation. If the distance is to be preserved, it is necessary that $$1=|X_1R-X_2R|=|(R_1+R_2)-R_2|=|R_1|,$$ so $R_1$ can only take the values $-1$ and $1$, and hence is not $N(0,1)$-distributed.
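A quick Monte Carlo check of this counterexample (my own sketch; the sample count is arbitrary): the projected distance $|X_1R-X_2R|=|R_1|$ is a random quantity, half-normal with mean $\sqrt{2/\pi}\approx 0.80$, not the constant $1$:

```python
import numpy as np

# X1 = [1, 1], X2 = [0, 1], so ||X1 - X2||_2 = 1, but the projected
# distance |(X1 - X2) R| = |R_1| is half-normal when R_1 ~ N(0, 1).
rng = np.random.default_rng(0)

x1 = np.array([1.0, 1.0])
x2 = np.array([0.0, 1.0])

trials = 100_000
R = rng.standard_normal((trials, 2, 1))   # one 2x1 matrix R per trial
dists = np.abs((x1 - x2) @ R).ravel()     # |(X1 - X2) R| = |R_1| per trial

print(dists.mean())   # ≈ E|N(0,1)| = sqrt(2/pi) ≈ 0.7979, not 1
print(dists.std())    # clearly nonzero: the projected distance is random
```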


What is true is that distances are approximately preserved, with high probability, when $R$ is suitably scaled. Indeed, this is a special case of the well-known Johnson-Lindenstrauss Lemma (see the Wikipedia article):

Lemma: Let $0<\epsilon<1$, and let $X=(x_1,\ldots,x_m)$ be a set of $m$ points in $\mathbb R^N$. Given $n<N$ large enough relative to $m$ and $\epsilon$ (e.g., $n\geq 8\ln(m)/\epsilon^2$ in one standard version), there exists a linear map $T:\mathbb R^N\to\mathbb R^n$ such that for every $x_i,x_j\in X$, we have that $$(1-\epsilon)\|x_i-x_j\|_2\leq\|Tx_i-Tx_j\|_2\leq(1+\epsilon)\|x_i-x_j\|_2.$$
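To illustrate, here is a small NumPy sketch checking the lemma's conclusion empirically for the random map $T(x)=xR/\sqrt{p}$ with Gaussian $R$ (all dimensions are my own illustrative choices):

```python
import numpy as np

# Empirically check that a scaled Gaussian projection preserves all
# pairwise distances up to a small relative distortion, in the spirit of
# the Johnson-Lindenstrauss lemma. m, N, p are illustrative.
rng = np.random.default_rng(1)

m, N, p = 50, 1000, 300
X = rng.random((m, N))                          # m points in [0, 1]^N

R = rng.standard_normal((N, p)) / np.sqrt(p)    # scaling gives E||xR||^2 = ||x||^2
Y = X @ R

# All pairwise distances before and after projection.
diffs = X[:, None, :] - X[None, :, :]
orig = np.linalg.norm(diffs, axis=-1)
proj = np.linalg.norm(diffs @ R, axis=-1)

iu = np.triu_indices(m, 1)                      # each unordered pair once
ratios = proj[iu] / orig[iu]
print(ratios.min(), ratios.max())   # typically both close to 1
```

The concentration improves as $p$ grows: the relative distortion per pair shrinks roughly like $1/\sqrt{p}$.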

Many proofs of the Johnson-Lindenstrauss lemma use random matrices, effectively showing that suitably scaled random matrices do indeed preserve pairwise distances up to a small distortion, with high probability.

A lot of work has been done on this subject, and it would be very time-consuming to compile it here or explain how each construction works. Instead, I'll provide a few references for you to check out:

  1. Database-friendly random projections: Johnson-Lindenstrauss with binary coins, by Dimitris Achlioptas; and
  2. A Simple Proof of the Restricted Isometry Property for Random Matrices, by Richard Baraniuk, Mark Davenport, Ronald DeVore, and Michael Wakin.