Understanding locally sensitive hashing

232 Views Asked by At

I recently studied about locally sensitive hashing an a follow explanation was given to me.

To produce a hash for a particular vector X with dimensions 5,

Draw 10 random vectors to form a plane and perform a dot product with the vector X.

In other words, a 10 x 5 matrix will be dot product with vector X.

The hash can be given as 1 if the result of each element in the resulting vector is positive else 0.

However, am i supposed to understand that these matrix can be considered a plane and how does it actually mean that if the dot product is negative/positive, it determines which side of the plane the vector is one?

Does the 10 random vectors form a subspace hence a plane?

Anyone can explain this? Thanks.

2

There are 2 best solutions below

11
On BEST ANSWER

Let $p_1, p_2, \cdots, p_{10}$ be the random vectors. For each $p_i$, it defines a hyperplane whose normal vector is $p_i$; that is, the hyperplane partitions $\mathbb{R}^5$ into two parts, namely, $$\{v \in \mathbb{R}^5 \mid p_i^Tv > 0\}\quad \text{and}\quad\{v \in \mathbb{R}^5 \mid p_i^Tv \leq 0\}$$

Therefore, LSH function maps $x$ to $1$ if and only if $$ x \in \bigcap_{i=1}^{10} \{v \in \mathbb{R}^5 \mid p_i^Tv > 0\} $$ As you can see, it is actually not a hyperplane in general.

1
On

No, your vectors do not form a subspace.

I think the idea is to get a $10$-bit hash value from your $X$ by computing the $10$ bits independently, each one by looking at the sign of one particular nonzero linear form on your vector space. If $x_1,\ldots,x_5$ designate the coordinates of your vector space, a linear form is an expression of the form $ax_1+bx_2+cx_3+dx_4+ex_5$ for some constant scalars $a,b,c,d,e$, in other words if it the value obtained by applying a one-line matrix $\pmatrix{a&b&c&d&e}$ to your vector$~X$ (as a column vector). The value of that expression is a number, so to get a bit from it just only look whether the value is positive.

This linear form (if it is not entirely zero, which of course you will choose) takes the value $0$ on a hyperplane of your space, in other words on a $4$-dimensional subspace. The bit you obtained by applying the linear form tells you on which side of this hyperplane your vector lies (the linear form make one side "positive" and the other side "negative"; the point of the hyperplane itself are consider as"positive"). Now if you apply a matrix with $10$ rows, then you get $10$ bits, each telling you on which side of a different hyperplane your vector is. If you have chosen your rows not to badly, then these hyperplanes chop you space into $2^{10}$ "chambers", and your hash code tells you in which of these chambers your vector$~X$ lies.