how close can you get to a random d-dimensional vector of +1 and -1 given k guesses?


Consider a uniformly randomly selected vector $v \in \lbrace +1,-1 \rbrace^d$, i.e. a vector of dimension $d$ whose entries are $+1$ or $-1$ (there are $2^d$ such vectors).

I'm interested in understanding how "close" to this vector someone can get by guessing random vectors, where closeness between two vectors $a,b$ is defined as $\frac{a \cdot b}{|a| |b|}$. In our specific case, if $a,b \in \lbrace +1,-1 \rbrace^d$, the closeness is $\frac{a \cdot b}{d}$ (since all our vectors have length $\sqrt{d}$).

Consider the following procedure:

Given an integer $k$, uniformly randomly select $k$ distinct vectors from $\lbrace +1,-1 \rbrace^d$ (imagine a bag containing all $2^d$ vectors, from which we draw $k$ vectors without putting them back).

Then, among these $k$ vectors, we find a vector $u_i$ that is CLOSEST to our target vector $v$; formally, one maximizing the quantity $\frac{u_i \cdot v}{d}$ over all our choices. We call this maximal value $\Omega_{k,d}$.
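This procedure can be estimated numerically with a small Monte Carlo sketch (my own, not from the post; the function name is made up). For simplicity it draws the $k$ guesses with replacement, which is essentially the same as the without-replacement procedure whenever $k \ll 2^d$:

```python
import numpy as np

def estimate_omega(d, k, trials=10000, rng=None):
    """Monte Carlo estimate of E[Omega_{k,d}].

    Guesses are sampled WITH replacement for simplicity; for
    k << 2^d this barely differs from the bag-without-replacement
    procedure described above.
    """
    rng = np.random.default_rng(rng)
    best = np.empty(trials)
    for t in range(trials):
        v = rng.choice([-1, 1], size=d)           # random target vector
        guesses = rng.choice([-1, 1], size=(k, d))  # k random guesses
        best[t] = (guesses @ v).max() / d           # closeness of best guess
    return best.mean()
```

For example, `estimate_omega(100, 1)` should come out near $0$, matching the $k=1$ analysis below.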

The Question:

What is the expected value of $\Omega_{k,d}$ as a function of $d,k$?

Some work:

Independently of $d$, if we let $k=1$ it is easy to reason that $E[\Omega_{1,d}] = 0$. That is, a single random vector is expected to have dot product $0$ with our target $v$.

This can be seen by induction on the dimension: the unnormalized dot products add coordinate-wise, so $d \cdot E[\Omega_{1,d}] + 1 \cdot E[\Omega_{1,1}] = (d+1) \cdot E[\Omega_{1,d+1}]$, and it is very easy to see that the expected product of two random $\pm 1$ entries is $0$. (More directly: by linearity of expectation, the dot product is a sum of $d$ terms, each with mean $0$.)

As soon as we let $k=2$ the situation gets more complex. I believe $E[\Omega_{2,d}] \approx \frac{1}{2\sqrt{d}}$. I am able to show that the standard deviation of $\Omega_{1,d}$ is $\frac{1}{\sqrt{d}}$, so a heuristic argument I want to make is this: pretend the two samples are uniformly selected from $\lbrace +\frac{1}{\sqrt{d}}, -\frac{1}{\sqrt{d}} \rbrace$, and evaluate our procedure over the four resulting cases, which gives an expected maximum of $\frac{1}{2\sqrt{d}}$. This is definitely not exact but gets "truer" as $d$ goes to infinity.
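As a sanity check on this heuristic, here is a small simulation (my own sketch, not from the post). For large $d$ the two dot products behave like independent centered Gaussians with standard deviation $\frac{1}{\sqrt{d}}$, and the expected maximum of two iid centered normals is $\sigma/\sqrt{\pi}$, so one would expect roughly $\frac{1}{\sqrt{\pi d}} \approx \frac{0.564}{\sqrt{d}}$, slightly above the four-case heuristic's $\frac{0.5}{\sqrt{d}}$:

```python
import numpy as np

# Simulate Omega_{2,d} and compare against the 1/(2 sqrt(d)) heuristic
# and the Gaussian max-of-two value 1/sqrt(pi*d).
d, trials = 400, 20000
rng = np.random.default_rng(0)
v = rng.choice([-1, 1], size=d)               # a fixed target (WLOG by symmetry)
guesses = rng.choice([-1, 1], size=(trials, 2, d))
omega2 = (guesses @ v).max(axis=1) / d        # best of two guesses, per trial
print(omega2.mean(), 0.5 / np.sqrt(d), 1 / np.sqrt(np.pi * d))
```

The simulated mean tends to land near the Gaussian value rather than exactly on the heuristic, which is consistent with the heuristic being a rough lower-order approximation.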

Tag explanations: I think most of the tags are self-explanatory. I added the computational-complexity tag because this closely resembles results people calculate for approximation algorithms, so I figured it might attract the right kind of attention.

Accepted Answer:

The important thing is how many positions the two vectors agree in. By symmetry, the target vector might as well be all $+1$s, so you are asking for the expected maximum number of $+1$s among $k$ guesses. The probability of exactly $m$ $+1$s in one guess is $\binom{d}{m}\frac{1}{2^d}$, so roughly speaking the best of $k$ guesses reaches a match count $M$ that occurs about one time in $k$, i.e. $M$ satisfies $$\frac 1k=\sum_{m=M}^d{d \choose m}\frac 1{2^d}$$
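This rule of thumb can be inverted numerically: given $d$ and $k$, scan down from $d$ for the smallest match count $M$ whose binomial tail is still at most $\frac{1}{k}$ (a sketch; the function name is mine):

```python
from math import comb

def threshold_matches(d, k):
    """Smallest M with sum_{m=M}^{d} C(d, m) / 2^d <= 1/k.

    With k guesses, the best one is expected to agree with the
    target in roughly M positions.
    """
    tail = 0.0
    for M in range(d, -1, -1):
        tail += comb(d, M) / 2**d   # accumulate P(matches >= M)
        if tail > 1 / k:
            return M + 1            # last M whose tail was <= 1/k
    return 0
```

The corresponding closeness is then about $\frac{2M-d}{d}$, since $M$ matches give a dot product of $M - (d-M) = 2M - d$.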

If $k$ is not too large compared to $2^d$, you can use the normal approximation. The number of $+1$s has mean $\frac d2$ and standard deviation $\frac 12\sqrt d$. Since about $\frac 1{40}$ of the area under a standard normal lies above $2\sigma$, with $40$ guesses you would expect a $+2\sigma$ result, i.e. about $\frac d2+\sqrt d$ $+1$s, corresponding to a closeness of about $\frac{2}{\sqrt d}$.
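In general, the normal approximation turns the $\frac{1}{k}$ tail condition into $M \approx \frac d2 + \frac{\sqrt d}{2}\,\Phi^{-1}\!\left(1-\frac 1k\right)$, i.e. a closeness of about $\Phi^{-1}(1-\frac 1k)/\sqrt d$ (for $k=40$ this recovers the $2\sigma$ example, since $\Phi^{-1}(0.975) \approx 1.96$). The sketch below compares that estimate with a direct simulation; it ignores a max-of-$k$ correction, so the simulated value tends to come out slightly higher:

```python
import numpy as np
from statistics import NormalDist

# Normal-approximation estimate of E[Omega_{k,d}]: the best of k guesses
# sits near the (1 - 1/k) quantile of the match distribution, giving
# closeness about Phi^{-1}(1 - 1/k) / sqrt(d).  A rough sketch, not exact.
def omega_normal(d, k):
    return NormalDist().inv_cdf(1 - 1 / k) / np.sqrt(d)

d, k = 256, 40
approx = omega_normal(d, k)   # about 1.96 / 16

# Quick Monte Carlo cross-check (guesses drawn with replacement).
rng = np.random.default_rng(1)
v = rng.choice([-1, 1], size=d)
sims = [(rng.choice([-1, 1], size=(k, d)) @ v).max() / d for _ in range(2000)]
print(approx, np.mean(sims))
```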