Expected length of spanning sequences of vectors

187 Views Asked by At

Given an infinitely-long sequence $$L = V \cdot V \cdot V \cdots$$ ...that is the repeated concatenation of $$V = (v_1, \dots, v_{2^n - 1}), v_i \in \Bbb{F}_2^n$$ ...a sequence of $2^n - 1$ vectors where all subsequences of length $n$ form a basis for $\Bbb{F}_2^n$ (and this rule "wraps around" or "crosses" the boundary of a concatenation; see this question and answer for how this sequence can be constructed and how there is such a sequence for all $n$).

Then what is the expected length $k$ of a subsequence $$S = (v_{r(i, 1)}, \dots, v_{r(i, k)})$$ ...that starts at a random location $i$ in $L$ and continues by skipping each subsequent vector with probability $p$ (i.e. including each subsequent vector with probability $1 - p$) until its vectors span $\Bbb{F}_2^n$?

I think this defines $r(i, j)$ to get the skipping action: $$r(i, j) = \begin{cases} i, & j = 1 \\ r(i, j) + 1, & X < p \\ r(i, j - 1) + 1, & \mbox{otherwise} \end{cases}$$ ...where $X$ is a uniform distribution from 0 to 1. This should make $r(i, j)$ a monotonically increasing function that either increases by one from the previous value or randomly recurses to find another value to increase by (in effect randomly skipping an index with probability $p$). If $p = 0$ then it just gives integers at least as large as $i$; higher values of $p$ mean that a vector in $L$ is more likely to be skipped.

For example, here's the first part of an $L$ for $\Bbb{F}_2^3$: $$v_1 = (0, 0, 1)$$ $$v_2 = (0, 1, 0)$$ $$v_3 = (1, 0, 0)$$ $$v_4 = (0, 1, 1)$$ $$v_5 = (1, 1, 0)$$ $$v_6 = (1, 1, 1)$$ $$v_7 = (1, 0, 1)$$ $$v_8 = v_1 = (0, 0, 1)$$ $$v_9 = v_2 = (0, 1, 0)$$ $$\vdots$$

Notice how if we let $S' = (v_1, v_2)$ and we're looking for another vector to span the field but due to the random process we skip $v_3$, then chances are still good that we'll soon find another vector to finish the sequence. For example, although $v_4$ won't do, $v_5, v_6, v_7, v_{3 + 2^n - 1} = v_{10} = v_3$, and any vectors offset in $L$ from those by a multiple of $2^n - 1 = 7$ are perfectly suitable to finish the sequence. But $v_4$ is more likely to be included than the later vectors so $k$ is more likely to be larger since $v_4$ is effectively a waste.

Also, how does the expected $k$ compare if we follow the same procedure but replace $L$ with $L'$, an infinite sequence of vectors drawn (uniformly) randomly from $\Bbb{F}_2^n$?

I'm not really sure where to start with the maths. In the case of $L'$ (random vectors) the likelihood of selecting a vector that (along with $n - 1$ previously-selected linearly-independent vectors) spans the field feels like it ought to be around 50% (since exactly one dimension will be missing; this is regardless of $p$ since $L'$ is random anyway). Indeed with some software I can tell empirically that the average value of $k$ is ~$1.65 + n$ and that the values of $k$ follow what looks like a Poisson distribution (I tested for $10 \leq n \leq 200$).

1

There are 1 best solutions below

6
On BEST ANSWER

For your example $L$, we can do the calculation explicitly. Triples of vectors that sum to zero are always in relative positions $(0,1,3)$. This is the only relevant property, so there's complete symmetry and it doesn't matter where we start. Also, note that once a vector is in the space we've already spanned (whether we've selected this specific vector or not), we can ignore it when it's selected.

First we take an expected number $\frac1{1-p}$ of steps to select the first vector. Label the remaining vectors in order after the selected one $v_0$ to $v_5$. The next vector selected will be $v_k$ with probability

$$ q_k=p^k(1-p)\left(1+p^6+p^{2\cdot6}+\cdots\right)=\frac{p^k(1-p)}{1-p^6} $$

after an expected number

$$ 1+k+7\left(1-p^6\right)\left(0\cdot1+1\cdot p^6+2\cdot p^{2\cdot6}+\cdots\right)=1+k+\frac{7p^6}{1-p^6} $$

of steps, so this selection takes an expected number

$$ \sum_{k=0}^5\frac{p^k(1-p)}{1-p^6}\left(1+k+\frac{7p^6}{1-p^6}\right)=\frac{1-p^7}{(1-p)\left(1-p^6\right)} $$

of steps.

Now we've spanned some triple, and if we renumber such that the triple is at $(0,1,3)$, the new index $j$ of the vector just selected depends on $k$ as follows:

\begin{array}{c|c} k&j\\\hline 0&1\\ 1&3\\ 2&3\\ 3&0\\ 4&1\\ 5&0\\ \end{array}

Denote the expected number of steps for selecting the third vector after selecting (renumbered) vector $j$ as the second vector by $x_j$. Then we have

\begin{align} x_0&=1+x_1\;,\\ x_1&=1+p(1+x_3)\;,\\ x_3&=1+p(1+p(1+p(1+x_0)))\;. \end{align}

Substituting the second equation into the first yields $x_0=2+p(1+x_3)$, and substituting that into the third yields $x_3=1+p(1+p(1+p(3+p(1+x_3))))$, and thus

\begin{align} x_3&=\frac{1+p+p^2+3p^3+p^4}{1-p^4}\;,\\ x_1&=\frac{1+2p+p^2+p^3+2p^4}{1-p^4}\;,\\ x_0&=\frac{2+2p+p^2+p^3+p^4}{1-p^4}\;. \end{align}

Thus the selection of the third vector is expected to take

$$ (q_3+q_5)x_0+(q_0+q_4)x_1+(q_1+q_2)x_3\\ = \frac{1-p}{\left(1-p^6\right)\left(1-p^4\right)}\left(\left(p^3+p^5\right)\left(2+2p+p^2+p^3+p^4\right)+\\\left(p^0+p^4\right)\left(1+2p+p^2+p^3+2p^4\right)+\left(p^1+p^2\right)\left(1+p+p^2+3p^3+p^4\right)\right)\\ =\frac{1+2p+p^2+4p^3+5p^4+4p^5+p^6+2p^7+p^8}{\left(1-p^6\right)\left(1+p^2\right)} $$

steps. So the expected total number of steps is

$$ \frac1{1-p}+\frac{1-p^7}{(1-p)\left(1-p^6\right)}+\frac{1+2p+p^2+4p^3+5p^4+4p^5+p^6+2p^7+p^8}{\left(1-p^6\right)\left(1+p^2\right)}\\ =\frac{3+4p+5p^2+8p^3+9p^4+8p^5+4p^6+4p^7+2p^8}{\left(1-p^6\right)\left(1+p^2\right)}\;. $$

Here's code I used to test this result.

The messiness of the result already for this simple case suggests that you may not get a simple closed form for general $n$. In any case you'd first need to show that the probability doesn't depend on the choice of $L$; as Jyrki pointed out in the answer you linked to, different generators of $\Bbb{F}_{2^n}^*$ can be used to generate different sequences.

The case of vectors uniformly randomly drawn from $\mathbb F_2^n$ that you want to compare with is relatively straightforward to treat. If you've already spanned a $k$-dimensional space, $2^k$ vectors are no longer useful, so the probability of gaining a dimension is $(1-p)\left(2^n-2^k\right)/\,2^n$. Thus the expected number of steps is

$$ \frac1{1-p}\sum_{k=0}^{n-1}\frac{2^n}{2^n-2^k}\;. $$

For $n=3$, this is

$$ \frac{94}{21}\cdot\frac1{1-p}\;. $$

But that's not really a fair comparison, since you not only ordered the vectors but also removed the zero vector. So perhaps we should compare with uniformly randomly drawing from $\mathbb F_2^n\setminus\{0\}$. The zero vector has to be taken out of both $2^n$ and $2^k$, so now we get

$$ \frac1{1-p}\sum_{k=0}^{n-1}\frac{2^n-1}{2^n-2^k}\;, $$

and for $n=3$ this is

$$ \frac{47}{12}\cdot\frac1{1-p}\;. $$

Here's a plot that compares the probabilities in the three cases.

Note that the two methods without the zero vector have the same residue $-\frac{47}{12}$ at $p=1$, which makes sense since if you skip almost all vectors it no longer matters whether you ordered them. By contrast, at $p=0$ your method is of course certain to complete in exactly $3$ steps whereas the random method takes almost $4$.