Large $N$ limit for $P(\mathbf{x} \cdot\mathbf{y}\cdot \sum\mathbf{x}\cdot \sum \mathbf{y}>0)$ for two binary vectors $\mathbf{x}$ and $\mathbf{y}$


I have two binary vectors $\mathbf{x}$ and $\mathbf{y}$, each with $N$ elements. Each element of $\mathbf{x}$ and $\mathbf{y}$ belongs to $\{-1, 1\}$ and is drawn independently and uniformly at random.

Now I would like to derive the large $N$ limit for $P(\mathbf{x} \cdot \mathbf{y} \cdot (\sum\mathbf{x}) \cdot (\sum \mathbf{y})>0)$ and $P(\mathbf{x} \cdot \mathbf{y} \cdot (\sum\mathbf{x}) \cdot (\sum \mathbf{y})<0)$, where $\mathbf{x} \cdot \mathbf{y}$ is the inner product between $\mathbf{x}$ and $\mathbf{y}$, and $\sum\mathbf{x}$, $\sum\mathbf{y}$ are the sum of all elements in $\mathbf{x}$ and $\mathbf{y}$, respectively.

I have derived the explicit form for $P(\mathbf{x} \cdot \mathbf{y} =k, \sum\mathbf{x}=x, \sum \mathbf{y}=y)$, which is \begin{align} P(\mathbf{x} \cdot \mathbf{y}=k, \sum\mathbf{x} = x, \sum \mathbf{y} = y) =\frac{{N \choose (N + x)/2} \cdot{ (N+x)/2 \choose (N+x+y+k)/4 } \cdot {(N-x)/2 \choose (N+k-x-y)/4}}{2^{2N}} \end{align}
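As a sanity check of this closed form, it can be compared against brute-force enumeration for a small $N$ (my own sketch, not part of the derivation); the helper `c` below treats binomial coefficients with non-integer or out-of-range arguments as zero, which encodes the parity constraints linking $k$, $x$, and $y$:

```python
from itertools import product
from math import comb

N = 6  # small enough to enumerate all 2^(2N) pairs exactly

# Tally (x·y, Σx, Σy) over every pair of ±1 vectors of length N.
counts = {}
for xs in product((-1, 1), repeat=N):
    for ys in product((-1, 1), repeat=N):
        key = (sum(a * b for a, b in zip(xs, ys)), sum(xs), sum(ys))
        counts[key] = counts.get(key, 0) + 1

def c(n, r):
    # Binomial coefficient that is zero for non-integer or
    # out-of-range arguments (this encodes the parity constraints).
    if n != int(n) or r != int(r):
        return 0
    n, r = int(n), int(r)
    return comb(n, r) if 0 <= r <= n else 0

def pmf(k, x, y):
    # The closed-form expression from the question.
    return (c(N, (N + x) / 2)
            * c((N + x) / 2, (N + x + y + k) / 4)
            * c((N - x) / 2, (N + k - x - y) / 4)) / 2 ** (2 * N)

max_err = max(abs(cnt / 2 ** (2 * N) - pmf(*key))
              for key, cnt in counts.items())
```

For $N=6$ the enumerated frequencies and the formula agree exactly.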

From the above formula, I computed $P(\mathbf{x} \cdot \mathbf{y} \cdot (\sum\mathbf{x}) \cdot (\sum \mathbf{y})>0)$ and $P(\mathbf{x} \cdot \mathbf{y} \cdot (\sum\mathbf{x}) \cdot (\sum \mathbf{y})<0)$ by a nested sum over all admissible values of $x$, $y$, and $k$, for $N=2$ to $N=400$. The results suggest that both probabilities converge to $0.5$:

[plot: both probabilities versus $N$, approaching $0.5$]
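As a complementary check (my own sketch, not part of the question's computation), a vectorized Monte Carlo estimate for an odd $N$ (so that the product $Z$ is never zero) also lands near $0.5$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 101, 100_000  # odd N: x·y, Σx, Σy are all odd, so Z != 0

X = rng.choice((-1, 1), size=(trials, N))
Y = rng.choice((-1, 1), size=(trials, N))

# Z = (x·y) (Σx) (Σy) for each sampled pair of vectors.
Z = (X * Y).sum(axis=1) * X.sum(axis=1) * Y.sum(axis=1)

p_pos = (Z > 0).mean()  # estimate of P(Z > 0)
p_neg = (Z < 0).mean()  # estimate of P(Z < 0)
```

At finite $N$ the estimate of $P(Z>0)$ sits slightly above $0.5$, consistent with the slow convergence visible in the plot.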

However, I am having a hard time deriving the large $N$ limit in a formal way.

A side question: is the above $P(\mathbf{x} \cdot \mathbf{y}=k, \sum\mathbf{x} = x, \sum \mathbf{y} = y)$ a known probability distribution?


Let $a^T=[1, 1, \ldots, 1]$ be the all-ones vector of length $N$. Then you can rewrite the random variable as:

$Z=(Y^T X)(X^T a)(a^T Y)$,

in which $T$ denotes transpose, so $Y^T X$ is the inner product of $X$ and $Y$, $X^T a$ is the sum of the elements of $X$, and $a^T Y$ is the sum of the elements of $Y$. The reason I wrote it this way will become clear in a moment!

You are interested in large $N$, so I suggest investigating the mean of $Z$ first, as follows.

$\mu = E\{Z\} = E\{Y^T X X^T a a^T Y\}$

Since $Y^T X X^T a a^T Y$ is a scalar, it equals its own trace. I will use the cyclic property $tr(AB)=tr(BA)$, together with the fact that trace and expectation are both linear operators and can therefore be interchanged:

$\mu = E\{Y^T X X^T a a^T Y\} = tr(E\{Y^T X X^T a a^T Y\}) = E\{tr(Y^T X X^T a a^T Y)\} = E\{tr(Y Y^T X X^T a a^T)\} $

where I took $A=Y^T X X^T a a^T$ and $B=Y$ in $tr(AB)=tr(BA)$. So,

$\mu = E\{tr(Y Y^T X X^T a a^T)\} = tr(E\{Y Y^T X X^T a a^T\}) = tr(E\{Y Y^T\} E\{X X^T\} a a^T)$, where the last equality uses the independence of $X$ and $Y$ (and the fact that $a a^T$ is deterministic).

Since $E\{Y Y^T\}=E\{X X^T\} = I_N$, where $I_N$ is the identity matrix of size $N$ (the entries are i.i.d. with zero mean and unit variance), and $a a^T=1_N$ is the all-ones matrix of size $N$:
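The identity $E\{X X^T\} = I_N$ can be checked empirically; the sketch below (my addition) averages outer products of i.i.d. $\pm 1$ vectors and compares against $I_N$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 8, 200_000

# Rows of X are independent samples of a ±1 vector of length N.
X = rng.choice((-1, 1), size=(trials, N))

# Sample average of x x^T over all samples; diagonal entries are
# exactly 1 (since x_i^2 = 1), off-diagonal entries average to ~0.
avg_outer = X.T @ X / trials
err = np.abs(avg_outer - np.eye(N)).max()
```

The maximum deviation from $I_N$ shrinks like $1/\sqrt{\text{trials}}$ for the off-diagonal entries.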

$\mu = tr(I_N I_N 1_N) = tr(1_N) = N$.

The expected value of your random variable diverges with $N$. I suspect that this might change your question.
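To corroborate $\mu = N$ numerically, here is a small Monte Carlo sketch (my addition; the tolerance is loose because $Z$ has standard deviation of order $N^{3/2}$):

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 25, 400_000

X = rng.choice((-1, 1), size=(trials, N))
Y = rng.choice((-1, 1), size=(trials, N))

# Z = (x·y) (Σx) (Σy); its sample mean should be close to N.
Z = (X * Y).sum(axis=1) * X.sum(axis=1) * Y.sum(axis=1)
mean_Z = Z.mean()
```

Despite the divergent mean, the distribution of $Z$ is increasingly symmetric about zero relative to its spread, which is why the sign probabilities can still both converge to $0.5$.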