Does the system $\sum_{k=0}^n \binom{n}{k} p^k(1-p)^{n-k}x_k=a, \sum_{k=0}^n \binom{n}{k}q^k(1-q)^{n-k}x_k=b$ have solutions in the cube $[0,1]^{n+1}$?


The following question arises in trying to solve a problem coming from information theory.

Suppose $0\le p < q \le 1$ and $a,b\in(0,1)$. Is it true that there exists a (sufficiently large) $n \in \mathbb{N}$ for which there exists a point $(x_0,\dots,x_n)$ in the unit $(n+1)$-dimensional cube $[0,1]^{n+1}$ (i.e., $0\le x_k \le 1$ for all $k \in \{0,\dots,n\}$) satisfying \begin{equation} \begin{cases} \sum_{k=0}^n \binom{n}{k} p^k(1-p)^{n-k}x_k=a\\ \sum_{k=0}^n \binom{n}{k} q^k(1-q)^{n-k}x_k=b\\ \end{cases} \qquad ? \end{equation} If so, can we give a (good) upper bound (depending on $p,q,a,b$) on the smallest dimension $n$ for which the problem has a solution?

Notice that the coefficients of this linear system are given by the (discrete) densities of two binomial distributions, the first of which has parameters $(n,p)$, while the second has parameters $(n,q)$.

I've worked out the particular case where $q = p+\varepsilon$ and $b = a+\delta$, with $p = a$ and $0<\delta<\varepsilon$. In this case, a direct calculation shows that $n=1$ suffices.

For the general case, it is somewhat intuitive that the closer $p$ and $q$ are, the harder the problem is (i.e., we need a large dimension $n$ for the problem to have a solution), while the closer $a$ and $b$ are, the easier the problem is (i.e., a small dimension $n$ suffices). Interestingly, experimenting in Mathematica, it seems that the problem always has a solution for sufficiently large $n$; more precisely, there seems to exist $n_0 = n_0(p,q,a,b) \in \mathbb{N}$ such that the problem has a solution for all $n \ge n_0$ and no solution for $n < n_0$ (so, were this the case, it would also be interesting to give an upper bound on $n_0(p,q,a,b)$).
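These experiments can be reproduced as a linear-feasibility check: the system is two linear equality constraints on the cube, so an LP solver with a zero objective decides solvability. A minimal sketch in Python using `scipy.optimize.linprog` (the function names and the search cap `n_max` are my own, not from the question):

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import binom

def feasible(p, q, a, b, n):
    """Check whether the two binomial moment equations admit a
    solution x in the cube [0,1]^{n+1}, via an LP feasibility test."""
    k = np.arange(n + 1)
    # Rows are the binomial densities with parameters (n,p) and (n,q).
    A_eq = np.vstack([binom.pmf(k, n, p), binom.pmf(k, n, q)])
    res = linprog(c=np.zeros(n + 1), A_eq=A_eq, b_eq=[a, b],
                  bounds=[(0.0, 1.0)] * (n + 1), method="highs")
    return res.status == 0  # status 0 = optimal, i.e. a feasible point exists

def smallest_n(p, q, a, b, n_max=200):
    """Smallest dimension n <= n_max for which the system is solvable,
    or None if no such n is found."""
    for n in range(1, n_max + 1):
        if feasible(p, q, a, b, n):
            return n
    return None
```

For instance, `smallest_n(0.3, 0.7, 0.4, 0.6)` returns `1`, matching the particular case worked out above where $n=1$ already suffices.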

However, I'm missing the bigger picture of what's the (geometrical) reason why this is happening in the general case.

Any help?


There are 2 best solutions below

Best answer

The answer is yes. Let $\delta=\frac{q-p}{2}$. The weak law of large numbers tells us that as $n\to\infty$, we have $$P_n:=\sum_{(p-\delta)n< k< (p+\delta)n}\binom{n}{k} p^k(1-p)^{n-k}\to 1,$$ and similarly for $q$ (this is the probability that the average of $n$ i.i.d. $p$-Bernoulli random variables deviates from $p$ by less than $\delta$). Now let's look for a solution of the form $$ x_k=\cases{\hat{a},& $(p-\delta)n< k< (p+\delta)n$\\ \hat{b},& $(q-\delta)n< k< (q+\delta)n$\\ 0,&else.} $$ Our system then becomes $$ \cases{\hat{a}P_n+\epsilon_n\hat{b}=a\\ \hat{a}\epsilon'_n+Q_n\hat{b}=b,} $$ where $P_n,Q_n\to 1$ and $\epsilon_n\leq 1-P_n\to 0$, $\epsilon'_n\leq 1-Q_n\to 0$. So the matrix of this system converges to the identity matrix, which means that it is eventually invertible with an inverse converging to the identity. Hence the solutions $(\hat{a},\hat{b})$ eventually exist and converge to $(a,b)$; in particular, since $a,b\in(0,1)$, they eventually lie in $[0,1]$.
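This construction can be checked numerically. A sketch (function names are mine) that builds the two windows, solves the $2\times 2$ system for $(\hat a,\hat b)$, and returns the resulting point $x$ when it lies in the cube:

```python
import numpy as np
from scipy.stats import binom

def window_solution(p, q, a, b, n):
    """Attempt the explicit two-window construction: x_k = a_hat on the
    window around pn, b_hat on the window around qn, 0 elsewhere.
    Returns the vector x if it lies in [0,1]^{n+1}, else None."""
    delta = (q - p) / 2
    k = np.arange(n + 1)
    wp = (k > (p - delta) * n) & (k < (p + delta) * n)  # window around pn
    wq = (k > (q - delta) * n) & (k < (q + delta) * n)  # window around qn
    fp = binom.pmf(k, n, p)
    fq = binom.pmf(k, n, q)
    # 2x2 system  [[P_n, eps_n], [eps'_n, Q_n]] @ [a_hat, b_hat] = [a, b]
    M = np.array([[fp[wp].sum(), fp[wq].sum()],
                  [fq[wp].sum(), fq[wq].sum()]])
    if abs(np.linalg.det(M)) < 1e-12:  # windows too small (n too low)
        return None
    a_hat, b_hat = np.linalg.solve(M, [a, b])
    if not (0 <= a_hat <= 1 and 0 <= b_hat <= 1):
        return None
    x = np.zeros(n + 1)
    x[wp] = a_hat
    x[wq] = b_hat
    return x
```

For $p=0.3$, $q=0.7$, $a=0.4$, $b=0.6$ and $n=200$, the matrix $M$ is already very close to the identity, and the returned $x$ satisfies both equations to machine precision.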

You can quantify everything here; e.g., the convergence in the WLLN is exponentially fast by Chernoff's bound: $$ 1-P_n\leq 2\exp(-n\gamma_p(\delta)), $$ where the rate function $\gamma_p(\delta)$ is the Legendre transform of $$\theta\mapsto \log\mathbb{E}e^{\theta (X-p)}=\log(1-p+pe^{\theta})-\theta p,$$ where $X$ is a $p$-Bernoulli random variable. In any case, $\gamma_p$ is a convex function which vanishes at $0$, has zero derivative there, and satisfies $\gamma_p(\delta)\geq c_p\delta^2$. So, for fixed $a,b$ and $p$, the dimension $n=C(q-p)^{-2}$ is sufficient, with an effective constant $C$ that may depend on $a,b,p$ and blow up in edge cases.
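The rate function can be evaluated numerically straight from its definition as a Legendre transform. A sketch (function name is mine); for a Bernoulli variable the supremum is known to equal the KL divergence $D(p+\delta\,\|\,p)$, which gives a convenient cross-check:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def rate(p, delta):
    """gamma_p(delta) = sup_theta [theta*delta - log E e^{theta(X-p)}],
    the Legendre transform of the centred Bernoulli log-MGF."""
    def neg(theta):
        log_mgf = np.log(1 - p + p * np.exp(theta)) - theta * p
        return -(theta * delta - log_mgf)
    # For delta > 0 the supremum is attained at a finite theta > 0.
    res = minimize_scalar(neg, bounds=(0.0, 50.0), method="bounded")
    return -res.fun
```

For example, `rate(0.3, 0.1)` agrees with $D(0.4\,\|\,0.3) = 0.4\log\frac{0.4}{0.3} + 0.6\log\frac{0.6}{0.7} \approx 0.0226$, consistent with the lower bound $\gamma_p(\delta)\geq c_p\delta^2$.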

Second answer

Some thoughts.

Isn't it a question of shortest distances?

Let us recall that for a hyperplane $H$ in $\mathbb{R^{n+1}}$ with equation

$$a_0x_0+a_1x_1+\cdots +a_nx_n-c=0,$$

the (shortest) distance from a point $$P_0=(x_0^{(0)},x_1^{(0)},\dots ,x_n^{(0)})$$ to the hyperplane $H$ is given by the formula

$$\frac{ |a_0x_0^{(0)}+a_1x_1^{(0)}+\cdots +a_nx_n^{(0)}-c|}{\sqrt{a_0^2+a_1^2+\cdots+a_n^2}}\tag{1}$$

(see here for a proof)

For the particular case where $P_0=O$ (the origin), this formula becomes simply:

$$d(O,H)=\frac{ |c|}{\sqrt{a_0^2+a_1^2+\cdots+a_n^2}}\tag{2}$$

Since the farthest point of the cube from the origin is the vertex $\underbrace{(1,1,\dots,1)}_{n+1 \ \text{times}}$, situated at distance $\sqrt{n+1}$, an intuitive (not rigorous) condition for intersection with the unit hypercube is:

$$d(O,H_1) \le \sqrt{n+1} \ \text{and} \ d(O,H_2) \le \sqrt{n+1}\tag{3}$$

where $H_1$ and $H_2$ are your two hyperplane equations.

"Intuition" behind condition (3): In the case of rather similar hyperplanes (with similar distances to the origin and $p \approx q$), if there exists a point $M_1 \in H_1$ such that $OM_1<d$ and a point $M_2 \in H_2$ such that $OM_2<d$ ; as points $M_1$ and $M_2$ are not far away, there "should" exist a point $M \in H_1 \cap H_2$ ($(n-1)$-dimensional affine subspace) not too far from $M_1$ and $M_2$ still complying with $OM<d$.

Does this "intuitive" condition give an $n$ which isn't too far from the optimal $n$ found in your simulations, for different values of the four parameters?
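Condition (3) is easy to evaluate numerically. A small sketch (function names are my own) computing the two distances via formula (2), with $c=a$ and $c=b$ respectively and the binomial densities as coefficients:

```python
import numpy as np
from scipy.stats import binom

def distances(p, q, a, b, n):
    """d(O, H_1) and d(O, H_2) from formula (2), where H_1, H_2 are the
    two hyperplanes with binomial-density coefficients and offsets a, b."""
    k = np.arange(n + 1)
    d1 = a / np.linalg.norm(binom.pmf(k, n, p))
    d2 = b / np.linalg.norm(binom.pmf(k, n, q))
    return d1, d2

def condition_3(p, q, a, b, n):
    """The intuitive intersection condition (3)."""
    d1, d2 = distances(p, q, a, b, n)
    bound = np.sqrt(n + 1)
    return d1 <= bound and d2 <= bound
```

One caveat worth noting: since the coefficients sum to $1$, Cauchy-Schwarz gives $\sqrt{a_0^2+\cdots+a_n^2} \ge 1/\sqrt{n+1}$, so $d(O,H) \le a\sqrt{n+1} \le \sqrt{n+1}$ always, and condition (3) alone cannot discriminate between feasible and infeasible $n$.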

Edit: the denominator of (1) contains the square root of a quantity which is meaningful in the framework of information theory, namely the "information potential". See, for example, page 8 of the recent article here. One can get a closed-form expression for this information potential in terms of Legendre polynomials (assuming $p<1/2$):

$$\sum_{k=0}^n \left(\binom{n}{k}p^k(1-p)^{n-k}\right)^2=(1-2p)^n P_n\!\left(1+\tfrac{2p^2}{1-2p}\right)$$

(proof here).
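The identity can be checked numerically against the direct sum; a sketch using `scipy.special.eval_legendre` (function names are mine; assumes $p<1/2$):

```python
import numpy as np
from scipy.stats import binom
from scipy.special import eval_legendre

def info_potential(n, p):
    """Direct sum: sum_k (C(n,k) p^k (1-p)^{n-k})^2."""
    k = np.arange(n + 1)
    return np.sum(binom.pmf(k, n, p) ** 2)

def info_potential_closed(n, p):
    """Closed form via the Legendre polynomial P_n, valid for p < 1/2."""
    return (1 - 2 * p) ** n * eval_legendre(n, 1 + 2 * p ** 2 / (1 - 2 * p))
```

For instance, at $n=1$ both sides reduce to $(1-p)^2+p^2 = 1-2p+2p^2$, and the two functions agree to machine precision for larger $n$ as well.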