Joint distribution of the signs of the partial sums of independent standard normal random variables


Consider some i.i.d. standard normal random variables. What is the joint distribution of the signs of their partial sums?

More formally, define a sequence of random variables $(S_k)_{k\geqslant1}$ by $S_1=X_1$ and $S_k=S_{k-1}+X_k$ for every $k\geqslant2$, where $(X_k)_{k\geqslant1}$ is i.i.d. standard normal. The task is to compute $$p(\varepsilon)=P(A^\varepsilon),$$ for every $n$ and every (deterministic) $\varepsilon=(\varepsilon_k)_{1\leqslant k\leqslant n}$ in $\{-,+\}^n$, where $A^\varepsilon$ is the event $$A^\varepsilon=\bigcap_{k=1}^n[\varepsilon_kS_k\geqslant0].$$ Some known facts (the first four items being obvious):

  • For $n=1$, $p(+)=p(-)=\frac12$.
  • For every $n$, the sum of $p(\varepsilon)$ over every $\varepsilon$ in $\{-,+\}^n$ is $1$.
  • For every $\varepsilon$, $p(\varepsilon)=p(-\varepsilon)$.
  • For every $\varepsilon$, $p(\varepsilon)=p(\varepsilon,+)+p(\varepsilon,-)$.
  • For $n=2$, $p(+,+)=p(-,-)=\frac38$ and $p(+,-)=p(-,+)=\frac18$.
  • For every $\eta$ in $\{-,+\}$ and $\varepsilon$ in $\{-,+\}^n$, $p(\varepsilon,\eta,-\eta)\lt\tfrac12p(\varepsilon,\eta)\lt p(\varepsilon,\eta,\eta).$
  • For each given $n$, $p(\ )$ is maximal on $\{-,+\}^n$ at the two constant sequences $\varepsilon=(+,+,\ldots,+)$ and $\varepsilon=(-,-,\ldots,-)$.
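The facts listed above can be sanity-checked numerically. Here is a minimal Monte Carlo sketch using only the Python standard library; the function name `estimate_p`, the string encoding of $\varepsilon$ (for instance `"++-"`), and the default sample size are illustrative assumptions, not part of the question.

```python
import random

def estimate_p(signs, trials=200_000, seed=0):
    """Monte Carlo estimate of p(eps) for a sign pattern such as "++-".

    Hypothetical helper: simulates the partial sums S_1, ..., S_n of
    i.i.d. standard normals and counts how often eps_k * S_k >= 0 for all k.
    """
    rng = random.Random(seed)
    eps = [1.0 if s == "+" else -1.0 for s in signs]
    hits = 0
    for _ in range(trials):
        partial_sum = 0.0
        for e in eps:
            partial_sum += rng.gauss(0.0, 1.0)  # S_k = S_{k-1} + X_k
            if e * partial_sum < 0.0:           # pattern violated at step k
                break
        else:
            hits += 1                           # every sign constraint held
    return hits / trials
```

With a few hundred thousand samples, `estimate_p("++")` and `estimate_p("+-")` should land near $\frac38$ and $\frac18$ respectively, up to sampling noise of order $10^{-3}$.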

Note: This reformulates and generalizes a previous question, which probably meant to ask for $p(+,+,+).$ The facts recalled above yield $\tfrac3{16}\lt p(+,+,+)\lt\tfrac38.$

Sub-question: Compute $p(+,+,+).$


There are 2 best solutions below


The entire "normal distributions" aspect of this problem is a red herring. If $X_1,\ldots,X_n$ are i.i.d. normally distributed random variables with mean $0$, then the joint distribution of $(X_1,\ldots,X_n)$ is a multivariate normal distribution centered at the origin with covariance proportional to the identity, so it is spherically symmetric.

Now, each sequence $(\epsilon_1,\ldots,\epsilon_n)$ of signs determines a convex cone $C(\epsilon_1,\ldots,\epsilon_n)$ in $\mathbb{R}^n$ defined by $$ \epsilon_1 x_1 \geq 0,\quad \epsilon_2(x_1+x_2) \geq 0,\quad\ldots,\quad \epsilon_n(x_1+\cdots + x_n) \geq 0. $$ Then $$ p(\epsilon_1,\ldots,\epsilon_n) \;=\; \frac{\mu\bigl(S^{n-1} \cap C(\epsilon_1,\ldots,\epsilon_n)\bigr)}{\mu(S^{n-1})} $$ where $S^{n-1}$ is the unit sphere in $\mathbb{R}^n$, and $\mu$ is the volume measure on $S^{n-1}$. For example, $p({+}{+}{+}\cdots{+})$ is the probability that the partial sums of the coordinates are all positive for a random point chosen on the unit $(n-1)$-sphere.
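As a worked illustration of this formula in the simplest nontrivial case, take $n=2$. The direction of $(X_1,X_2)$ is uniform on the circle $S^1$, so write $(x_1,x_2)=(\cos\theta,\sin\theta)$. The cone $C(+,+)$ is given by $\cos\theta\geq0$ and $\cos\theta+\sin\theta\geq0$, that is, $$ -\frac{\pi}{2}\leq\theta\leq\frac{\pi}{2} \quad\text{and}\quad -\frac{\pi}{4}\leq\theta\leq\frac{3\pi}{4}, $$ which together give the arc $-\frac{\pi}{4}\leq\theta\leq\frac{\pi}{2}$ of length $\frac{3\pi}{4}$. Hence $$ p(+,+) \;=\; \frac{3\pi/4}{2\pi} \;=\; \frac38, \qquad p(+,-) \;=\; \frac{\pi/4}{2\pi} \;=\; \frac18, $$ the second value coming from the complementary arc $-\frac{\pi}{2}\leq\theta\leq-\frac{\pi}{4}$. Both agree with the values recalled in the question.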

For $n=3$, the region on $S^2$ corresponding to $+++$ is a spherical triangle with angles $\dfrac{3\pi}4$, $\cos^{-1}\left(-\dfrac{1}{\sqrt{3}}\right)$, and $\cos^{-1}\left(-\sqrt{\dfrac{2}{3}}\right)$. (These vertex angles are the same as the angles between the planes, which can be computed via dot products.) The area of this triangle is the angular excess: $$ \frac{3\pi}{4} + \cos^{-1}\left(-\dfrac{1}{\sqrt{3}}\right) + \cos^{-1}\left(-\sqrt{\dfrac{2}{3}}\right) - \pi \;=\; \frac{5\pi}{4}. $$ The whole sphere has area $4\pi$, so $$ p(+++) \;=\; \frac{5\pi/4}{4\pi} \;=\; \frac{5}{16}. $$ Similar computations show that $$ p(++-) = \frac{1}{16},\quad p(+-+) = -\frac{1}{16} + \frac{\tan^{-1}(2\sqrt{2})}{4\pi}\approx 0.0355,$$ and $$p(+--) = \dfrac{3}{16}-\dfrac{\tan^{-1}(2\sqrt{2})}{4\pi} \approx 0.0895. $$
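The angular-excess computation above can be reproduced numerically. The sketch below assumes the interior angle between two faces is $\pi$ minus the angle between the inward unit normals of the corresponding half-spaces; the function name `p_three` and the string encoding of the sign pattern are illustrative choices, not part of the answer.

```python
import math

def p_three(signs):
    """p(eps) for n = 3 via the area of a spherical triangle.

    Hypothetical helper: `signs` is a 3-character string such as "+-+".
    """
    if len(signs) != 3:
        raise ValueError("this sketch only handles n = 3")
    # Inward unit normal of the half-space eps_k * (x_1 + ... + x_k) >= 0.
    normals = []
    for k, s in enumerate(signs, start=1):
        e = 1.0 if s == "+" else -1.0
        normals.append([e / math.sqrt(k)] * k + [0.0] * (3 - k))
    # Area of the triangle on the unit sphere = sum of its angles - pi.
    area = -math.pi
    for i in range(3):
        for j in range(i + 1, 3):
            dot = sum(a * b for a, b in zip(normals[i], normals[j]))
            dot = max(-1.0, min(1.0, dot))      # guard against rounding
            area += math.pi - math.acos(dot)    # interior angle of the faces
    return area / (4.0 * math.pi)               # the unit sphere has area 4*pi
```

Evaluating `p_three` on `"+++"`, `"++-"`, `"+-+"` and `"+--"` should reproduce the four values quoted above to floating-point accuracy.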

For $n=4$, the regions on the sphere $S^3$ are spherical tetrahedra, where the dihedral angles between the faces are the same as the angles between the defining planes. For example $p({+}{+}{+}{+})$ is the volume of a spherical tetrahedron with the following dihedral angles, divided by the volume of $S^3$: $$ \theta_{12} = \frac{3\pi}{4},\ \theta_{13} = \cos^{-1}\left(-\frac{1}{\sqrt{3}}\right),\ \theta_{14}= \frac{2\pi}{3},\ \theta_{23}=\cos^{-1}\left(-\sqrt{\frac{2}{3}}\right),\ \theta_{24}=\frac{3\pi}{4},\ \theta_{34}=\frac{5\pi}{6}. $$ Based on Mathematica computations (using the formula in this paper) it seems that $$p({+}{+}{+}{+}) = \dfrac{35}{128}\qquad\text{and}\qquad p({+}{+}{+}{-})= \dfrac{5}{128},$$ which agrees with d.k.o.'s formula in these cases.
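The dihedral angles listed above follow from the dot products of the unit normals $n_k=(1,\ldots,1,0,\ldots,0)/\sqrt{k}$, for which $n_i\cdot n_j=\sqrt{i/j}$ when $i<j$. A small sketch, limited to the all-plus pattern and using the hypothetical name `dihedral_angles`:

```python
import math

def dihedral_angles(n):
    """Dihedral angles theta_ij of the cone C(+, ..., +) in R^n.

    Hypothetical helper: returns a dict keyed by (i, j) with 1 <= i < j <= n.
    """
    angles = {}
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            # n_i . n_j = i / sqrt(i * j) = sqrt(i / j) for i < j
            cos_between_normals = math.sqrt(i / j)
            angles[(i, j)] = math.pi - math.acos(cos_between_normals)
    return angles
```

For `n = 4` this gives, for example, $\theta_{14}=\frac{2\pi}{3}$ and $\theta_{34}=\frac{5\pi}{6}$. It only recovers the angles; the volume of the spherical tetrahedron still needs the separate formula referred to above.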


Denote $\mathbf X_n=(X_1,X_2,\ldots,X_n)'$ and $\mathbf S_n=(S_1,S_2,\ldots,S_n)'$. Then

$$\mathbf S_n=A\mathbf X_n$$

where $A$ is the lower triangular part of $\mathbf 1_n\mathbf 1_n'$, that is, the $n\times n$ matrix with ones on and below the diagonal and zeros above. So $\mathbf S_n$ is jointly normal with zero mean and covariance matrix $\Sigma=AA'$, whose entries are $\Sigma_{i,j}=i\wedge j$. Consequently,

$$\widetilde{\mathbf S}_n\equiv \operatorname{diag}(\epsilon)\,\mathbf S_n \sim\mathcal N(0,\widetilde{\Sigma})$$

where $\widetilde{\Sigma}_{i,j}=(i\wedge j)\cdot (\epsilon_i\epsilon_j)$.

Finally, $p(\epsilon)=P\{\widetilde{\mathbf S}_n\ge \mathbf 0\}=P\{\widetilde{\mathbf S}_{n,1}\ge 0,...,\widetilde{\mathbf S}_{n,n}\ge 0\}$.
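The covariance structure above is easy to check by direct matrix multiplication. A minimal sketch in plain Python, with the hypothetical name `sign_covariance` and the sign pattern again encoded as a string:

```python
def sign_covariance(signs):
    """Covariance matrix of diag(eps) * S_n, computed as diag(eps) A A' diag(eps).

    Hypothetical helper: `signs` is a string such as "+-+".
    """
    n = len(signs)
    eps = [1 if s == "+" else -1 for s in signs]
    # A has ones on and below the diagonal, so that S_n = A X_n.
    A = [[1 if j <= i else 0 for j in range(n)] for i in range(n)]
    # Sigma = A A', whose (i, j) entry should equal min(i, j) with 1-based indices.
    sigma = [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return [[eps[i] * eps[j] * sigma[i][j] for j in range(n)] for i in range(n)]
```

For the all-plus pattern the result has entries $i\wedge j$; flipping a sign $\epsilon_i$ negates row $i$ and column $i$ and leaves the diagonal unchanged.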


Edit: some simulations.

The computation of the multivariate normal integral is pretty difficult as @Did mentioned in the comments. However, simulations reveal an interesting "result"...

Here is the graph of the simulated conditional probability $p_c[n]\equiv\hat{P}\{\widetilde{\mathbf S}_{n,n}\ge 0\mid\widetilde{\mathbf S}_{n-1}\ge \mathbf 0\}$ for $n\le 100$ and $\epsilon=(+,\ldots,+)$ (blue circles):

[Figure: simulated conditional probability $p_c[n]$ plotted against $n$ (blue circles), together with a fitted curve.]
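A simulation of this kind can be sketched as follows. This is not the code behind the original graph: the function name `conditional_survival`, the sample size and the seed are assumptions, and only the standard library is used.

```python
import random

def conditional_survival(n_max, trials=100_000, seed=0):
    """Estimate P(S_n >= 0 | S_1 >= 0, ..., S_{n-1} >= 0) for n = 1..n_max.

    Hypothetical helper for the all-plus pattern; returns a dict n -> estimate.
    """
    rng = random.Random(seed)
    # alive[k] counts simulated paths with S_1, ..., S_k all nonnegative.
    alive = [0] * (n_max + 1)
    alive[0] = trials
    for _ in range(trials):
        partial_sum = 0.0
        for k in range(1, n_max + 1):
            partial_sum += rng.gauss(0.0, 1.0)
            if partial_sum < 0.0:
                break                # path leaves the positive region
            alive[k] += 1
    # The ratio of consecutive survivor counts is the conditional estimate.
    return {k: alive[k] / alive[k - 1] for k in range(1, n_max + 1)}
```

The estimates become noisy for large `n_max` because fewer paths survive, so the sample size has to grow accordingly to resolve the curve up to $n=100$.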

The fitted curve is $\hat{p}_c[n]\approx 1-\frac{1}{2n}$. So, assuming that this relation holds, we can find a formula for $p_u[n]\equiv P\{\widetilde{\mathbf S}_n\ge \mathbf 0\}$ by solving the following recurrence relation

$$p_u[n]=p_u[n-1]\cdot \Big(1-\frac{1}{2n}\Big)\text{, }p_u[1]=\frac{1}{2}$$

which yields

$$p_u[n]=\frac{1}{\sqrt{\pi}}\frac{\Gamma(n+0.5)}{\Gamma(n+1)}=\frac{(2n)!}{4^n(n!)^2}$$
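The agreement between the recurrence and the closed form can be checked in exact rational arithmetic. A short sketch with the hypothetical names `p_recurrence` and `p_closed`; it verifies the algebra only and does not prove the fitted relation $1-\frac{1}{2n}$ itself.

```python
from fractions import Fraction
from math import comb

def p_recurrence(n):
    """p_u[n] from p_u[n] = p_u[n-1] * (1 - 1/(2n)), p_u[1] = 1/2."""
    p = Fraction(1, 2)
    for k in range(2, n + 1):
        p *= 1 - Fraction(1, 2 * k)
    return p

def p_closed(n):
    """Closed form (2n)! / (4^n (n!)^2), written as C(2n, n) / 4^n."""
    return Fraction(comb(2 * n, n), 4 ** n)
```

For $n=3$ and $n=4$ both functions return $\frac{5}{16}$ and $\frac{35}{128}$, matching the values obtained geometrically in the other answer.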