Where's my mistake in my attempt at showing that the sum of squares of standard normal variables has a $\chi^2$ distribution?

$\newcommand{\N}{\mathcal{N}}\newcommand{\d}{\,\mathrm{d}}$I am trying to understand how the sum of squares of standard normal variables is a $\chi^2$ distribution. Note that I am not a student of probability theory, but am rather trying to make sense of some things in a formal manner from a background of analysis.

As I understand it:$$X\sim\N(0,1)\iff p(X\in I\subset\Bbb R)=\frac{1}{\sqrt{2\pi}}\int_I\exp\left(-\frac{1}{2}x^2\right)\d x$$

Allegedly, if $X_1,X_2,\dots,X_n\sim\N(0,1)$ are independent, then:

$$\sum_{i=1}^nX_i^2\sim\chi_n^2$$

Where:

$$X\sim\chi^2_n\iff p(X\in I\subset\Bbb R^+)=\frac{1}{2^{n/2}\Gamma(n/2)}\int_Ix^{n/2-1}\exp(-x/2)\d x$$

I will try (and fail!) to show this.

The squares of these $X_i$ will lie in $\Bbb R^+$. I will proceed by induction. For $n=1$, we find that $X_1^2\in I\iff X_1\in\pm\sqrt{I}$, so I can make the substitution $x=u^2$ to retrieve the normal pdf of $X_1$.

I believe it is a theorem that if two measures agree on all intervals in $\Bbb R$, they agree on all Borel sets - is this true? What's the name of this theorem if it is? I cannot recall, but I think I read this somewhere.

Anyway, w.l.o.g. (hopefully!) take $I=(a^2,b^2)\subset\Bbb R^+$:

$$\begin{align}p(X_1^2\in I)&=\int_{a^2}^{b^2}\mathrm{pdf}_{X_1^2}(x)\d x=\frac{1}{2}\left(\int_a^b2x\cdot\mathrm{pdf}_{X_1}(x)\d x+\int_{-b}^{-a}2x\cdot\mathrm{pdf}_{X_1}(x)\d x\right)\\&=\frac{1}{\sqrt{2\pi}}\left(\int_a^b x\exp(-\frac{1}{2}x^2)\d x+\int_{-b}^{-a}x\exp(-\frac{1}{2}x^2)\d x\right)\\&=\frac{1}{\sqrt{2\pi}}\left(\int_{a^2}^{b^2}\sqrt{u}\cdot\exp(-\frac{1}{2}u)(\frac{1}{2}u^{-1/2}\d u)-\int_{b^2}^{a^2}\sqrt{u}\cdot\exp(-\frac{1}{2}u)(\frac{1}{2}u^{-1/2}\d u)\right)\\&=\frac{1}{\sqrt{2\pi}}\int_{a^2}^{b^2}\exp(-\frac{1}{2}u)\d u\end{align}$$

But this is not consistent with the assertion that $X_1^2\sim\chi^2_1$, since the integral is missing a $u^{-1/2}$ term.

I have two questions:

  1. What's my mistake in the analysis of the case $n=1$?
  2. To put my mind at rest, to show that the two distributions are the same over all Borel sets in $\Bbb R$, is it really sufficient to show equivalence over intervals?

Many thanks.

There are 2 best solutions below


The mistake is in your second equality: no change of variables is needed there, so the factor $2x$ should not appear. The event $X_1^2\in[a^2,b^2]$ (with $0\le a<b$) is the same as $X_1\in[a,b]\cup[-b,-a]$, so its probability is just the integral of the density of $X_1$ over that set. The only substitution you actually need is $u=x^2$, and its Jacobian $\d x=\frac{1}{2\sqrt{u}}\d u$ supplies the missing $u^{-1/2}$:

\begin{align} P(X_1^2 \in [a^2, b^2]) &= P(X_1 \in [a,b] \cup [-b, -a]) \\ &= \int_{[a,b] \cup [-b, -a]} f_{X_1}(x) \, dx \\ &= 2 \int_a^b f_{X_1}(x) \, dx && \text{(symmetry)} \\ &= 2 \int_{a^2}^{b^2} \frac{1}{2\sqrt{u}} f_{X_1}(\sqrt{u}) \, du && (u=x^2) \\ &= \int_{a^2}^{b^2} \frac{1}{\sqrt{2\pi}} u^{-1/2} e^{-u/2} \, du. \end{align}
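As a sanity check on the $n=1$ case, here is a minimal numerical sketch using only the Python standard library. It compares $2(\Phi(b)-\Phi(a))$ with a midpoint-rule integral of the $\chi^2_1$ density over $[a^2,b^2]$, and also with the integrand from the question that lacks the $u^{-1/2}$ factor. The function names and the sample values of $a$ and $b$ are illustrative choices, not part of the original posts.

```python
import math


def std_normal_cdf(x):
    """CDF of N(0, 1), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def chi2_1_pdf(u):
    """Density of chi^2 with one degree of freedom, for u > 0."""
    return math.exp(-u / 2.0) / math.sqrt(2.0 * math.pi * u)


def flawed_integrand(u):
    """Final integrand from the question: the u^{-1/2} factor is missing."""
    return math.exp(-u / 2.0) / math.sqrt(2.0 * math.pi)


def midpoint_integral(f, lo, hi, steps=100_000):
    """Plain midpoint rule; adequate for smooth integrands on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))


def prob_via_normal(a, b):
    """P(X^2 in [a^2, b^2]) = 2 (Phi(b) - Phi(a)) for 0 <= a < b."""
    return 2.0 * (std_normal_cdf(b) - std_normal_cdf(a))


def prob_via_chi2_1_density(a, b):
    """Integral of the chi^2_1 density over [a^2, b^2] (assumes a > 0)."""
    return midpoint_integral(chi2_1_pdf, a * a, b * b)


def prob_via_flawed_integrand(a, b):
    """Integral of the question's final integrand over [a^2, b^2]."""
    return midpoint_integral(flawed_integrand, a * a, b * b)


if __name__ == "__main__":
    a, b = 0.5, 1.5  # arbitrary illustrative endpoints with 0 < a < b
    print("via normal cdf      :", prob_via_normal(a, b))
    print("via chi^2_1 density :", prob_via_chi2_1_density(a, b))
    print("via flawed integrand:", prob_via_flawed_integrand(a, b))
```

The first two numbers should agree up to the quadrature error, while the third should differ visibly, which is consistent with the missing $u^{-1/2}$ being the only discrepancy.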


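Regarding upgrading "agreement on intervals" to "agreement on Borel sets," see the uniqueness part of Carathéodory's Extension Theorem, or Dynkin's $\pi$-$\lambda$ theorem: the intervals form a $\pi$-system that generates the Borel $\sigma$-algebra, so two probability measures that agree on all intervals agree on all Borel sets.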


Alternatively, work with the cdf. For $0\le a<b$:

$$\begin{aligned}P(X^2_1\in[a,b])&=P(X_1\in [-\sqrt{b},-\sqrt{a}]\cup [\sqrt{a},\sqrt{b}])\\ &=P(X_1\in[-\sqrt{b},-\sqrt{a}])+P(X_1\in [\sqrt{a},\sqrt{b}])\\ &=2P(X_1\in [\sqrt{a},\sqrt{b}])\\ &=2(\Phi(\sqrt{b})-\Phi(\sqrt{a}))\end{aligned}$$

From here, find the cdf and differentiate it to obtain the pdf. Since $X_1^2\ge 0$, take $a=0$ and use $\Phi(0)=\frac12$. For $b>0$ we get

$$F_{X_1^2}(b)=2\Phi(\sqrt{b})-1\implies f_{X_1^2}(b)=\frac{1}{\sqrt{b}}\phi(\sqrt{b})=\frac{1}{\sqrt{2\pi b}}e^{-\frac{b}{2}}$$

This is the $\chi^2$ pdf with $n=1$.

As suggested by @Bey, now use CFs. The CF of $Y_1\sim \chi^2_1$ is

$$\begin{aligned}E[e^{i\xi Y_1}]=(1-2i\xi)^{-1/2} \end{aligned}$$

So if we have the sum $Y=\sum_{k\leq n}Y_k$, where the $Y_k$ are IID $\chi^2_1$, independence gives

$$E[e^{i\xi Y}]=E[e^{i\xi Y_1}]^n=(1-2i\xi)^{-n/2}$$

This is the CF of $\chi^2_n$, and since a distribution is determined by its CF, $Y \sim \chi^2_n$.
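The characteristic-function step can also be checked numerically. The sketch below, again standard-library Python with illustrative function names and parameters, approximates $E[e^{i\xi X^2}]$ for $X\sim\mathcal{N}(0,1)$ by a midpoint rule over a truncated range (the truncation at $\pm 10$ is an assumption that relies on the Gaussian tails being negligible there) and compares it with $(1-2i\xi)^{-1/2}$, taken on the principal branch.

```python
import cmath
import math


def cf_of_squared_normal(xi, half_width=10.0, steps=100_000):
    """Approximate E[exp(i*xi*X^2)] for X ~ N(0, 1).

    Integrates exp(i*xi*x^2) * phi(x) over [-half_width, half_width]
    with the midpoint rule; the tails beyond that range are ignored.
    """
    h = 2.0 * half_width / steps
    total = 0j
    for k in range(steps):
        x = -half_width + (k + 0.5) * h
        total += cmath.exp(1j * xi * x * x) * math.exp(-x * x / 2.0)
    return total * h / math.sqrt(2.0 * math.pi)


def cf_chi2(xi, n):
    """Closed-form CF of chi^2_n: (1 - 2i*xi)^(-n/2), principal branch."""
    return (1 - 2j * xi) ** (-n / 2.0)


if __name__ == "__main__":
    xi = 0.7  # arbitrary illustrative frequency
    numeric = cf_of_squared_normal(xi)
    print("numeric CF of X^2 :", numeric)
    print("closed form, n = 1:", cf_chi2(xi, 1))
    # For independent summands the CF of the sum is the product of the CFs.
    print("numeric CF cubed  :", numeric ** 3)
    print("closed form, n = 3:", cf_chi2(xi, 3))
```

Raising the numerical value to the $n$-th power mirrors the product rule for CFs of independent summands, so agreement with `cf_chi2(xi, n)` is what the argument predicts.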