Issue deriving the probability distribution of the sample mean of a normally distributed variable

Question

Issue deriving the probability distribution of the sample mean of a normally distributed variable

124 Views Asked by Bumbble Comm At 29 Mar 2026 - 10:28

I'm trying to write a formula which describes the probability distribution of the sample mean of a normally distributed variable; I have a derivation, but something must be wrong because the integral of the distribution is greater than one. I think that the error is in my thought process and not just a calculation error, since the first term in the formula has an indefinite integral equal to one

Derivation:

To find the probability distribution of the sample mean, we begin by looking at the probability of getting a sample mean value equal to Y; i.e. out of all possible sample means what is the probability of getting a sample mean equal to Y. This should be equal to the probability of getting a sample mean equal to Y when you only sample 1 value, plus the probability that you get a sample mean equal to Y when you uniquely sample 2 values, and so on for uniquely sampling n values.

P(sample mean=Y)=$\sum_{i=1}^\infty P(\sum_{j=1}^i x_j=i*Y)$

$P(\sum_{j=1}^n x_j=n*Y)$ should be a sum over the probabilities associated with the individual unique combination of values which sums to n*Y. Letting p be the probability of getting a specific value from the original distribution, we have

P($\sum_{j=1}^n x_j$=n*Y)=$\sum p(x_1)*p(x_2)*(...)*(p(n*y-\sum_{l=1}^{n-1} x_l)) $

each $\ p(x_i)=\frac1{\sigma\sqrt{2*\pi}}*e^{-\frac{(x_i-u)^2}{\sigma^2*2}}dx_i$

with u being the mean and $\sigma$ is the standard deviation of the original distribution.

my notation for the next part is informal: so the sum approaches a multiple integral; if we let the bounds tend from $-\infty$ to $\infty$ then we have to account for the fact that we can shuffle the probabilities; before we divide by $n!$, we have to subtract the portions where more than one variable would be equal. For example when we form a sample mean from 3 values, the probability found by summing over all combinations could be "approximated" as

$(\frac1{\sigma\sqrt{2*\pi}})^3*e^{-\frac{(Y-u)^2}{\sigma^2*2}}*e^{-\frac{(Y-u)^2}{\sigma^2*2}}*e^{-\frac{(Y-u)^2}{\sigma^2*2}}dx^3+\binom{3}{2}\int_{-\infty}^\infty (\frac1{\sigma\sqrt{2*\pi}})^3*e^{-\frac{(x_1-u)^2}{\sigma^2*2}}*e^{-\frac{(x_1-u)^2}{\sigma^2*2}}*e^{-\frac{(3*Y-2*x_1-u)^2}{\sigma^2*2}}dx^3+\frac 1{3!}(\iint_{-\infty}^\infty (\frac1{\sigma\sqrt{2*\pi}})^3*e^{-\frac{(x_1-u)^2}{\sigma^2*2}}*e^{-\frac{(x_2-u)^2}{\sigma^2*2}}*e^{-\frac{(3*Y-x_1-x_2-u)^2}{\sigma^2*2}}dx^3-\binom{3}{2}\int_{-\infty}^\infty (\frac1{\sigma\sqrt{2*\pi}})^3*e^{-\frac{(x_1-u)^2}{\sigma^2*2}}*e^{-\frac{(x_1-u)^2}{\sigma^2*2}}*e^{-\frac{(3*Y-2*x_1-u)^2}{\sigma^2*2}}dx^3-(\frac1{\sigma\sqrt{2*\pi}})^3*e^{-\frac{(Y-u)^2}{\sigma^2*2}}*e^{-\frac{(Y-u)^2}{\sigma^2*2}}*e^{-\frac{(Y-u)^2}{\sigma^2*2}}dx^3)$

As the infinitesimals become smaller and smaller we approximate our solution; in order to get our distribution, we divide by one dx since p=dx*(distribution at Y). After doing so, all terms except the double integral converge to a number multiplied by a power of $dx$, so those terms tend to zero leaving us with

$\frac 1{3!}\iint_{-\infty}^\infty (\frac1{\sigma\sqrt{2*\pi}})^3*e^{-\frac{(x_1-u)^2}{\sigma^2*2}}*e^{-\frac{(x_2-u)^2}{\sigma^2*2}}*e^{-\frac{(3*Y-x_1-x_2-u)^2}{\sigma^2*2}}dx^2$

the same sort of thing happens for the other terms in the series, giving (labeling the probability distribution value at Y as D(Y))

$D(Y)=\sum_{n=1}^{\infty}\frac 1{n!}\int_{-\infty}^\infty...$n-1 times$...\int_{-\infty}^\infty(\frac1{\sigma\sqrt{2*\pi}})^n*e^{-\frac{(Y*n-u-\sum_{i=1}^{n-1} x_i)^{2}+\sum_{i=1}^{n-1}(x_i-u)^2}{\sigma^2*2}}dx^{n-1}$

In order to simplify each term, we first apply the transformation $x_i-u=z_i$ giving $dx^{n-1}=dz^{n-1}$ and then $z_i=\sigma*\sqrt{2}*q_i$ giving $dz^{n-1}={(\sigma\sqrt{2})}^{n-1}*dq^{n-1}$

$D(Y)=\sum_{n=1}^{\infty}\frac 1{n!}\int_{-\infty}^\infty...$n-1 times$...\int_{-\infty}^\infty({\frac1{\sqrt{\pi}}})^n*\frac1{\sigma\sqrt{2}}e^{-(\frac{Y*n-u*n}{\sigma*\sqrt{2}}-\sum_{i=1}^{n-1} q_i)^{2}-\sum_{i=1}^{n-1}(q_i)^{2}}dx^{n-1}$

the next steps depend on just playing with the argument of the exponential; it expands in to

$2*\sum q_i^{2}+2\sum_{i<j}q_i*q_j-2\frac{n*(y-u)}{\sigma*\sqrt{2}}*\sum q_i+(\frac {n(y-1)}{\sqrt{2}*\sigma})^2$

we can rewrite this as

$ Q^T\begin{bmatrix} 2 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 1 & \cdots & 1 \\ 1 & 1 & 2 & \cdots & 1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \cdots & \cdots & \cdots & 2 \\ \end{bmatrix}Q-2\frac{n*(y-u)}{\sigma*\sqrt{2}}*\sum q_i+(\frac {n(y-1)}{\sqrt{2}*\sigma})^2 $

the symmetric matrix, label it A, has an orthonormal eigen basis. So for $AP_n=P_n[\lambda]$, the orthonormal n-1 by n-1 $P_n$ is

$ P_n=\begin{bmatrix} \frac{1}{\sqrt{n-1}} & -\frac{1}{\sqrt{\frac{1}{1}+1}} & -\frac{1}{2}*\frac{1}{\sqrt{\frac{1}{2}+1}} & -\frac{1}{3}*\frac{1}{\sqrt{\frac{1}{3}+1}} & \cdots & -\frac{1}{n-2}*\frac{1}{\sqrt{\frac{1}{n-2}+1}} \\ \frac{1}{\sqrt{n-1}} & \frac{1}{\sqrt{\frac{1}{1}+1}} & -\frac{1}{2}*\frac{1}{\sqrt{\frac{1}{2}+1}} & -\frac{1}{3}*\frac{1}{\sqrt{\frac{1}{3}+1}} & \cdots & -\frac{1}{n-2}*\frac{1}{\sqrt{\frac{1}{n-2}+1}}\\ \frac{1}{\sqrt{n-1}} & 0 & \frac{1}{\sqrt{\frac{1}{2}+1}} & -\frac{1}{3}*\frac{1}{\sqrt{\frac{1}{3}+1}} & \cdots & -\frac{1}{n-2}*\frac{1}{\sqrt{\frac{1}{n-2}+1}} \\ \frac{1}{\sqrt{n-1}} & 0 & 0 & \frac{1}{\sqrt{\frac{1}{3}+1}} & \cdots & -\frac{1}{n-2}*\frac{1}{\sqrt{\frac{1}{n-2}+1}} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{1}{\sqrt{n-1}} & 0 & \cdots & \cdots & \cdots & \frac{1}{\sqrt{\frac{1}{n-2}+1}}\\ \end{bmatrix}$

Where the first column has an eigen value n and every other has an eigenvalue 1

So we assign new variables $w^{->}=P_n^Tq^{->}$. Since the matrix is orthonormal, $det{P^T}=det{P}=1$

The argument transforms into

$ n*{w_1}^2+(\sum_{i=2}^{n-1} {w_i}^2)-2\frac{n*(y-u)}{\sigma*\sqrt{2}}*\sqrt{n-1}w_1+(\frac{n*(y-u)}{\sigma*\sqrt{2}})^2$

giving

$D(y)=\sum_{n=1}^{\infty}\frac 1{n!}\int_{-\infty}^\infty...$n-1 times$...\int_{-\infty}^\infty({\frac1{\sqrt{\pi}}})^n*\frac1{\sigma\sqrt{2}}e^{-(n*{w_1}^2+(\sum_{i=2}^{n-1} {w_i}^2)-2\frac{n*(y-u)}{\sigma*\sqrt{2}}\sqrt{n-1}w_1+(\frac{n*(y-u)}{\sigma*\sqrt{2}})^2)}dw^{n-1}$

Now all the variables are independent in the integrand, so the variables $w_2$ to $w_n-1$ can be integrated separately giving

$D(y)=\sum_{n=1}^{\infty}\frac 1{n!}\int_{-\infty}^\infty({\frac1{\sqrt{\pi}}})^2*\frac1{\sigma\sqrt{2}}e^{-(n*{w_1}^2-2\frac{n*(y-u)}{\sigma*\sqrt{2}}\sqrt{n-1}w_1+(\frac{n*(y-u)}{\sigma*\sqrt{2}})^2)}dw$

now we're just left with a single gaussian integral, giving

$D(y)=\sum_{n=1}^{\infty}\frac 1{n!}({\frac1{\sqrt{2n{\pi}}}})*\frac1{\sigma}e^{-((\frac{n*(y-u)}{\sigma*\sqrt{2}})^2-(\frac{n*(y-u)}{\sigma*\sqrt{2}})^2*\frac{n-1}{n})}$

which turns into

$D(y)=\sum_{n=1}^{\infty}\frac 1{n!}({\frac1{\sqrt{2n{\pi}}}})\frac1{\sigma}e^{-n(\frac{(y-u)}{\sigma*\sqrt{2}})^2}$

or, if we label $G(n,y,\sigma,u)={\frac1{\sqrt{2{\pi}}}}\frac{\sqrt n}{\sigma}e^{-n(\frac{(y-u)}{\sigma*\sqrt{2}})^2}$ , being the distribution for sample means when we sample n variables

$D(y)=\sum_{n=1}^{\infty}\frac 1{n!}*\frac{G(n,y,\sigma,u)}{n}$

Discussion:

This formula is wrong since the integral of the function converges to about 1.3179. The first term in the sum is just the distribution for the original variable, so it's integral is one. The integral of the remaining $D(y)=\sum_{n=2}^{\infty}\frac 1{n!}*\frac{G(n,y,\sigma,u)}{n}$ is then about .3179

I wouldn't be surprised if there were other issues in the thought process, but that's all I can see right now. Thank you for any help!

Original Q&A

There are 2 best solutions below

Bumbble Comm On 26 Mar 2018 - 10:14

To derive this, I suggest that you prove that if X is a normal random variable with mean $\mu$ and variance $\sigma^2$ and c is a constant then cX is normal with mean $c\mu$ and variance $c^2\sigma^2$ . Also, show if X and Y are independent normal distributions with mean $\mu_X$ and variance $\sigma^2_X$ and mean $\mu_Y$ and variance $\sigma^2_Y$, then X+Y is normal with mean $\mu_X+\mu_Y$ and variance $\sigma^2_X+\sigma^2_Y$. Together, these give you that a linear combination of independent normal random variables is normal. If you think of a simple random sample as observations from n iid normal random variables, then the sample mean from a normal distribution is a linear combination of independent normal random variable so is normal. If you want a formula for the distribution of the sample mean, it will be that of a normal with mean the same as the population and variance will be the population variance divided by n.

**Bumbble Comm** · Accepted Answer

This is a confusing setting and these are some confusing computations, partly due to the fact that you apply the formalism of discrete random variables to continuous ones.

In the end, it seems you are first considering $(X_k)$ i.i.d. normal $(\mu,\sigma^2)$, then, for every $n$, the random variable $Y_n=\frac1n\sum\limits_{k=1}^nX_k$, and finally $g=\sum\limits_{n=1}^\infty g_n$ where, for every $n$, $g_n$ denotes the PDF of $Y_n$.

There is no reason to expect that $g$ should be a PDF since it corresponds to no clear probabilistic experiment (recall that one cannot draw a positive integer uniformly at random). In fact $g$ is trivially not a PDF since $$\int_\mathbb Rg=\sum_n\int_\mathbb Rg_n=\sum_n1=+\infty$$ Furthermore, each $Y_n$ is normal $(\mu,\sigma^2/n)$ hence $$g(y)=\sum_n\frac1{\sqrt{2\pi\sigma^2}}\sqrt ne^{-n(y-\mu)^2/2\sigma^2}$$ that is, $$g(y)=\frac1{\sqrt{2\pi\sigma^2}}\sum_n\sqrt ne^{-nr(y)}\qquad r(y)=\frac{(y-\mu)^2}{2\sigma^2}$$ Thus, if $y=\mu$, then $r(y)=0$ and $g(y)$ is infinite while, if $y\ne\mu$, then $r(y)>0$ and $g(y)$ is the sum of a special function. A formula giving $g(y)$ as an integral rather than a series uses the fact that $$\sqrt n=\frac1{2\sqrt\pi}\int_0^\infty (1-e^{-nt})\frac{dt}{t\sqrt t}$$ to the effect that $$g(y)=\frac 1{2\pi\sqrt{2\sigma^2}}\frac1{e^{r(y)}-1}\int_0^\infty\frac{e^t-1}{e^{t}-e^{-r(y)}}\frac{dt}{t\sqrt t}$$

Issue deriving the probability distribution of the sample mean of a normally distributed variable

Derivation:

$D(y)=\sum_{n=1}^{\infty}\frac 1{n!}({\frac1{\sqrt{2n{\pi}}}})\frac1{\sigma}e^{-n(\frac{(y-u)}{\sigma*\sqrt{2}})^2}$

$D(y)=\sum_{n=1}^{\infty}\frac 1{n!}*\frac{G(n,y,\sigma,u)}{n}$

Discussion:

There are 2 best solutions below

Related Questions in PROBABILITY

Related Questions in COMBINATORICS

Related Questions in PROOF-VERIFICATION

Related Questions in PROBABILITY-DISTRIBUTIONS

Trending Questions

Popular # Hahtags

Popular Questions