Mean and Variance of "Piecewise" Normal Distribution

Question

2.7k Views Asked by Bumbble Comm At 03 Apr 2026 - 2:07

Note - I put piecewise in quotes because I don't think it's the right term to use (I can't figure out what to call it).

I am building a program to model the load that a user places on a server. The load a user produces follows a normal distribution. Depending on which application the user is using, however, the mean and variance of that normal distribution will be different. What I am trying to do is calculate the overall mean and variance for a user's activity given the proportions of time they use each application.

For example, Application A follows $\mathcal{N}(100, 50)$ and Application B follows $\mathcal{N}(500, 20)$. If a user uses A 50% of the time and B the other 50%, what are the mean and variance of the data that the user would produce during a day?

I'm able to simulate this by selecting a number from a uniform distribution between 0 and 1 and then generating a value from the appropriate distribution. Something like this:

$f(x) = \begin{cases} \mathcal{N}(100, 50), &0 \le x \lt 0.5\\ \mathcal{N}(500, 20), &0.5 \le x \lt 1\\ \end{cases} $

When I simulate a large number of these values and measure the results, it looks like the mean is just

$\sum\limits_{i=1}^n\mu_i p_i$

where $p_i$ is the fraction of the day the user spends in application $i$.

I can't figure out what pattern the variance follows, or what formula would determine it without measuring a large number of simulated values. (When I simulate the above example, the variance comes out close to 41500.)
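The simulation described above can be sketched roughly as follows. This is only an illustration: the function name `simulate_load` and the `(weight, mean, std_dev)` tuple layout are made up here, and the second parameter of each $\mathcal{N}(\cdot,\cdot)$ is assumed to be a standard deviation, since that reading gives a variance near the 41500 reported.

```python
import random
import statistics

def simulate_load(components, n_samples=100_000, seed=0):
    """Draw samples from a mixture of normals.

    components: list of (weight, mean, std_dev) tuples (hypothetical layout).
    Returns the sample mean and the population variance of the draws.
    """
    rng = random.Random(seed)
    weights = [w for w, _, _ in components]
    samples = []
    for _ in range(n_samples):
        # Pick which application is in use, then draw a load from it.
        _, mu, sigma = rng.choices(components, weights=weights, k=1)[0]
        samples.append(rng.gauss(mu, sigma))
    return statistics.fmean(samples), statistics.pvariance(samples)

# Assumption: 50 and 20 are standard deviations, not variances.
sim_mean, sim_var = simulate_load([(0.5, 100, 50), (0.5, 500, 20)])
print(sim_mean, sim_var)
```

With these inputs the sample mean should land near 300 and the sample variance in the neighbourhood of 41500, up to sampling noise.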

I'd appreciate confirmation that how I'm calculating the combined mean is correct and some help in figuring out how to determine the variance of the overall distribution.

Bumbble Comm On 05 Aug 2014 - 2:48

What you want are the mean and variance of a mixture distribution. There are convenient formulas for these. Let $T$ be the unconditional load a user places on the system, and let $X$ be a Bernoulli random variable that indicates whether the user is on system $A$ or $B$. So $A = (T \mid X = 0) \sim \mathrm{Normal}(\mu_A = 100, \sigma_A^2 = 50)$, and $B = (T \mid X = 1) \sim \mathrm{Normal}(\mu_B = 500, \sigma_B^2 = 20)$. That is to say, $$\begin{align*} \mathrm{E}[T \mid X = 0] &= \mu_A, \\ \mathrm{E}[T \mid X = 1] &= \mu_B, \\ \mathrm{Var}[T \mid X = 0] &= \sigma_A^2, \\ \mathrm{Var}[T \mid X = 1] &= \sigma_B^2. \end{align*}$$

Then by the law of total expectation, $$\begin{align*} \mathrm{E}[T] &= \mathrm{E}[\mathrm{E}[T \mid X]] \\ &= \mathrm{E}[T \mid X = 0]\Pr[X = 0] + \mathrm{E}[T \mid X = 1]\Pr[X = 1] \\ &= \mu_A (1-p) + \mu_B p,\end{align*}$$ where $p$ is the probability that a user is on system $B$.

The variance follows from the law of total variance: $$\begin{align*} \mathrm{Var}[T] &= \mathrm{E}[\mathrm{Var}[T \mid X]] + \mathrm{Var}[\mathrm{E}[T \mid X]] \\ &= \mathrm{Var}[T \mid X = 0]\Pr[X = 0] + \mathrm{Var}[T \mid X = 1]\Pr[X = 1] + \mathrm{Var}[\mathrm{E}[T \mid X]] \\ &= \sigma_A^2 (1-p) + \sigma_B^2 p + \mathrm{Var}[\mathrm{E}[T \mid X]]. \end{align*}$$

The last term requires a little subtlety to understand. The variable $\mathrm{E}[T \mid X]$ is a generalized Bernoulli, which takes on the value $\mu_A$ with probability $1-p$ and $\mu_B$ with probability $p$ (rather than $0$ and $1$). So we may write this as $$\mathrm{E}[T \mid X] = X(\mu_B - \mu_A) + \mu_A,$$ where $X \sim \mathrm{Bernoulli}(p)$. Therefore, $$\mathrm{Var}[\mathrm{E}[T \mid X]] = \mathrm{Var}[(\mu_B - \mu_A)X + \mu_A] = (\mu_B - \mu_A)^2 \mathrm{Var}[X] = (\mu_B - \mu_A)^2 p(1-p).$$ Hence $$\mathrm{Var}[T] = \sigma_A^2 (1-p) + \sigma_B^2 p + (\mu_B - \mu_A)^2 p(1-p).$$

The general case with $n$ systems $S_i \sim \mathrm{Normal}(\mu_i, \sigma_i^2)$, $i = 1, 2, \ldots, n$, where the user is on system $i$ with probability $p_i$ and $\sum_{i=1}^n p_i = 1$, involves a categorical distribution rather than a Bernoulli.
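For the $n$-system case, one way to compute the two moments is through the second moment, using $\mathrm{E}[T^2] = \sum_i p_i(\sigma_i^2 + \mu_i^2)$ and $\mathrm{Var}[T] = \mathrm{E}[T^2] - \mathrm{E}[T]^2$. The sketch below is an illustration only; the name `mixture_moments` and its argument layout are made up here and are not part of the answer.

```python
def mixture_moments(weights, means, variances):
    """Mean and variance of a mixture with the given component moments.

    weights:   mixing probabilities p_i (must sum to 1)
    means:     component means mu_i
    variances: component variances sigma_i^2 (not standard deviations)
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # E[T] = sum_i p_i * mu_i
    mean = sum(p * m for p, m in zip(weights, means))
    # E[T^2] = sum_i p_i * (sigma_i^2 + mu_i^2)
    second_moment = sum(
        p * (v + m * m) for p, m, v in zip(weights, means, variances)
    )
    return mean, second_moment - mean ** 2

# The two-system example, with 50 and 20 taken as variances as in this answer.
mean, var = mixture_moments([0.5, 0.5], [100, 500], [50, 20])
print(mean, var)
```

For two components this agrees with $\sigma_A^2 (1-p) + \sigma_B^2 p + (\mu_B - \mu_A)^2 p(1-p)$; with the numbers above it gives a mean of 300 and a variance of 40035.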

Bumbble Comm On 05 Aug 2014 - 2:49

Recall that $V[X]=E[X^2]-E[X]^2$. Hence if you are given $E$ and $V$ of $X_1,X_2$ and combine them as described to a new random variable $Y$ by using a 0-1-random variable $Z$, i.e. picking $X_1$ if $Z=1$ and picking $X_2$ if $Z=0$, we find $$ E[Y]=P(Z=1)\cdot E[Y|Z=1]+P(Z=0)\cdot E[Y|Z=0]=pE[X_1]+(1-p)E[X_2].$$ By the same argument we find $$ E[Y^2] = pE[X_1^2]+(1-p)E[X_2^2]$$ Substituting $E[X_i^2]=V[X_i]+E[X_i]^2$, we obtain $$ \begin{align}V[Y]&=E[Y^2]-E[Y]^2 \\&= p(V[X_1]+E[X_1]^2) + (1-p)(V[X_2]+E[X_2]^2)-(pE[X_1]+(1-p)E[X_2])^2\\ &=pV[X_1]+(1-p)V[X_2]+p(1-p)(E[X_1]^2+E[X_2]^2)-2p(1-p)E[X_1]E[X_2]\\ &=pV[X_1]+(1-p)V[X_2]+p(1-p)(E[X_1]-E[X_2])^2.\end{align}$$
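As a quick check of the last line, here are the numbers from the question with $p = 0.5$. Which reading of the second parameter of $\mathcal{N}(\cdot,\cdot)$ was intended is an assumption either way, so both are shown. Taking $50$ and $20$ as variances:

$$ V[Y] = 0.5\cdot 50 + 0.5\cdot 20 + 0.25\,(100-500)^2 = 25 + 10 + 40000 = 40035. $$

Taking them as standard deviations, so that the variances are $2500$ and $400$:

$$ V[Y] = 0.5\cdot 2500 + 0.5\cdot 400 + 0.25\,(100-500)^2 = 1250 + 200 + 40000 = 41450. $$

The second value is close to the simulated figure of about 41500 quoted in the question, which suggests the simulation used $50$ and $20$ as standard deviations.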

**Bumbble Comm** · Accepted Answer

Let the two normal random variables be $X$ and $Y$, where $X$ is chosen with probability $p$, and $Y$ is chosen with probability $q=1-p$.

If $W$ is the resulting random variable, then $\Pr(W\le w)=p\Pr(X\le w)+q\Pr(Y\le w)$.

Differentiate. We get $f_W(w)=pf_X(w)+qf_Y(w)$.

The mean of $W$ is $\int_{-\infty}^\infty wf_W(w)\,dw$. Calculate. We get $$\int_{-\infty}^\infty w(pf_X(w)+qf_Y(w))\,dw.$$ This is $pE(X)+qE(Y)$, confirming your observation.

For the variance, we want $E(W^2)-(E(W))^2$. For $E(W^2)$, we calculate $\int_{-\infty}^{\infty} w^2(pf_X(w)+qf_Y(w))\,dw$. This is $pE(X^2)+qE(Y^2)$.

But $pE(X^2)= p(\text{Var}(X)+(E(X))^2)$ and $qE(Y^2)= q(\text{Var}(Y)+(E(Y))^2)$.

Putting things together we get $$\text{Var}(W)=p\text{Var}(X)+q\text{Var}(Y)+ p(E(X))^2+q(E(Y))^2- (pE(X)+qE(Y))^2.$$
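The last three terms can be collapsed further. A short worked step, using only $q = 1-p$ (so that $p - p^2 = pq$ and $q - q^2 = pq$):

$$\begin{align} p(E(X))^2+q(E(Y))^2-(pE(X)+qE(Y))^2 &= (p-p^2)(E(X))^2+(q-q^2)(E(Y))^2-2pqE(X)E(Y)\\ &= pq\left(E(X)-E(Y)\right)^2, \end{align}$$

so that $$\text{Var}(W)=p\text{Var}(X)+q\text{Var}(Y)+pq\left(E(X)-E(Y)\right)^2.$$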

Remark: For a longer discussion, please look for Mixture Distributions.
