Sum of two Gaussian PDFs


I want to find the sum of two univariate Gaussians with different variances.
Considering that the sum of two independent normal variables $X_1$ and $X_2$ satisfies $$X_1+X_2\sim \operatorname{N}\left (\mu _1+\mu _2,\ \sigma _1^2+\sigma _2^2\right ),$$ I would expect the sum of their PDFs to be another Gaussian PDF, namely $G_{\sigma _1^2+\sigma _2^2}(x)$. Is that the case? And what happens if they are not independent?
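A quick simulation confirms the distributional fact above. This is a minimal sketch with illustrative parameters ($\mu_1=0,\ \sigma_1^2=1,\ \mu_2=3,\ \sigma_2^2=4$), not part of the original question:

```python
import numpy as np

rng = np.random.default_rng(0)
mu1, var1 = 0.0, 1.0
mu2, var2 = 3.0, 4.0

# draw independent samples and add them pointwise
x1 = rng.normal(mu1, np.sqrt(var1), 200_000)
x2 = rng.normal(mu2, np.sqrt(var2), 200_000)
z = x1 + x2

# sample moments match N(mu1 + mu2, var1 + var2)
print(z.mean(), z.var())  # close to 3.0 and 5.0
```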



PDFs must integrate to 1, so the sum of two PDFs integrates to 2 and therefore cannot itself be a PDF. Your question is related to mixtures of random variables, where you define a random variable whose PDF is a convex combination of the original PDFs. For example, the average of two normal PDFs is a mixture of normals (two bells).

I recommend you read: https://en.wikipedia.org/wiki/Mixture_distribution#Two_normal_distributions
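The normalization argument can be checked numerically. Here is a short sketch using an explicit Gaussian PDF; the grid and parameters are chosen for illustration:

```python
import numpy as np

def pdf(x, mean, variance):
    # standard univariate Gaussian density
    return np.exp(-0.5 * (x - mean)**2 / variance) / np.sqrt(2 * np.pi * variance)

# wide, fine grid so both densities are essentially fully covered
x = np.linspace(-20.0, 20.0, 200_001)
dx = x[1] - x[0]
f1 = pdf(x, 0.0, 1.0)  # N(0, 1)
f2 = pdf(x, 0.0, 4.0)  # N(0, 4)

sum_area = np.sum(f1 + f2) * dx              # integrates to 2: not a PDF
mix_area = np.sum(0.5 * f1 + 0.5 * f2) * dx  # convex combination: a PDF
print(sum_area, mix_area)  # close to 2.0 and 1.0
```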


The p.d.f. $f_Z$ of the sum $Z=X+Y$ of two normal random variables is computed by convolving $f_X$ with $f_Y$, i.e. $f_Z(z)=\int_{-\infty}^{\infty}f_X(x)f_Y(z-x)\,dx$. It is not computed by taking the average of $f_X$ and $f_Y$.
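The convolution integral can be verified with a discrete convolution; the parameters below ($X\sim N(0,1)$, $Y\sim N(1,4)$) are just for illustration:

```python
import numpy as np

def pdf(x, mean, variance):
    # standard univariate Gaussian density
    return np.exp(-0.5 * (x - mean)**2 / variance) / np.sqrt(2 * np.pi * variance)

# odd-length grid keeps mode="same" output aligned with x
x = np.linspace(-15.0, 15.0, 3001)
dx = x[1] - x[0]
fX = pdf(x, 0.0, 1.0)  # X ~ N(0, 1)
fY = pdf(x, 1.0, 4.0)  # Y ~ N(1, 4)

# Riemann approximation of the convolution integral
fZ = np.convolve(fX, fY, mode="same") * dx

# closed form: Z ~ N(0 + 1, 1 + 4)
err = np.max(np.abs(fZ - pdf(x, 1.0, 5.0)))
print(err)  # close to zero
```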

As Matías mentioned, the average of the two Gaussian p.d.f.'s is a mixture model (see the Wikipedia link above). To interpret $Z$, consider flipping a fair coin: set $Z=X$ if the coin lands on heads and $Z=Y$ if it lands on tails. In this way, $Z$ is $X$ half the time and $Y$ half the time, so $f_Z$ is the average of $f_X$ and $f_Y$. Likewise, $F_Z$ is the average of $F_X$ and $F_Y$. See the charts below.

Averaging N(0,1) and N(0,4).

We can generalize by letting $Z$ take the value of $X$ with probability $d$ and the value of $Y$ with probability $1-d$. If we choose the means of $X$ and $Y$ sufficiently far apart, the mixture exhibits bimodality.

Mixing X=N(-2,1) and Y=N(2,4) with d=0.3.

Code snippet:

import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
sns.set_context("paper")
sns.set_style("darkgrid")
plt.rcParams['patch.edgecolor'] = 'none'

def normalPdf(x, mean, variance):
    return (1/np.sqrt(2*np.pi*variance))*np.exp(-0.5*(x-mean)**2/variance)

def normalCdf(x, mean, variance):
    # discrete approximation: normalized cumulative sum of the PDF over the grid
    pdf = normalPdf(x, mean, variance)
    return np.cumsum(pdf)/np.sum(pdf)

# gaussian specification
mean1, variance1 = -2, 1
mean2, variance2 = 2, 4
d = 0.3

# sample data points
N = 100000
x = np.linspace(-7.5, 7.5, 100)
X = mean1 + np.sqrt(variance1) * np.random.randn(N)
Y = mean2 + np.sqrt(variance2) * np.random.randn(N)
Z = np.random.rand(N) < d
Z = Z*X + (1-Z)*Y

# plot
fg, ax = plt.subplots(2, 1, sharex=True, figsize=(6, 8), dpi=80)
ax[0].hist(Z, bins=int(N/100), density=True, alpha=0.3, label="empirical")
ax[0].plot(x, normalPdf(x, mean1, variance1), label=r"$f_X(x)$")
ax[0].plot(x, normalPdf(x, mean2, variance2), label=r"$f_Y(x)$")
ax[0].plot(x, d*normalPdf(x, mean1, variance1)+(1-d)*normalPdf(x, mean2, variance2), label=r"$f_Z(x)$", ls="-.")
ax[0].legend()
ax[0].set_ylabel("Probability Density Function")
ax[1].hist(Z, bins=int(N/100), density=True, alpha=0.3, label="empirical", cumulative=True)
ax[1].plot(x, normalCdf(x, mean1, variance1), label=r"$F_X(x)$")
ax[1].plot(x, normalCdf(x, mean2, variance2), label=r"$F_Y(x)$")
ax[1].plot(x, d*normalCdf(x, mean1, variance1)+(1-d)*normalCdf(x, mean2, variance2), label=r"$F_Z(x)$", ls="-.")
ax[1].legend()
ax[1].set_ylabel("Cumulative Distribution Function")
plt.show()