Not sure where to ask this question, but it seems to be a mathematical confusion.
So I am trying to produce the 1-dimensional pdf of a normal distribution in C++. The equation I used:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma ^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
If I want $N(0, 1)$, I simply create 1000 samples of $x\in [\mu - 5\sigma, \mu + 5\sigma]$, which is $[-5,5]$. Each $x$ is then put into the equation above to generate the corresponding value. But when I sum up all the values, it gives 62.xxx, and the value depends on how many samples I use in the interval: the more samples, the larger the value.
Theoretically, I thought that if we integrate from $-\infty$ to $+\infty$, it should be 1? I know the interval I use is not $(-\infty, +\infty)$, but since $[-3\sigma, 3\sigma]$ covers around 99.7%, $[-5\sigma, 5\sigma]$ should cover even more, so shouldn't I get something close to 1 instead?
I further wanted to build a Gaussian Mixture Model by doing:
$$\sum^{N}_{k=1}w_kN_k(x, \mu_k, \sigma^2_k)\qquad \sum^{N}_{k=1}w_k = 1$$
and it also does not give a sum of 1. What did I do wrong?
I also tried using a pre-existing function in MATLAB like:
x = -5:0.01:5;
y = pdf('Normal', x, 0, 1);
sum(y)
ans = 99.9999
x = -5:0.001:5;
y = pdf('Normal', x, 0, 1);
sum(y)
ans = 999.9994
Thanks in advance for answering.
Your question is not very clear. If you want to integrate $f(x)$, i.e. compute the area under its curve using samples, then you need to divide by the number of samples you used and multiply by the length of the support of the uniform density you are sampling from.
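To spell out why this works, here is a short sketch of the underlying Riemann-sum approximation. The symbols $a$, $b$, $N$ and $\Delta x$ are introduced here for illustration: $N$ equally spaced samples $x_i$ on $[a, b]$ with spacing $\Delta x = \frac{b-a}{N-1} \approx \frac{b-a}{N}$ for large $N$.
$$\int_a^b f(x)\,dx \approx \sum_{i=1}^{N} f(x_i)\,\Delta x \approx \frac{b-a}{N}\sum_{i=1}^{N} f(x_i)$$
So the raw sum $\sum_i f(x_i)$ is roughly $\frac{1}{\Delta x}$ times the area. With a step of $0.01$ that factor is $100$, which matches the $99.9999$ in the question, and with a step of $0.001$ it is $1000$.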
In your code you can instead do the following:
x = -5:0.01:5;
y = pdf('Normal', x, 0, 1);
sum(y)/length(x) * (5-(-5))
ans = 0.9990
x = -5:0.001:5;
y = pdf('Normal', x, 0, 1);
sum(y)/length(x)*(5-(-5))
ans = 0.9999
x = -500:0.001:500;
y = pdf('Normal', x, 0, 1);
sum(y)/length(x)*(500-(-500))
ans = 1.0000
Note that this approximates the area under the curve between $-5$ and $5$, that is, the value $$ \int_{-5}^5 f(x) dx $$ or, in the last case, between $-500$ and $500$.
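Since the question mentions C++, here is a minimal sketch of the same idea there, covering both a single normal density and a Gaussian mixture. The names `normal_pdf`, `Component`, `mixture_pdf` and `riemann_sum` are made up for this illustration, and it multiplies by the sample spacing rather than dividing by the sample count, which amounts to the same thing for a fine grid.

```cpp
#include <cmath>
#include <vector>

// Density of N(mu, sigma^2) evaluated at x.
double normal_pdf(double x, double mu, double sigma) {
    const double pi = std::acos(-1.0);
    const double z = (x - mu) / sigma;
    return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * pi));
}

// One mixture component (hypothetical helper type): weight, mean, std. dev.
struct Component {
    double weight;
    double mu;
    double sigma;
};

// Weighted sum of normal densities; the weights are assumed to sum to 1.
double mixture_pdf(double x, const std::vector<Component>& comps) {
    double value = 0.0;
    for (const Component& c : comps) {
        value += c.weight * normal_pdf(x, c.mu, c.sigma);
    }
    return value;
}

// Riemann sum of f over n equally spaced samples on [a, b] (n >= 2).
// Each sample is weighted by the spacing dx; without that factor the
// result grows with n, which is the effect seen in the question.
template <typename F>
double riemann_sum(F f, double a, double b, int n) {
    const double dx = (b - a) / (n - 1);
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        total += f(a + i * dx);
    }
    return total * dx;
}
```

For example, `riemann_sum([](double x) { return normal_pdf(x, 0.0, 1.0); }, -5.0, 5.0, 1001)` should come out close to 1, and the same holds for `mixture_pdf` as long as the interval covers several standard deviations around every component mean.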