Integrated average value of a multivariate function doesn't match average obtained through simulation.

850 Views Asked by At

So, recently I have been trying to calculate the expected area of a convex cyclic quadrilateral with perimeter $1$. Nonetheless, I will only post what's relevant to the issue - the fact that the calculated average of a function does not match the value generated by numerous simulations. I will give the details below:

The function we used for the area is: $$\mathfrak{B}(x,y,z)=\sqrt{(0.5-x)(0.5-y+x)(0.5-z+y)(z-0.5)}$$ Where variables $x$, $y$, and $z$ are chosen uniformly, and have to satisfy the following set of inequalities:

\begin{align} x+(y-x)+(z-y)>&(1-z)& \Rightarrow& & z &>0.5\\ x+(y-x)+(1-z)>&(z-y)& \Rightarrow& & y+0.5 &> z\\ x+(z-y)+(1-z)>&(y-x)& \Rightarrow& & x+0.5 &> y\\ (y-x)+(z-y)+(1-z)>&x& \Rightarrow& & 0.5 &> x \end{align} As well as: \begin{align} x&>0\\ z&<1 \end{align} These produce the following region in $\mathbb{R}^3$:

Image

Using the definition of the average value of a multivariate function: $$Average=\dfrac{\displaystyle\iiint_{V_1}\mathfrak{B}(x,y,z)\ \mathrm{d}x\mathrm{d}y\mathrm{d}z}{\displaystyle\iiint_{V_1}\mathrm{d}x\mathrm{d}y\mathrm{d}z}$$ Where $V_1$ is the above region. By parametrizing it we can see that: $$Average=\dfrac{\displaystyle\int_{0.5}^1\int_{0.5}^1\int_{-0.5+y}^{0.5}\mathfrak{B}(x,y,z)\ \mathrm{d}x\mathrm{d}y\mathrm{d}z+\int_{0.5}^1\int_{-0.5+z}^{0.5}\int_{0.5}^{1}\mathfrak{B}(x,y,z)\ \mathrm{d}x\mathrm{d}y\mathrm{d}z}{\displaystyle\int_{0.5}^1\int_{0.5}^1\int_{-0.5+y}^{0.5}\mathrm{d}x\mathrm{d}y\mathrm{d}z+\int_{0.5}^1\int_{-0.5+z}^{0.5}\int_{0.5}^{1}\mathrm{d}x\mathrm{d}y\mathrm{d}z}$$ By using NIntegrate[] in Mathematica we can find the value of each of these integrals, which gives: $Average=0.033909254...$ (correct to at least 20 digits)

However, I also wrote a program in Java which assigned a random value between $0$ and $0.99999...$ to three variables (simply Math.random() for those who are familiar with it), assigning the smallest one to $x$, the median to $y$, and the largest to $z$ (checking beforehand if they are not equal of course). Than, if the variables satisfied the inequalities, it would plug the values into $\mathfrak{B}(x,y,z)$, and add it to an array, and lastly take the mean of all the values of the function in that array. The answer obtained with a million trials ($499619$ of which satisfied the inequalities) was: $$Average=0.0400603...$$ I repeated the simulation several times, the first four decimal places were always matching. Can anyone explain why I get different expected value of a function through integration, than through simulation?

1

There are 1 best solutions below

1
On BEST ANSWER

Your simulation does not cover the entire region of integration. Because you always assign the smallest of your three values to $x$, you eliminate the entire portion of the region of integration in which $x > y$. Because you always assign the largest value to $z$, you eliminate the entire portion of the region of integration in which $y > z$.

Try plotting these regions: first plot the region exactly the same as your region of integration, except that we strengthen $x+0.5>y$ to $x>y$; that is, the region where \begin{align} z &>0.5,\\ y+0.5 &> z,\\ x &> y,\\ 0.5 &> x. \end{align} Then plot the region where \begin{align} z &>0.5,\\ y &> z,\\ x+0.5 &> y,\\ 0.5 &> x. \end{align} These are the two regions that your simulation leaves out.

To do better, you could simply assign $x$ uniformly in $[0,0.5)$, $y$ uniformly in $[0,1)$, and $z$ uniformly in $[0.5,1)$, then throw away the values that do not satisfy all of your inequalities. (This will throw away about half the triples you generate.)

Alternatively, rather than throwing away so many of your generated triples, you could assign $x$ uniformly in $[0,0.5)$, $y$ uniformly in $[0,0.5)$, and $z$ uniformly in $[0.5,1)$; then, in the case where $z \geq y+0.5$, transform the three variables this way: $$ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \to \begin{pmatrix} z-0.5 \\ y+0.5 \\ x+0.5 \end{pmatrix}. $$ You would still throw away some results that violated your strict inequalities, for example $x > 0$, but if I did the transformation correctly, it takes the region $0<x<0.5, 0\leq y<0.5, y+0.5<z<1$, which you otherwise would have to discard, and uses it to fill in the region $y-0.5<x<0.5, 0.5\leq y<1, 0.5<z<1$. But it is simpler (and less error-prone) to just generate twice as many data points as you will actaully use.