Comparing 2 Gaussian Distributions


I have 2 different datasets of about 1000 points each. In general, the 2 are not very different. I want to compare the 2 datasets, but my statistical knowledge is quite poor. My idea is to fit a GMM to each dataset separately and compare them using the Euclidean distance between the 2 models. Then, for each point, if it is less than that distance I put it into one set, and if it is over that distance I put it into another set. However, before I begin my long journey into programming this, I thought I would ask whether my idea is correct. Please advise me of any other method that would do the job. Thanks.

An example of the data in each dataset:

dataset 1: 4D points (x, y, z, and angle in degrees)
\begin{align}
0.981906 && -0.187578 && -0.00690318 && -0.025056 && 91, \\
0.981906 && -0.187578 && -0.00690318 && -0.025056 && 91, \\
0.822506 && 0.564264 && 0.0497813 && 0.0511153 && 87, \\
-0.466879 && -0.334826 && 0.678983 && 0.457053 && 62, \\
0.732004 && 0.648702 && -0.17913 && 0.106151 && 83, \\
0.563192 && 0.281079 && 0.500732 && 0.594202 && 53, \\
0.997654 && 0.0671787 && -0.0102981 && -0.00820781 && 90, \\
\end{align}

dataset 2: (Same format)
\begin{align}
0.996634 && -0.0723931 && 0.0303433 && -0.0236271 && 91, \\
0.996634 && -0.0723931 && 0.0303433 && -0.0236271 && 91, \\
0.808007 && -0.130079 && -0.0379802 && 0.573378 && 55, \\
-0.198379 && -0.510865 && 0.784196 && -0.291032 && 106, \\
0.747666 && -0.178766 && -0.243948 && -0.591209 && 126, \\
0.786455 && 0.0119915 && -0.00183512 && -0.617528 && 128, \\
0.998168 && 0.0519974 && 0.0282402 && -0.0126235 && 90, \\
0.986411 && 0.0884718 && 0.106895 && -0.0879797 && 95, \\
0.992144 && -0.0262454 && 0.117952 && -0.0323751 && 91 \\
\end{align}

There is 1 best solution below

Make a Q-Q plot: sort each of your 2 sets in increasing order, then make a scatter plot of one sorted set against the other.
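A minimal sketch of that recipe in plain Python follows. It makes a few assumptions that are not part of the answer: the function names `quantile` and `qq_pairs` are made up for illustration, the plot is built one coordinate at a time (here the angle column copied from the example rows in the question), and linear interpolation between sorted values is used so that two sets of different sizes can still be paired.

```python
def quantile(sorted_vals, p):
    """Linearly interpolated quantile of an already sorted list, for 0 <= p <= 1."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def qq_pairs(sample_a, sample_b, n_points=None):
    """Return (quantile of a, quantile of b) pairs at evenly spaced probabilities."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if n_points is None:
        # Assumption: use as many points as the smaller sample has.
        n_points = min(len(a), len(b))
    if n_points < 2:
        raise ValueError("need at least 2 points to draw a Q-Q plot")
    probs = [i / (n_points - 1) for i in range(n_points)]
    return [(quantile(a, p), quantile(b, p)) for p in probs]


# Angle column (degrees) from the example rows shown in the question.
angles_1 = [91, 91, 87, 62, 83, 53, 90]
angles_2 = [91, 91, 55, 106, 126, 128, 90, 95, 91]

pairs = qq_pairs(angles_1, angles_2)
for x, y in pairs:
    print(f"{x:8.2f} {y:8.2f}")
```

The resulting pairs can be passed to any scatter-plot routine, with the first element on the horizontal axis and the second on the vertical axis. With only a handful of rows the plot says little; the full sets of about 1000 points would be used in practice, repeating the plot for each coordinate.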

If the samples follow the same distribution, up to a shift and a scaling factor, the Q-Q plot will be a straight line. Otherwise, you will learn here how to interpret the difference, although Tukey's Exploratory Data Analysis (1977) is much clearer.
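One way to see why the straight line appears is sketched here. The symbols are introduced only for this illustration: $X$ and $Y$ are the two underlying random variables, $F$ is a cumulative distribution function, $Q$ is the corresponding quantile function, and $a$ and $b$ are a shift and a positive scaling factor.

\begin{align}
Y = a + bX,\ b > 0 \quad &\Longrightarrow \quad F_Y(y) = F_X\!\left(\frac{y - a}{b}\right) \\
&\Longrightarrow \quad Q_Y(p) = a + b\,Q_X(p), \qquad 0 < p < 1
\end{align}

So the points $(Q_X(p), Q_Y(p))$ that make up the Q-Q plot lie on a line with slope $b$ and intercept $a$. When the two distributions are identical, $a = 0$ and $b = 1$, and the line is the diagonal $y = x$. With finite samples the points only scatter around that line, most noticeably in the tails.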