Correlation in scatter plot


This may be a very basic matter for statisticians, but I still have no intuition for this sort of thing. Here goes:

I have two quantities (the nature of which is irrelevant) which I suspect to be correlated. The intrinsic scatter is so large, however, that this is not obvious from the plot (see attached plot). The error in the quantity on the x-axis is 0.3, and the error in the quantity on the y-axis is 0.2. It was suggested to me that I test for a correlation by fitting an ellipse, which should have a tilted orientation if there is indeed a correlation.

As it turns out, however, I did not fully understand this. Should I leave the semi-major and semi-minor axes of this ellipse as free parameters, or set them equal to the errors? How do I know the result is not purely coincidental? What is the confidence of this method?

Is there perhaps something else entirely you'd try to this end?

I hope my question is clear; if not, please ask me.


Answer 1 (best answer)

It is difficult to gauge the confidence level of fitting an ellipse corresponding to the errors. What you would conclude is that if the shape and orientation of the ellipse show a certain correlation, then the data may have that correlation, which is rather vague. The errors would be the parameters of the ellipse equation, so the axes of the ellipse should not change; only the orientation of the ellipse itself would.
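A closely related standard construction, swapped in here for illustration and not necessarily the fit that was suggested to the asker, is the sample covariance ellipse: its axes and tilt come from the data's covariance matrix rather than from the measurement errors. The sketch below assumes the paired values are plain Python lists; the helper name `covariance_ellipse_angle` is hypothetical. It computes only the tilt of the major axis. When the two sample variances differ, the ellipse is axis-aligned (angle $0$ or $\pi/2$) exactly when the sample covariance is zero, and a positive (negative) covariance tilts it into $(0, \pi/2)$ (respectively $(-\pi/2, 0)$). So the tilt carries the same sign information as $r$ and, on its own, gives no confidence level.

```python
import math

def covariance_ellipse_angle(xs, ys):
    """Tilt (radians from the x-axis) of the major axis of the sample
    covariance ellipse of the paired points (xs[i], ys[i])."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Sample variances and covariance, using the n - 1 denominator.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Principal-axis angle of the covariance matrix [[sxx, sxy], [sxy, syy]].
    return 0.5 * math.atan2(2 * sxy, sxx - syy)

# Perfectly correlated points: the ellipse is tilted at 45 degrees.
print(math.degrees(covariance_ellipse_angle([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])))
# Zero covariance with unequal spreads: the ellipse is not tilted.
print(covariance_ellipse_angle([-2, 2, -2, 2], [-1, -1, 1, 1]))
```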

As for other methods, standard regression is an option. Depending on what kind of data you have (time series, observational, etc.), you could use other methods such as cross-sectional regression or pairwise t-tests.
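As a minimal sketch of the standard (ordinary least-squares) regression mentioned above, assuming the paired data are plain Python lists and using the hypothetical helper name `ols_fit`, the fitted line $y = a + bx$ can be computed directly from the centered sums:

```python
def ols_fit(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Centered sums of squares and cross-products.
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx        # slope
    a = my - b * mx      # intercept: the line passes through (x-bar, y-bar)
    return a, b

intercept, slope = ols_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)
```

The slope has the same sign as $r$, and in simple linear regression the usual t-test of zero slope is equivalent to the t-test of zero correlation, so a slope indistinguishable from zero tells the same story as $r \approx 0.$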

If instead you are looking to establish whether there is a difference between two variables, you could look at methods such as Tukey's difference analysis combined with an appropriate form of t-test, setting your own confidence levels for the tests. Hope this helps in some way; best of luck with your work!

Answer 2

"Fitting an ellipse" is not the way to detect correlation. First, find the (coefficient of) correlation $r.$ As noted in the Comment, you will find that it is very nearly $0$ for the data in your plot. A brief and elementary discussion of $r$ follows.
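For paired data the sample correlation is $r = \sum (x_i - \bar x)(y_i - \bar y) \big/ \sqrt{\sum (x_i - \bar x)^2 \sum (y_i - \bar y)^2}.$ A minimal sketch in plain Python, assuming the data are two equal-length lists (the name `pearson_r` is just a label chosen here):

```python
import math

def pearson_r(xs, ys):
    """Sample (Pearson) correlation coefficient r of paired data."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Numerator: sum of products of deviations from the two means.
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the sums of squares.
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3], [2, 4, 6]))
```

On Python 3.10 or later, `statistics.correlation(xs, ys)` from the standard library computes the same quantity.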

For normal data, there is a statistical test using $r$ to see whether the correlation $\rho$ of the underlying bivariate population from which the data arise differs from $0.$
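For reference, the usual textbook form of this test (not spelled out in the answer itself) takes a sample of $n$ pairs and computes

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}.$$

If the population is bivariate normal with $\rho = 0,$ this statistic has Student's $t$ distribution with $n-2$ degrees of freedom, so $H_0\colon \rho = 0$ is rejected at level $\alpha$ when $|t|$ exceeds the upper $\alpha/2$ critical value of that distribution. This is also what answers the worry about a "purely coincidental" result: the p-value is the probability of seeing an $|r|$ at least this large when there is in fact no correlation.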

Let's call the two variables $X$ and $Y.$ An intuitive way to judge correlation is to draw a horizontal line at $\bar Y$ and a vertical line at $\bar X.$ They cross at the 'center of gravity' of the data cloud. If most of the observations (dots) are above-right and below-left of center, then you likely have a positive association and $0 < r \leq 1.$ By contrast, if most of the observations are above-left and below-right of center, then it is likely that $-1 \leq r < 0.$ In your plot the points are about equally apportioned among the four 'quadrants', so $r \approx 0.$
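The quadrant picture works because the numerator of $r$ is $\sum (x_i - \bar x)(y_i - \bar y),$ whose terms are positive for points above-right and below-left of center and negative for points above-left and below-right. A minimal sketch of the count, assuming plain Python lists (the name `quadrant_counts` is hypothetical); note that it ignores how far each point lies from the center, so it is only a rough visual check, not a substitute for $r$:

```python
def quadrant_counts(xs, ys):
    """Count points in each quadrant around (x-bar, y-bar).

    Points lying exactly on either mean line are not counted.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    counts = {"above-right": 0, "above-left": 0,
              "below-left": 0, "below-right": 0}
    for x, y in zip(xs, ys):
        if x > mx and y > my:
            counts["above-right"] += 1   # positive contribution to r
        elif x < mx and y > my:
            counts["above-left"] += 1    # negative contribution to r
        elif x < mx and y < my:
            counts["below-left"] += 1    # positive contribution to r
        elif x > mx and y < my:
            counts["below-right"] += 1   # negative contribution to r
    return counts

print(quadrant_counts([1, 2, 3, 4], [1, 2, 3, 4]))
```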

The sample correlation $r$ measures linear association. If all points of the scatterplot lie precisely on a line of positive slope, then $r = 1.$ If all lie on a line of negative slope, then $r = -1.$ Always, $-1 \leq r \leq 1.$
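The bound $-1 \leq r \leq 1$ follows from the Cauchy–Schwarz inequality applied to the centered data. A quick numerical check of the two extreme cases, using the same plain-Python `pearson_r` sketch as an assumption (the two example lines are chosen arbitrarily):

```python
import math

def pearson_r(xs, ys):
    """Sample (Pearson) correlation coefficient r of paired data."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
up = [2.0 * x + 1.0 for x in xs]     # all points on a line of positive slope
down = [-0.5 * x + 3.0 for x in xs]  # all points on a line of negative slope

r_up = pearson_r(xs, up)
r_down = pearson_r(xs, down)
print(r_up, r_down)
```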

Nonlinear association is also possible. For example, points might lie exactly on the parabola $y = x^2,$ spread evenly and symmetrically between, say, $-3$ and $3,$ in such a way that $r = 0.$ Yet the x's can be used to predict the y's exactly. There is a perfect (nonlinear) association, but it has no linear component.
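The parabola example can be checked numerically. This sketch reuses the plain-Python `pearson_r` from above (an assumption, repeated so the block runs on its own) with seven evenly spaced points on $[-3, 3].$ Correlating $y$ with $x^2$ instead of $x$ recovers $r = 1,$ which illustrates that $r$ only detects the linear part of a relationship:

```python
import math

def pearson_r(xs, ys):
    """Sample (Pearson) correlation coefficient r of paired data."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Points exactly on y = x**2, spaced evenly and symmetrically on [-3, 3].
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x ** 2 for x in xs]

r_linear = pearson_r(xs, ys)                     # y against x
r_squared = pearson_r([x ** 2 for x in xs], ys)  # y against x**2
print(r_linear, r_squared)
```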