I have a set of data, say "score vs. loss ratio". The score is an arbitrary number I assigned to each team by looking at its past history. I want to prove my hypothesis that the higher the score, the higher the loss ratio.
The variance among the loss values is very large. If I draw a scatter plot of "score vs. loss ratio", it looks like [1] https://i.stack.imgur.com/wfMLm.jpg, which has low Pearson and Spearman correlation coefficients.
If I group the score values into percentile bins (which I think I can do, because the score is the independent variable) and then take the average of all the losses in each bin, the graph looks a lot more linear and has a higher correlation coefficient: [2] http://imgur.com/avgzqQe. That looks a lot better.
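To make the procedure concrete, here is a minimal sketch of the binning step. It runs on synthetic data, not my real data: the slope, noise level, sample size, and number of bins are all made-up assumptions, and `pearson` and `bin_means` are hypothetical helper names I introduced for illustration.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def bin_means(scores, losses, n_bins=10):
    """Sort by score, split into n_bins equal-count bins, average each bin."""
    pairs = sorted(zip(scores, losses))
    size = len(pairs) // n_bins
    binned_scores, binned_losses = [], []
    for i in range(n_bins):
        start = i * size
        # The last bin absorbs any leftover points.
        end = len(pairs) if i == n_bins - 1 else start + size
        chunk = pairs[start:end]
        binned_scores.append(mean(s for s, _ in chunk))
        binned_losses.append(mean(l for _, l in chunk))
    return binned_scores, binned_losses


# Synthetic data (an assumption, not the data from the question):
# a weak linear trend buried in large noise.
rng = random.Random(0)
scores = [rng.uniform(0, 100) for _ in range(2000)]
losses = [0.01 * s + rng.gauss(0, 1.0) for s in scores]

raw_r = pearson(scores, losses)                             # one point per team
binned_r = pearson(*bin_means(scores, losses, n_bins=10))   # one point per bin
print(f"raw r = {raw_r:.3f}, binned r = {binned_r:.3f}")
```

On data like this the binned coefficient should come out well above the raw one, because averaging removes most of the scatter inside each bin. Note that the two numbers are computed on different things: the raw one on individual teams, the binned one on bin averages.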
I can't find any proofs online to back up my work. Can anyone tell me whether there is something wrong with this approach, or whether it is okay?