Let's say I have some data that humans have rated by how good or bad each entry is, based on their own judgement. They use a scale (1-7) to rate each data entry.
Now, I also have an automatic evaluation metric, which computes a score between 0.0 and 1.0 for each data entry, indicating how good or bad it is.
I want to find out how strong the correlation between the human evaluation and automatic evaluation is.
Is it feasible to use Pearson/Spearman/Kendall here, even though my human ratings (1-7) are categorical and my automatic scores (0.0-1.0) are metric? Do I have to rescale the human ratings to 0.0-1.0 as well?
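As a concrete illustration of the setup, here is a minimal sketch. The ratings and scores are invented toy values, and the helpers (`pearson`, `average_ranks`, `spearman`, `kendall_tau_b`) are plain-Python versions written only for this example; a library such as `scipy.stats` (`pearsonr`, `spearmanr`, `kendalltau`) would normally be used instead. It computes each coefficient once on the raw 1-7 ratings and once on ratings rescaled to 0.0-1.0, so the two can be compared.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def kendall_tau_b(x, y):
    """Kendall tau-b: pairwise agreement, with a correction for ties."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: contributes to neither factor
        elif dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied_in_x = concordant + discordant + only_y_tied
    untied_in_y = concordant + discordant + only_x_tied
    return (concordant - discordant) / sqrt(untied_in_x * untied_in_y)

# Invented toy data: human ratings on the 1-7 scale, automatic scores in 0.0-1.0.
human = [1, 2, 2, 4, 5, 6, 7, 7]
auto = [0.10, 0.25, 0.20, 0.55, 0.50, 0.80, 0.90, 0.95]

# Linear rescaling of the human ratings from 1-7 to 0.0-1.0.
human_scaled = [(h - 1) / 6 for h in human]

for name, fn in [("Pearson", pearson), ("Spearman", spearman), ("Kendall tau-b", kendall_tau_b)]:
    print(f"{name}: raw={fn(human, auto):.4f}  rescaled={fn(human_scaled, auto):.4f}")
```

On this toy data the raw and rescaled ratings give the same value for each coefficient, which is expected: all three measures are unaffected by a positive linear rescaling of one variable. The tie handling matters here because a 1-7 scale produces many repeated ratings.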
Many thanks in advance!