I have a set of data describing a group of people. Let's say we know each person's income and hair color:
N | hair | income
---|--------|------
1 | brown | £2000
2 | black | £1400
3 | brown | £1800
4 | red | £1600
5 | brown | £2500
6 | black | £2800
7 | white | £3000
8 | white | £1800
9 | red | £1600
Is it possible to find out whether the independent variable, hair color, has an effect on the dependent variable, income? The problem I see is that hair colors cannot be "sorted". Even so, I would like to get a result similar to:
- Red -> highest income
- Brown and black -> middle, no significant difference between them
- White -> lowest income
What's the best method to get such results? Is it safe to number the hair colors, or do we need to create a separate dummy variable for each color?
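Hair colors have no natural order, unlike an ordered categorical variable such as small, medium, large, so I would suggest that binary dummy variables are the best approach here. Numbering the colors (e.g. brown = 1, black = 2, red = 3, white = 4) and using that number as a single numeric predictor would force the model to assume an ordering, and equal spacing between colors, that does not exist. Dummy variables also have the advantage of being symmetric across the levels of the factor: no color is privileged by an arbitrary ordering, so each one is treated on an equal footing in the analysis.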
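A minimal sketch of the dummy-variable approach applied to the nine people in the table, assuming Python with NumPy. The helper `dummy_design`, the variable names, and the choice of brown as the reference level are all hypothetical illustrations. With an intercept in the model, one level has to be left out; otherwise the dummies sum to the intercept column and the design matrix is rank-deficient. Each fitted coefficient is then the estimated income difference between that color and brown. The F statistic compares the model against an intercept-only model, so it tests whether hair color explains any of the variation in income (equivalent to a one-way ANOVA).

```python
import numpy as np

# Data from the table: hair color and income (in £) for the nine people.
hair = ["brown", "black", "brown", "red", "brown",
        "black", "white", "white", "red"]
income = np.array([2000, 1400, 1800, 1600, 2500,
                   2800, 3000, 1800, 1600], dtype=float)


def dummy_design(labels, reference):
    """Return a design matrix with an intercept and one 0/1 column per
    non-reference level, plus the column names (hypothetical helper).

    The reference level gets no column: with an intercept present, a full
    set of dummies would sum to the intercept column (perfect collinearity).
    """
    levels = sorted(set(labels) - {reference})
    columns = [np.ones(len(labels))]
    for level in levels:
        columns.append(np.array([1.0 if lab == level else 0.0 for lab in labels]))
    return np.column_stack(columns), ["intercept"] + levels


# Brown is an arbitrary choice of reference level.
X, names = dummy_design(hair, reference="brown")

# Ordinary least squares: intercept = mean income of the reference group,
# every other coefficient = that color's mean minus the reference mean.
beta, *_ = np.linalg.lstsq(X, income, rcond=None)

# F test of "hair color has no effect": full model vs. intercept-only model.
rss_full = np.sum((income - X @ beta) ** 2)
rss_null = np.sum((income - income.mean()) ** 2)
df_between = X.shape[1] - 1           # number of dummies (k - 1)
df_within = len(income) - X.shape[1]  # N - k
F_stat = ((rss_null - rss_full) / df_between) / (rss_full / df_within)

for name, b in zip(names, beta):
    print(f"{name:>9}: {b:8.1f}")
print(f"F({df_between}, {df_within}) = {F_stat:.3f}")
```

With only two or three people per color, the F statistic here is small (about 0.56 on 3 and 5 degrees of freedom), so this particular sample gives little evidence of a hair-color effect. If SciPy is available, `scipy.stats.f.sf(F_stat, df_between, df_within)` returns the corresponding p-value, and `scipy.stats.f_oneway` runs the same test directly on the per-color income groups. Comparing two non-reference colors (e.g. red vs. white) requires a contrast between their coefficients, or refitting with a different reference level.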