When a dataset has both categorical and numerical attributes, the usual advice is to convert the categorical attributes to numerical ones, for example with binary variables. My question is: should we convert ALL attributes to binary? For example, if we have 2 categorical and 3 numerical attributes, should I convert all 5 to binary?
Update
I need to compute a distance between instances (such as Euclidean distance), but I think comparing a binary variable with an attribute that takes large values, like salary = 20000, is meaningless. So in this example, should I also convert the salary to binary variables?
This is called one-hot encoding. It only needs to be applied to the categorical variables; numerical attributes such as Salary are left as they are. For example:
Salary | Department | Gender
2000 | HR | Male
4000 | Tech | Female
500 | HR | Female
900 | Admin | Male
Will be transformed to:
Salary | Department_HR | Department_Tech | Department_Admin | Gender_Male | Gender_Female
2000 | 1 | 0 | 0 | 1 | 0
4000 | 0 | 1 | 0 | 0 | 1
500 | 1 | 0 | 0 | 0 | 1
900 | 0 | 0 | 1 | 1 | 0
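The transformation above can be reproduced with pandas; this is a minimal sketch, not the only way to do it. The variable names (`df`, `encoded`, `scaled`, `dist`) are made up for illustration. The min-max rescaling of Salary is an assumption added here to address the distance concern in the update: it is one common way to keep a large-valued numeric column from dominating a Euclidean distance, and it is not part of one-hot encoding itself.

```python
import numpy as np
import pandas as pd

# The example table from the answer.
df = pd.DataFrame({
    "Salary": [2000, 4000, 500, 900],
    "Department": ["HR", "Tech", "HR", "Admin"],
    "Gender": ["Male", "Female", "Female", "Male"],
})

# One-hot encode only the categorical columns; Salary is left untouched.
# Note: pandas orders the new columns alphabetically by category, so the
# column order differs from the hand-written table above.
encoded = pd.get_dummies(df, columns=["Department", "Gender"], dtype=int)

# Assumption (not part of one-hot encoding): rescale Salary to [0, 1]
# so it is on the same scale as the 0/1 indicator columns.
scaled = encoded.astype(float)
salary = scaled["Salary"]
scaled["Salary"] = (salary - salary.min()) / (salary.max() - salary.min())

# Euclidean distance between the first two instances.
dist = np.linalg.norm(scaled.iloc[0].to_numpy() - scaled.iloc[1].to_numpy())

print(encoded)
print(scaled)
print(dist)
```

Whether min-max scaling, standardization, or a mixed-type measure such as Gower distance is the better choice depends on the data; the point is only that Salary stays numeric and is rescaled rather than converted to binary.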