How to fit a function that depends on several nominal and one real variable?


I have data that map several nominal variables and one real parameter into a real value. For example:

('A', 'left', 'male', 'dog', 1.3459) -> 3.453
('A', 'top', 'male', 'dog', 6.3459) -> 6.137
...
('C', 'right', 'female', 'cat', 4.726) -> 1.456

I need to use these data to fit a function so that I can predict values for new input. For example:

('C', 'top', 'female', 'cat', 0.3459) -> ?

Fortunately, within a good approximation the output is a linear function of the real argument: y = c + k*x. However, our hypothesis is that we can improve the quality of the fit if we assume that the parameters of the linear fit ("c" and "k") depend on the nominal variables.
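As a baseline, the linear relation alone can be fitted while ignoring the nominal variables entirely. A minimal sketch with NumPy (the data values here are made up, partly borrowed from the examples above, and the `predict` helper is hypothetical):

```python
import numpy as np

# Hypothetical training data: only the real argument x and the output y,
# ignoring the nominal variables for this baseline fit.
x = np.array([1.3459, 6.3459, 4.726, 2.1, 5.5])
y = np.array([3.453, 6.137, 1.456, 3.9, 5.2])

# Least-squares fit of y = c + k*x; polyfit returns the highest degree first.
k, c = np.polyfit(x, y, deg=1)

def predict(x_new):
    """Baseline prediction that uses only the real argument."""
    return c + k * x_new
```

The question is then how to let `c` and `k` vary with the nominal variables instead of being global constants.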

My question is: how do I find the dependency of the fit parameters on the nominal variables?

The first idea is to consider all possible combinations of values of the nominal variables and perform a separate fit for each combination. Say, we take all the data for ('A', 'left', 'male', 'dog') and perform the linear fit; then we do the same for ('A', 'left', 'male', 'cat'). However, this approach will not work, since many of my combinations have no data points at all, or only a few.
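The per-combination idea can be sketched as follows (the dataset is a made-up stand-in; note how combinations with fewer than two points simply cannot be fitted, which is exactly the sparsity problem):

```python
import numpy as np
from collections import defaultdict

# Hypothetical dataset: (nominal tuple, x, y).
data = [
    (('A', 'left',  'male',   'dog'), 1.3459, 3.453),
    (('A', 'left',  'male',   'dog'), 2.7,    4.1),
    (('A', 'top',   'male',   'dog'), 6.3459, 6.137),
    (('C', 'right', 'female', 'cat'), 4.726,  1.456),
]

# Group the (x, y) points by the combination of nominal values.
groups = defaultdict(list)
for nominals, x, y in data:
    groups[nominals].append((x, y))

# Fit y = c + k*x separately for each combination that has enough data.
fits = {}
for nominals, points in groups.items():
    if len(points) >= 2:  # a line needs at least two points
        xs, ys = zip(*points)
        k, c = np.polyfit(xs, ys, deg=1)
        fits[nominals] = (c, k)
    # combinations with fewer points are left unfitted -- the sparsity problem
```

Here only one of the three combinations gets a fit; the rest are starved of data.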

Alternatively, I can fix one nominal variable, make an independent fit for each of its values while ignoring the other variables, then do the same for the second nominal variable, and so on. But then the question is how to combine these independent fits.

So, what would be your approach?

2 Answers

BEST ANSWER

Here is one approach. First, for each training example, calculate the expected output using just the real value and the learnt linear relation. Next, subtract the real output from this estimate to get your prediction error. You can now rewrite your training set as:

('A', 'left', 'male', 'dog') -> -0.2
('A', 'top', 'male', 'dog') -> 0.41
...
('C', 'right', 'female', 'cat') -> 0.251

i.e., with just the nominal variables as inputs and the error resulting from the learnt linear approximation as outputs.

You now want to learn the mapping between the nominal variables and the error of the linear prediction.
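The residual construction above can be sketched as follows (the data values are made up, and `residual_data` is a hypothetical name):

```python
import numpy as np

# Hypothetical training set: (nominal tuple, x, y).
data = [
    (('A', 'left',  'male',   'dog'), 1.3459, 3.453),
    (('A', 'top',   'male',   'dog'), 6.3459, 6.137),
    (('C', 'right', 'female', 'cat'), 4.726,  1.456),
    (('C', 'top',   'female', 'cat'), 0.3459, 1.1),
]

x = np.array([row[1] for row in data])
y = np.array([row[2] for row in data])

# Global linear fit y = c + k*x over all examples.
k, c = np.polyfit(x, y, deg=1)

# Prediction error: estimate minus real output, as described above.
errors = (c + k * x) - y

# New training set: nominal variables -> error of the linear fit.
residual_data = [(row[0], e) for row, e in zip(data, errors)]
```

A nice sanity check: because the least-squares line has an intercept, the errors sum to zero over the training set.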

My initial suggestion would be to learn a decision tree [1] to estimate just c (using your original k). Then, using these modified c values, learn a new decision tree to estimate the gradient k. You might want to repeat these two steps a few times, i.e., re-estimate the c values using your latest k values, and so on. I suspect (/hope!) this process would converge, and the final two decision trees (one for c and one for k) can be used to predict c and k for a given input set of nominal variables.
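One possible sketch of this alternating scheme, using scikit-learn's `DecisionTreeRegressor` (the dataset and the tree depth are arbitrary placeholders, and this assumes all x values are nonzero):

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: one row of nominal values per example, plus x and y.
nominals = np.array([
    ['A', 'left',  'male',   'dog'],
    ['A', 'top',   'male',   'dog'],
    ['C', 'right', 'female', 'cat'],
    ['C', 'top',   'female', 'cat'],
])
x = np.array([1.3459, 6.3459, 4.726, 0.3459])
y = np.array([3.453, 6.137, 1.456, 1.1])

# Trees need numeric inputs, so encode the nominal columns as integers.
X = OrdinalEncoder().fit_transform(nominals)

# Start from the global linear fit's slope.
k0, _ = np.polyfit(x, y, deg=1)
k_hat = np.full_like(y, k0)

for _ in range(5):  # a few alternating refinement steps
    # Fix k, learn a tree predicting c = y - k*x from the nominals.
    tree_c = DecisionTreeRegressor(max_depth=3).fit(X, y - k_hat * x)
    c_hat = tree_c.predict(X)
    # Fix c, learn a tree predicting k = (y - c)/x (x must be nonzero).
    tree_k = DecisionTreeRegressor(max_depth=3).fit(X, (y - c_hat) / x)
    k_hat = tree_k.predict(X)

# For a new example, encode its nominals the same way, then use
# tree_c.predict and tree_k.predict to get its c and k.
```

This is only an illustration of the alternation; on real data you would tune the tree depth and check convergence rather than trust a fixed number of iterations.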

I hope that helps - or at least gives you an approach to consider.

SECOND ANSWER

I would try a locality-sensitive hashing (LSH) function to map the nominal parameters (or entire tuples) into numbers. The general idea behind this solution is dimensionality reduction: reducing over-complex data to a smaller domain.

An arbitrary hash function would also map nominal parameters (or tuples) to numbers, but interpolation probably wouldn't work: most hash functions are designed to avoid collisions between similar objects, since they are used for distinguishing objects, in contrast to LSH functions.

Another solution is, given a similarity function (which returns the similarity of two tuples), to compare the new tuple with each tuple in the learning set and compute the value by weighting their respective values with the similarities. The Jaccard index should work well as a similarity function, but you may design something more sophisticated.
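A minimal sketch of this similarity-weighted prediction, using the Jaccard index over the nominal tuples treated as sets (the training values are made up, and the mean fallback for zero total similarity is my own choice; note that treating tuples as sets discards positional information):

```python
# Hypothetical training set: nominal tuple -> observed value.
train = [
    (('A', 'left',  'male',   'dog'), 3.453),
    (('A', 'top',   'male',   'dog'), 6.137),
    (('C', 'right', 'female', 'cat'), 1.456),
]

def jaccard(t1, t2):
    """Jaccard index of two tuples viewed as sets of values."""
    s1, s2 = set(t1), set(t2)
    return len(s1 & s2) / len(s1 | s2)

def predict(query):
    """Similarity-weighted average of the training values."""
    weights = [jaccard(query, t) for t, _ in train]
    total = sum(weights)
    if total == 0:
        # No overlap with any training tuple: fall back to the plain mean.
        return sum(v for _, v in train) / len(train)
    return sum(w * v for w, (_, v) in zip(weights, train)) / total
```

The prediction for a new tuple is pulled toward the values of the most similar training tuples.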