Information gain on two continuous classes instead of binary


I have a problem with an exercise on information gain. I can't seem to get the right answer, because the exercise differs from what we learned. Usually, the target class is a binary variable (skiing: yes or no). In this example, however, there are two classes, each with a number of instances.

So what I tried was the following: turn the instance counts into a binary class (Y > O).

To calculate the information gain, I then computed

G(M,D) = H(M) - (6/12 * H(d1) + 6/12 * H(d2))
G(M,D) = H(7/12, 5/12) - 6/12 * H(5/6, 2/6) - 6/12 * H(3/6, 4/6)

but due to the Y:2, O:2 entry, the class probabilities inside H do not add up to 1.

How should I approach this?

Dataset


BEST ANSWER

Information gain is the difference between the original entropy of the output class and the conditional entropy of the output class when you condition on the variable you are calculating the information gain for. Your conditioning variable $D$ has two values, $d_1$ and $d_2$.

If $Y$ is the number of young and $N$ is the total number of data points, then the original entropy is $H(X) = -p \log p - (1-p) \log (1-p)$ where $p = Y/N$.

Similarly, suppose you have $Y_1$ young and $N_1$ total data points when $D = d_1$, and $Y_2$ young and $N_2$ total when $D = d_2$. Then the conditional entropy is $H(X \mid D) = -q_1\big(p_1 \log p_1 + (1 - p_1) \log (1 - p_1)\big) - q_2\big(p_2 \log p_2 + (1 - p_2) \log (1 - p_2)\big)$ where $q_1 = N_1 / N$, $q_2 = N_2 / N$, $p_1 = Y_1 / N_1$ and $p_2 = Y_2 / N_2$. The information gain for variable $D$ is then $H(X) - H(X \mid D)$.
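The formulas above can be sketched in a few lines of Python. The split counts below are illustrative only (the actual dataset isn't reproduced in the question); the key point is that each group's class counts must be taken out of that group's own total, so the probabilities inside each entropy term sum to 1.

```python
from math import log2

def entropy(counts):
    """Shannon entropy (base 2) of a class-count distribution.

    Probabilities are counts divided by the group's own total,
    so they always sum to 1. Zero counts contribute nothing.
    """
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, split_counts):
    """H(X) - H(X|D): parent entropy minus the weighted average
    of the entropies of the groups produced by the split."""
    n = sum(parent_counts)
    conditional = sum(sum(g) / n * entropy(g) for g in split_counts)
    return entropy(parent_counts) - conditional

# Hypothetical example with the question's totals: 12 points,
# 7 young (Y) and 5 old (O), split by D into
# d1 = [5 Y, 1 O] and d2 = [2 Y, 4 O].
gain = information_gain([7, 5], [[5, 1], [2, 4]])
```

Note that in the $d_1$ group the probabilities are $5/6$ and $1/6$ (out of the group's own 6 points), not $5/6$ and $2/6$: per group, the class counts must sum to the group size.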