I want to know whether my calculation is correct, because I got a different result from an online calculator.
Here is the dataset:
A;B;C;Class
y;y;y;group1
n;y;y;group2
y;n;n;group3
y;y;n;group3
n;y;n;group2
y;n;y;group1
n;n;n;group1
I am using ID3 to pick a root node. The difference appears when I calculate the information gain for attribute C as the root node. My calculations for attributes A and B agree with the online calculator.
E(S) = -3/7 log2 3/7 - 2/7 log2 2/7 - 2/7 log2 2/7 = 1.557 (rounded to 3 decimal places)
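The entropy of the whole set can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name `entropy` and the list `classes` are illustrative choices, not part of any library.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


# Class column of the dataset above, in row order.
classes = ["group1", "group2", "group3", "group3", "group2", "group1", "group1"]

print(round(entropy(classes), 3))  # 1.557
```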
When using attribute C as the root:
P(Y) = 4/7
P(N) = 3/7
P(Group1|Y) = 3/4
P(Group2|Y) = 1/4
P(Group3|Y) = 0/4
Entropy(Y) = -3/4 log2 3/4 - 1/4 log2 1/4 = 0.811
P(Group1|N) = 0/3
P(Group2|N) = 1/3
P(Group3|N) = 2/3
Entropy(N) = -1/3 log2 1/3 - 2/3 log2 2/3 = 0.918
G(C) = 1.557 - 4/7 x 0.811 - 3/7 x 0.918 = 0.700
However, when I use the online calculator to check my calculation, it gives G(C) = 0.306. I have double-checked my calculation and am not sure what is wrong. Could someone help point out the mistake?
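For anyone who wants to reproduce the numbers, here is a minimal Python sketch that recomputes the gain directly from the seven rows above, so the per-branch class counts do not have to be tallied by hand. The names `ROWS`, `COLUMNS`, `split`, `entropy`, and `information_gain` are illustrative assumptions, not taken from any library or from the online calculator.

```python
from collections import Counter, defaultdict
from math import log2

# The dataset from the question: (A, B, C, Class).
ROWS = [
    ("y", "y", "y", "group1"),
    ("n", "y", "y", "group2"),
    ("y", "n", "n", "group3"),
    ("y", "y", "n", "group3"),
    ("n", "y", "n", "group2"),
    ("y", "n", "y", "group1"),
    ("n", "n", "n", "group1"),
]
COLUMNS = {"A": 0, "B": 1, "C": 2}


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def split(rows, column):
    """Group the class labels by the value found in the given column."""
    branches = defaultdict(list)
    for row in rows:
        branches[row[column]].append(row[-1])
    return branches


def information_gain(rows, column):
    """Entropy of the full set minus the weighted entropy of each branch."""
    labels = [row[-1] for row in rows]
    branches = split(rows, column)
    remainder = sum(len(b) / len(rows) * entropy(b) for b in branches.values())
    return entropy(labels) - remainder


# Show the class counts in each branch of C, then the gain of every attribute.
for value, labels in split(ROWS, COLUMNS["C"]).items():
    print("C =", value, "rows:", len(labels), dict(Counter(labels)))

for name, column in COLUMNS.items():
    print(name, round(information_gain(ROWS, column), 3))
```

The branch counts printed for C can be compared line by line with the P(...) values listed above.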
Thanks