Why is the value of information gain negative?

Here is a sample dataset. I have to calculate the information gain of the attribute Variable with respect to Class. My result comes out negative, but information gain should never be negative. What am I missing?

Variable Class
Val1 Positive
Val1 Positive
Val1 Positive
Val1 Positive
Val1 Positive
Val1 Positive
Val1 Positive
Val1 Positive
Val1 Positive
Val1 Positive
Val1 Negative
Val1 Negative
Val1 Negative
Val1 Negative
Val1 Negative
Val1 Negative
Val1 Negative
Val2 Positive
Val2 Positive
Val2 Positive
Val2 Negative
Val2 Negative
Val2 Negative
Val2 Negative
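
Before plugging numbers into the entropy formulas, it can help to tally the table mechanically. The snippet below is a minimal sketch that hardcodes the 24 rows above; the names `rows`, `class_counts`, and `pair_counts` are made up for illustration.

```python
from collections import Counter

# The 24 rows of the table, written as (Variable, Class) pairs.
rows = (
    [("Val1", "Positive")] * 10
    + [("Val1", "Negative")] * 7
    + [("Val2", "Positive")] * 3
    + [("Val2", "Negative")] * 4
)

# Class totals over the whole set (used for the entropy before the split).
class_counts = Counter(cls for _, cls in rows)

# Class totals inside each value of Variable (used after the split).
pair_counts = Counter(rows)

print(len(rows))
print(dict(class_counts))
print(dict(pair_counts))
```

By this tally the whole set holds 13 Positive and 11 Negative rows, so those would be the counts to use for the entropy before the split.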

### Step 1: Calculate Entropy Before Split

$$ \text{Entropy}(S_{\text{before}}) = -\frac{15}{24} \log_2\left(\frac{15}{24}\right) - \frac{9}{24} \log_2\left(\frac{9}{24}\right) \approx 0.954 $$

### Step 2: Calculate Entropy After Split by Variable

Here $S_{\text{Male}}$ denotes the Val1 rows and $S_{\text{Female}}$ denotes the Val2 rows.

$$ \text{Entropy}(S_{\text{Male}}) = -\frac{10}{17} \log_2\left(\frac{10}{17}\right) - \frac{7}{17} \log_2\left(\frac{7}{17}\right) \approx 0.977 $$

$$ \text{Entropy}(S_{\text{Female}}) = -\frac{3}{7} \log_2\left(\frac{3}{7}\right) - \frac{4}{7} \log_2\left(\frac{4}{7}\right) \approx 0.985 $$

### Step 3: Calculate Information Gain

$$ \begin{aligned} \text{Information Gain} &= \text{Entropy}(S_{\text{before}}) - \frac{17}{24} \text{Entropy}(S_{\text{Male}}) - \frac{7}{24} \text{Entropy}(S_{\text{Female}}) \\ &\approx 0.954 - 0.692 - 0.287 \\ &\approx -0.025 \end{aligned} $$
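
For comparison, here is a minimal sketch that recomputes the entropies and the gain directly from per-value class counts read off the table (Val1: 10 Positive, 7 Negative; Val2: 3 Positive, 4 Negative). The helper names `entropy` and `information_gain` are illustrative and do not come from any library.

```python
from math import log2

def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    # Skip empty classes, since 0 * log2(0) is taken as 0.
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def information_gain(split):
    """split maps each attribute value to [positive_count, negative_count]."""
    # The parent counts are the column-wise sums of the child counts,
    # so parent and children always describe the same rows.
    parent = [sum(col) for col in zip(*split.values())]
    n = sum(parent)
    weighted = sum(sum(c) / n * entropy(c) for c in split.values())
    return entropy(parent) - weighted

# Counts read off the table in the question.
split = {"Val1": [10, 7], "Val2": [3, 4]}
parent = [sum(col) for col in zip(*split.values())]

print(parent)                             # [13, 11]
print(round(entropy(parent), 3))          # 0.995
print(round(information_gain(split), 3))  # 0.015
```

With the counts taken from the table, the parent set comes out as 13 Positive and 11 Negative rather than 15 and 9, and the gain is small but non-negative. That is the expected behavior: when the parent and child entropies are computed from the same rows, the weighted child entropy cannot exceed the parent entropy.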