Here is a sample dataset. I need to calculate the information gain of `Variable` with respect to `Class`. Am I missing something? As I understand it, information gain should never be negative, but my calculation below gives a negative value.
| Variable | Class |
|---|---|
| Val1 | Positive |
| Val1 | Positive |
| Val1 | Positive |
| Val1 | Positive |
| Val1 | Positive |
| Val1 | Positive |
| Val1 | Positive |
| Val1 | Positive |
| Val1 | Positive |
| Val1 | Positive |
| Val1 | Negative |
| Val1 | Negative |
| Val1 | Negative |
| Val1 | Negative |
| Val1 | Negative |
| Val1 | Negative |
| Val1 | Negative |
| Val2 | Positive |
| Val2 | Positive |
| Val2 | Positive |
| Val2 | Negative |
| Val2 | Negative |
| Val2 | Negative |
| Val2 | Negative |
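
Tallying the table before plugging numbers into the formulas makes counting slips easy to spot. A minimal Python sketch, with the rows transcribed by hand from the table (the names `rows`, `class_counts` and `branch_counts` are only illustrative):

```python
from collections import Counter

# Rows transcribed by hand from the table above: (Variable, Class)
rows = (
    [("Val1", "Positive")] * 10
    + [("Val1", "Negative")] * 7
    + [("Val2", "Positive")] * 3
    + [("Val2", "Negative")] * 4
)

# Class counts over the whole dataset (input to the entropy before the split)
class_counts = Counter(label for _, label in rows)

# Class counts within each value of Variable (inputs to the entropies after the split)
branch_counts = {}
for value, label in rows:
    branch_counts.setdefault(value, Counter())[label] += 1

print(len(rows), dict(class_counts))
for value, counts in sorted(branch_counts.items()):
    print(value, sum(counts.values()), dict(counts))
```

Running it reports 24 rows with 13 Positive and 11 Negative overall, 10 Positive / 7 Negative within Val1, and 3 Positive / 4 Negative within Val2.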

### Step 1: Calculate the entropy before the split

$$
\begin{aligned}
\text{Entropy}(S_{\text{before}}) &= -\frac{15}{24}\log_2\left(\frac{15}{24}\right) - \frac{9}{24}\log_2\left(\frac{9}{24}\right) \\
&\approx 0.954
\end{aligned}
$$
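Step 1 is easy to check numerically. The sketch below uses a hypothetical helper, `entropy`, that takes raw class counts and returns Shannon entropy in bits. It evaluates the counts used in Step 1 (15 Positive, 9 Negative) alongside the totals read off the table (10 + 3 = 13 Positive rows, 7 + 4 = 11 Negative rows):

```python
import math

def entropy(counts):
    """Shannon entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    # Skip zero counts: 0 * log2(0) is taken as 0 by convention
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Counts used in Step 1 (15 Positive, 9 Negative)
print(round(entropy([15, 9]), 3))   # 0.954
# Counts tallied directly from the table (13 Positive, 11 Negative)
print(round(entropy([13, 11]), 3))  # 0.995
```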
### Step 2: Calculate the entropy after splitting on Variable

Here $S_{\text{Male}}$ is the Val1 subset (17 rows) and $S_{\text{Female}}$ is the Val2 subset (7 rows).

$$
\begin{aligned}
\text{Entropy}(S_{\text{Male}}) &= -\frac{10}{17}\log_2\left(\frac{10}{17}\right) - \frac{7}{17}\log_2\left(\frac{7}{17}\right) \\
&\approx 0.977
\end{aligned}
$$
$$
\begin{aligned}
\text{Entropy}(S_{\text{Female}}) &= -\frac{3}{7}\log_2\left(\frac{3}{7}\right) - \frac{4}{7}\log_2\left(\frac{4}{7}\right) \\
&\approx 0.985
\end{aligned}
$$
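A quick consistency check between Step 1 and Step 2, using only the counts already shown: the two branches together contain exactly the rows of the full dataset, so weighting each branch's Positive proportion by its share of the rows should reproduce the overall Positive proportion.

$$
\frac{17}{24}\cdot\frac{10}{17} + \frac{7}{24}\cdot\frac{3}{7} = \frac{10 + 3}{24} = \frac{13}{24}
$$

This is not the $\frac{15}{24}$ used in Step 1. The non-negativity of information gain rests on this consistency: when the parent class distribution is the size-weighted mixture of the branch distributions $p_k$, concavity of entropy gives

$$
\text{Entropy}\Big(\sum_k w_k\, p_k\Big) \;\ge\; \sum_k w_k\, \text{Entropy}(p_k), \qquad w_k = \frac{|S_k|}{|S|},
$$

so the entropy before the split cannot be smaller than the weighted entropy after it.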
### Step 3: Calculate the information gain

$$
\begin{aligned}
\text{Information Gain} &= \text{Entropy}(S_{\text{before}}) - \frac{17}{24}\,\text{Entropy}(S_{\text{Male}}) - \frac{7}{24}\,\text{Entropy}(S_{\text{Female}}) \\
&\approx 0.954 - 0.692 - 0.287 \\
&\approx -0.025
\end{aligned}
$$
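To avoid transcribing counts by hand at all, the whole calculation can be driven from the raw rows. This is a minimal sketch, not a library API: `label_entropy` and `information_gain` are hypothetical helpers, and the rows are again transcribed from the table.

```python
import math
from collections import Counter

def label_entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows):
    """Information gain of splitting (value, label) rows on the value."""
    labels = [label for _, label in rows]
    # Group the labels by attribute value (one branch per value)
    branches = {}
    for value, label in rows:
        branches.setdefault(value, []).append(label)
    # Size-weighted average entropy of the branches
    remainder = sum(
        len(branch) / len(rows) * label_entropy(branch) for branch in branches.values()
    )
    # The parent entropy comes from the same rows that were split
    return label_entropy(labels) - remainder

rows = (
    [("Val1", "Positive")] * 10
    + [("Val1", "Negative")] * 7
    + [("Val2", "Positive")] * 3
    + [("Val2", "Negative")] * 4
)

print(round(information_gain(rows), 3))  # 0.015
```

Because the parent entropy here is computed from the same rows that are split into branches, the result cannot come out negative; for this table it is about 0.015.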