Approximation of binary entropy function where weighted sum has minimum at same input

249 Views Asked by At

I'm trying to find ways to find an approximation of a function, because said function is "too much of a hassle" to calculate manually (e.g. with a simple calculator). To be specific, this is the function that's "too much of a hassle" to calculate manually:

$B(q) = - (q \log q + (1 - q) \log (1 - q))$

This function is used such: Given a set $A$ with elements of the form $(p,n)$, where $p,n \in \mathbb{N}$, define

$R(A) = \sum_{(p,n) \in A} (p + n) B(\frac{p}{p + n})$

Given multiple sets $A_1$, $A_2$, ... $A_n$ I need to find the $A_i$ which gives the minimum value for $R(A_i)$. I don't care for the actual value $R(A_i)$, I just need to identify $A_i$.

Is there a way to "simplify" $B(q)$ such that I can calculate it with less operations but still correctly identify which $A_i$ (from some given set of such sets) has the lowest value?


Background: I need to train/build decision trees, manually (without access to a computer) and under tight time constraints. $B(q)$ is the binary entropy function. A set $A$ stands for an attribute (of the training set), each pair $(p,n)$ of it defines the number of positive ($p$) and negative ($n$) training samples per attribute value (the set contains one pair for each attribute value). $R(A)$ is the "remaining entropy" after using the attribute for the next decision. The less entropy remains, the higher the information gain. Thus the attribute with the minimal $R(A)$ is the best choice for the next decision.