Let $Y,X_1,X_2$ be binary random variables.
I'm training a naive binary tree classifier and I need to find the best binary projection $f: \{0,1\}^2 \to \{0,1\}$ such that $H(Y|f(X_1,X_2))$ is minimal.
The greedy choice of $f$ is to assign, at every point $(x_1,x_2) \in \{0,1\}^2$, the more likely value of $Y$, i.e.
$$ f(x_1,x_2) = \begin{cases} 1 & \text{ if } \mathbb{P}(Y=1|X_1=x_1,X_2=x_2) > \frac{1}{2} \newline 0 & \text{ otherwise.} \end{cases} $$
Question. Is the above the optimal choice of $f$ w.r.t. $H(Y|f(X_1,X_2))$, or could a different choice of $f$ yield a strictly smaller $H(Y|f(X_1,X_2))$?
If the greedy choice is guaranteed optimal, my algorithm will immediately become 16x faster, since I can skip brute-forcing all $2^{2^2} = 16$ binary operators on $\mathbb{F}_2$.
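To compare the two approaches empirically, here is a small sketch (the joint distribution `p`, the helper `cond_entropy_given_f`, and the random seed are all my own illustrative choices, not part of the question) that computes $H(Y|f(X_1,X_2))$ for the greedy $f$ and for the best of all 16 binary functions:

```python
import itertools
import numpy as np

# Hypothetical joint distribution p[y, x1, x2]: any nonnegative
# array of shape (2, 2, 2) summing to 1 works here.
rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))
p /= p.sum()

def cond_entropy_given_f(p, f):
    """H(Y | f(X1,X2)) in bits, where f is a dict {(x1,x2): 0 or 1}."""
    # joint[y, z] = P(Y=y, f(X1,X2)=z)
    joint = np.zeros((2, 2))
    for x1 in (0, 1):
        for x2 in (0, 1):
            joint[:, f[(x1, x2)]] += p[:, x1, x2]
    pz = joint.sum(axis=0)  # P(f(X1,X2)=z)
    h = 0.0
    for z in (0, 1):
        for y in (0, 1):
            if joint[y, z] > 0:
                h -= joint[y, z] * np.log2(joint[y, z] / pz[z])
    return h

# Greedy choice: f(x1,x2) = 1 iff P(Y=1 | x1,x2) > 1/2,
# equivalently iff p[1,x1,x2] > p[0,x1,x2] (ties go to 0).
greedy = {(x1, x2): int(p[1, x1, x2] > p[0, x1, x2])
          for x1 in (0, 1) for x2 in (0, 1)}

# Brute force over all 2^4 = 16 binary functions {0,1}^2 -> {0,1}.
domain = list(itertools.product((0, 1), repeat=2))
best = min(
    ({pt: bits[i] for i, pt in enumerate(domain)}
     for bits in itertools.product((0, 1), repeat=4)),
    key=lambda f: cond_entropy_given_f(p, f),
)

print("greedy:", cond_entropy_given_f(p, greedy))
print("best:  ", cond_entropy_given_f(p, best))
```

Running this over many random joints would show whether the greedy choice ever loses to the brute-force optimum.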