Why do we want a decision tree to be shallow? Why do we split to maximize information gain?


When constructing a decision tree, we use a measure such as Gini impurity or information gain to decide which split is best. IIUC, we want the tree to be as shallow as possible.

But why do we care? What difference would it make if we constructed a larger decision tree by splitting on some non-essential parameter first? Is it because we ideally don't want to rely on features that do not seem to carry much information, since there is a higher chance we would be fitting the noise, thus limiting our ability to generalize?
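To make the intuition concrete, here is a minimal sketch (plain Python, with made-up labels) comparing the information gain of a split on an informative feature against a split on a noisy one:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, left, right):
    """Entropy of the parent minus the weighted entropy of the children."""
    n = len(labels)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

labels = ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']

# An informative feature separates the classes cleanly ...
gain_good = information_gain(labels, ['a', 'a', 'a', 'a'], ['b', 'b', 'b', 'b'])
# ... while a noisy feature leaves both children as mixed as the parent.
gain_noise = information_gain(labels, ['a', 'a', 'b', 'b'], ['a', 'a', 'b', 'b'])

print(gain_good)   # 1.0 bit: the split resolves all class uncertainty
print(gain_noise)  # 0.0 bits: the split tells us nothing about the class
```

A greedy tree builder would pick the first split; the second would grow the tree without reducing uncertainty at all, which is exactly the "fitting noise" worry above.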

Am I on the right track here, and has the relationship between relative signal strength and the likelihood of it being noise been formalized?


There is 1 answer below.


You want the decision tree that is the "simplest" (so as to avoid overfitting the data), and that means the tree with the fewest nodes. A binary splitting rule that sends half of the unassigned patterns to one branch and half to the other asks a maximum-information question (a balanced yes/no question carries one full bit of entropy), and is thus optimal at that level of the tree.
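The "one full bit" claim can be checked directly. A sketch (plain Python): the entropy of a binary question that sends a fraction `p` of the patterns down one branch is H(p) = -p·log2(p) - (1-p)·log2(1-p), which peaks at p = 0.5:

```python
from math import log2

def split_entropy(p):
    """Entropy (in bits) of a yes/no question answered 'yes' with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a question whose answer is certain carries no information
    return -p * log2(p) - (1 - p) * log2(1 - p)

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}  ->  {split_entropy(p):.3f} bits")
```

The 50/50 split carries the most information per question, so it shrinks the set of remaining possibilities fastest, which is what keeps the tree shallow.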