After reading through a few references, I have come to understand that for machine learning in general it is usually necessary to normalize features: *centering* them so that no feature carries an arbitrarily large offset, and *scaling* them so that all features lie on a comparable range.
However, I'm having a bit of difficulty in visualizing the impact of this for a decision tree. Does data normalization impact the decision tree structure? If yes, how?
Normalization should have no impact on the structure or performance of a decision tree. A tree splits a node by comparing a single feature against a threshold, and centering and scaling are strictly monotonic transformations: they preserve the ordering of the feature values, so exactly the same partitions of the data are available before and after normalization. The learned tree is therefore the same; only the numeric threshold values change. Normalization matters for methods where magnitudes interact across features or cause numerical trouble, such as solving systems of equations, least squares, or distance- and gradient-based learners.
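To make this concrete, here is a minimal sketch (hypothetical helper names, not from any library) of a one-split decision "stump" fit on a raw feature and on a min-max-scaled copy of it. The chosen threshold differs with the scale, but the induced partition, and hence the predictions, are identical:

```python
def gini(labels):
    # Gini impurity for binary 0/1 labels.
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) ** 2

def best_split(xs, ys):
    # Try midpoints between consecutive sorted values; keep the
    # threshold with the lowest weighted Gini impurity.
    pairs = sorted(zip(xs, ys))
    best_t, best_score = None, float("inf")
    for (a, _), (b, _) in zip(pairs, pairs[1:]):
        if a == b:
            continue
        t = (a + b) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t

x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [0, 0, 0, 1, 1, 1]

# Min-max scaling: a strictly increasing transformation of x.
lo, hi = min(x), max(x)
x_scaled = [(v - lo) / (hi - lo) for v in x]

t_raw = best_split(x, y)            # threshold on the raw scale
t_scaled = best_split(x_scaled, y)  # different number, same split point

pred_raw = [int(v > t_raw) for v in x]
pred_scaled = [int(v > t_scaled) for v in x_scaled]
print(pred_raw == pred_scaled)  # True: same partition, different threshold
```

The same argument applies at every node of a deeper tree, which is why libraries' tree learners are documented as scale-invariant.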