In *Elements of Information Theory* by Cover and Thomas, the entropy of a random variable is defined by $$H(X) = -\sum_x p(x) \log p(x),$$ where the units are bits if the log base is 2 and nats if the log base is $e$.
Why do we need units at all here, especially since $p(x)$ has no units?
If there were a single canonical way to define entropy (say, always using $\ln p$), the units would not be needed. However, a logarithm is only pinned down once its base is specified, so we need a way to record which base was used, and that is exactly what the unit does. Changing the base just rescales entropy by a constant, since $\log_2 x = \ln x / \ln 2$; hence $1$ nat $= 1/\ln 2 \approx 1.443$ bits, which is precisely how a quantity converts between units. Probability, being restricted to $[0,1]$, has no such problem.
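As a quick sanity check, here is a minimal Python sketch (the fair-coin example is mine, not from the book) that computes the same entropy in bits and in nats and converts between the two:

```python
import math

# A fair coin: p(x) = 1/2 for each of two outcomes.
p = [0.5, 0.5]

# H(X) = -sum p(x) log p(x), in base 2 (bits) and base e (nats).
H_bits = -sum(px * math.log2(px) for px in p)
H_nats = -sum(px * math.log(px) for px in p)

print(H_bits)                 # 1.0 bit
print(H_nats)                 # ~0.6931 nats (= ln 2)
print(H_nats / math.log(2))   # 1.0 again: 1 nat = 1/ln 2 bits
```

The numerical value changes with the base, so stating "1 bit" versus "0.693 nats" is the only way to make the number unambiguous.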