I was reading the definition of tf-idf in the corresponding Wikipedia article, but I don't fully understand the meaning of this formula and why it was constructed in this way.
If I understood correctly, idf should measure how rare a term $S$ is across the whole set of documents, decreasing in value as the term appears in more and more documents. So, we calculate idf as follows
$$idf(S) = \frac{\# \text{ of documents}}{\# \text{ of documents containing S}}$$
Furthermore, term frequency, tf, can be defined as
$$ tf(S,D) = \frac{\# \ \text{of occurrences of S in document D}}{\text{maximum} \ \# \ \text{of occurrences of any term Q in document D}} $$
So, tf-idf is then defined as
$$ \text{tf-idf} = idf(S) \times tf(S, D) $$
which is, in some way, proportional to how frequently the term $S$ appears in a given document $D$ and to how rare that term $S$ is across the set of documents. But the formula given in the article is
$$ \log\big(idf(S)\big) \times \left( \frac{1}{2} + \log\left(\frac{1}{2} \, tf(S,D) \right) \right) $$
Why do we use logarithms in the tf-idf formula above? What aspect do they emphasize?
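For concreteness, here is a small sketch of how I read the two variants; the toy corpus, the whitespace tokenization, and the use of the natural log are my own assumptions, not something taken from the article:

```python
import math
from collections import Counter

# Toy sketch of the two variants as I read them (toy corpus and natural log
# are my own assumptions, not from the article).
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "dogs and cats are pets".split(),
]

def tf(term, doc):
    counts = Counter(doc)
    # occurrences of the term / maximum occurrences of any term in the document
    return counts[term] / max(counts.values())

def idf(term):
    # total number of documents / number of documents containing the term
    containing = sum(1 for d in docs if term in d)
    return len(docs) / containing

term, doc = "cat", docs[0]
plain  = idf(term) * tf(term, doc)                                     # idf(S) * tf(S, D)
logged = math.log(idf(term)) * (0.5 + math.log(0.5 * tf(term, doc)))   # the logarithmic variant
print(plain, logged)
```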
There's no single right answer or consensus here, but here are some possible explanations:
Zipf's law is an empirical law stating that frequency distributions of data in the social and physical sciences (e.g., the terms in a corpus of natural-language utterances) often follow a power-law distribution. More specifically, the frequency of a term is inversely proportional to some power of its rank. In practice, this means the less frequent terms are lumped together at one end of the frequency spectrum, where their frequencies are barely distinguishable from one another. Applying a log to the frequencies "smooths" out this lumping (effectively linearizing the rank-frequency relationship). The smoothing doesn't affect the relative ranking, but it produces numbers that are easier to work with, as the sketch below illustrates. Note that term frequency as it appears in Zipf's law is defined a bit differently than in the TF-IDF definitions.
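As a rough illustration of that lumping (not part of any tf-idf definition; the vocabulary size and Zipf exponent below are arbitrary choices for the sketch):

```python
import numpy as np

# Arbitrary vocabulary size and Zipf exponent, chosen only for illustration.
ranks = np.arange(1, 10_001)      # term ranks 1 .. 10,000
freqs = 1.0 / ranks               # Zipf with exponent 1: frequency ~ 1 / rank
freqs = freqs / freqs.sum()       # normalize to relative frequencies

# Raw frequencies: most of the vocabulary is squashed into a tiny range near zero.
print(freqs[0], freqs[99], freqs[9999])        # roughly 1e-1, 1e-3, 1e-5

# Log frequencies: the tail is spread out, and log-frequency is (roughly)
# linear in log-rank, while the relative ordering of terms is unchanged.
log_freqs = np.log(freqs)
print(log_freqs[0], log_freqs[99], log_freqs[9999])
```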
It's often desirable to have scoring functions that are additive. Suppose you want to compute the IDF of a 2-tuple (two terms $A$ and $B$). If the occurrences of $A$ and $B$ across documents are roughly independent, you can express the joint probability of $A, B$ as simply the product of the probability of $A$ and the probability of $B$. Hence, if you take the log, the IDF of $(A,B)$ is just $IDF(A) + IDF(B)$.
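Concretely, in the question's notation $idf(S)$ is the reciprocal of the fraction of documents containing $S$; call that fraction $\hat{P}(S)$. If the occurrences of $A$ and $B$ are roughly independent, then $\hat{P}(A,B) \approx \hat{P}(A)\,\hat{P}(B)$, so

$$ \log idf(A,B) = \log\frac{1}{\hat{P}(A,B)} \approx \log\frac{1}{\hat{P}(A)} + \log\frac{1}{\hat{P}(B)} = \log idf(A) + \log idf(B), $$

whereas the unlogged $idf(A,B)$ would combine multiplicatively rather than additively.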
See http://nlp.cs.swarthmore.edu/~richardw/papers/robertson2004-understanding.pdf for possible alternative explanations (and their shortcomings). Another approach is to interpret the log terms as measuring the amount of "information" a term carries about a document (owing to the similarity with Shannon's entropy).
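Under that reading, and again using the question's notation, the logged idf of a term is the self-information (in Shannon's sense) of the event "a randomly chosen document contains $S$":

$$ \log idf(S) = \log \frac{\# \ \text{of documents}}{\# \ \text{of documents containing S}} = -\log \hat{P}(S), $$

so terms that occur in nearly every document carry almost no information and get a weight near zero, while rare terms are "surprising" and get large weights.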
The real reason the formula was originally proposed probably isn't as complicated as some of the explanations above, but some combination of these intuitions probably explains why it works relatively well in practice.