How to use absolute discounting when the denominator and the numerator are both zero


Absolute Discounting Method for Smoothing

When we use a Bayes classifier to classify text (e.g. a sequence of words), one of the estimated probabilities is sometimes equal to zero, and this makes the probability of the whole sequence zero. Many smoothing methods have been proposed to solve this problem, and one of them is called "Absolute Discounting".

The idea of the absolute discounting method comes from the tax system: we take some money from the rich and distribute it among the poor. In the case of probabilities, the rich are the probabilities which are not equal to zero, and the poor are the probabilities which are equal to zero.


Classifying text data using Bayes Classifier

This is the original formula for the bigram model when a sequence (e.g. a headline) $w_1,w_2,\dots,w_n$ is given and we want to find the probability that it belongs to category $c$:

$P(w_1,w_2,\dots,w_n|c)=\prod_{i=1}^{n}\frac{P(w_{i-1},w_i|c)}{P(w_{i-1}|c)}$
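To make the zero-probability issue concrete, here is a minimal Python sketch of the unsmoothed product. The probability tables `joint` and `marginal` are made-up values for a single category $c$, and the product simply starts at the first available pair because the formula does not say how $w_0$ is defined; both are assumptions for illustration only.

```python
from math import prod

def sequence_probability(words, joint, marginal):
    """Unsmoothed bigram product: prod_i P(w_{i-1}, w_i | c) / P(w_{i-1} | c)."""
    factors = []
    for prev, cur in zip(words, words[1:]):
        # A bigram missing from the table has probability zero.
        factors.append(joint.get((prev, cur), 0.0) / marginal[prev])
    return prod(factors)

# Hypothetical probabilities for one category c (not estimated from real data).
joint = {("stock", "market"): 0.02, ("market", "falls"): 0.01}
marginal = {"stock": 0.05, "market": 0.04, "falls": 0.02}

seen = sequence_probability(["stock", "market", "falls"], joint, marginal)
unseen = sequence_probability(["stock", "falls"], joint, marginal)
```

Here `seen` is positive, while `unseen` is exactly zero because the single unseen bigram ("stock", "falls") zeroes out the whole product.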

Now, according to the absolute discounting method, if we have $B$ rich probabilities, then we use the below formula instead of the previous one:

$P(w_1,w_2,\dots,w_n|c)=\prod_{i=1}^{n}\left[\frac{\max(P(w_{i-1},w_i|c)-\delta,0)}{P(w_{i-1}|c)}+\frac{\delta\cdot B}{P(w_{i-1}|c)}\cdot P_{BG}\right]$

Note 1: In the second formula, $P_{BG}$ is a background probability that we can define. For example, one possible $P_{BG}$ is defined as $P(w_i|c)$.

Note 2: For more information about this method, visit this page.
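A single factor of the discounted formula can be sketched in Python as follows. This follows the formula exactly as it is written in this question (in terms of probabilities); the function name `discounted_factor` and all numeric values are hypothetical, and $B$ and $P_{BG}$ are simply passed in as given.

```python
def discounted_factor(p_joint, p_prev, delta, B, p_bg):
    """One factor of the product:
    max(P(w_{i-1}, w_i | c) - delta, 0) / P(w_{i-1} | c)
      + (delta * B / P(w_{i-1} | c)) * P_BG
    """
    discounted = max(p_joint - delta, 0.0) / p_prev
    redistributed = (delta * B / p_prev) * p_bg
    return discounted + redistributed

# Hypothetical values: delta = 0.001, B = 3, P_BG = P(w_i | c) = 0.02.
unseen_pair = discounted_factor(0.0, 0.05, delta=0.001, B=3, p_bg=0.02)
seen_pair = discounted_factor(0.02, 0.05, delta=0.001, B=3, p_bg=0.02)
```

With these numbers the unseen pair gets a small positive value instead of zero, as long as $P(w_{i-1}|c)$ itself is not zero.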


The problem:

As you know, $P(w_{i-1}|c)$ is equal to "the number of times that $w_{i-1}$ has appeared in the documents related to category $c$" divided by "the number of documents which belong to category $c$". This probability can be zero, namely when $w_{i-1}$ never appears in category $c$. In this case, $P(w_{i-1},w_i|c)$ becomes zero as well. So we are not just facing a zero probability; we are facing something much worse: $\frac{0}{0}$. What should we do in this situation? Should we use another background probability? Should we just treat the $\frac{0}{0}$ as zero and declare that the probability of the whole sequence is zero as well?
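The situation can be reproduced with a small Python sketch. It assumes relative-frequency estimates in which both counts are divided by the number of documents in the category; the toy documents and the helper `estimate` are hypothetical.

```python
from collections import Counter

def estimate(docs):
    """Relative-frequency estimates from tokenized documents of one category."""
    n_docs = len(docs)
    unigram = Counter(w for d in docs for w in d)
    bigram = Counter(pair for d in docs for pair in zip(d, d[1:]))

    def p_word(w):
        # Counter returns 0 for words that never occur.
        return unigram[w] / n_docs

    def p_pair(a, b):
        return bigram[(a, b)] / n_docs

    return p_word, p_pair

docs = [["stock", "market", "falls"], ["market", "rallies"]]
p_word, p_pair = estimate(docs)

num = p_pair("bond", "yields")  # "bond" never occurs in this category
den = p_word("bond")
try:
    ratio = num / den
except ZeroDivisionError:
    ratio = None  # 0/0: the factor is undefined, not merely zero
```

Both `num` and `den` come out as zero, so every term of the discounted formula that divides by $P(w_{i-1}|c)$ is undefined for this context.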