I am reading about the non-negative matrix factorization (NMF) algorithm that uses KL divergence as its metric.
The KL divergence is defined as $D(P,Q)=\sum_i P(i)\log\frac{P(i)}{Q(i)}$ for discrete distributions $P$ and $Q$.
However, the KL divergence between two non-negative matrices $A$ and $B$ is given as $D(A,B)=\sum_{i,j} A_{i,j}\log\frac{A_{i,j}}{B_{i,j}}+\sum_{i,j} B_{i,j}-\sum_{i,j}A_{i,j}$.
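To make the comparison concrete, here is a minimal sketch of the two formulas as I understand them, assuming NumPy and strictly positive entries. The function names `kl_divergence` and `generalized_kl` and the example matrices are my own, not taken from any library or paper. When both matrices sum to 1, the two extra terms cancel and the values agree; otherwise they differ by exactly $\sum_{i,j} B_{i,j}-\sum_{i,j}A_{i,j}$.

```python
import numpy as np


def kl_divergence(p, q):
    """Standard KL divergence: sum_i p_i * log(p_i / q_i).

    Assumes all entries of p and q are strictly positive.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))


def generalized_kl(a, b):
    """Matrix form from the question: sum A*log(A/B) + sum(B) - sum(A).

    Assumes all entries of a and b are strictly positive.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(a * np.log(a / b)) + np.sum(b) - np.sum(a))


# Both matrices sum to 1, so sum(B) - sum(A) = 0 and the formulas coincide.
A = np.array([[0.1, 0.2], [0.3, 0.4]])
B = np.full((2, 2), 0.25)
agree_when_normalized = np.isclose(
    generalized_kl(A, B), kl_divergence(A.ravel(), B.ravel())
)

# Scale A so it no longer sums to 1: the formulas now differ by
# sum(B) - sum(A_scaled) = 1 - 3 = -2.
A_scaled = 3.0 * A
gap = generalized_kl(A_scaled, B) - kl_divergence(A_scaled.ravel(), B.ravel())
```

In this sketch the plain sum $\sum A\log(A/B)$ can go negative for unnormalized inputs, while the version with the extra terms stays non-negative and is zero when $A=B$.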
I don't understand where this formula is derived from. Where do the extra terms come from?