I'm currently reading the paper "LINE: Large-scale Information Network Embedding" and trying to work through all of its equations. The authors define two probability distributions and construct an objective function so that the distance between the two distributions is minimized. After plugging both distributions into the KL-divergence formula, the authors state that they omit the constants (the constants defined in the probability distributions and the objective function).
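To make my question concrete, here is a toy numerical sketch of what I understand "omitting the constants" to mean. This is not the LINE model itself; `p_hat` and the one-parameter softmax model are made-up stand-ins. The idea I'm testing is that dropping the terms of the KL divergence that don't depend on the model parameters shifts the objective by a constant but leaves the minimizer unchanged:

```python
import numpy as np

# Toy check (my own example, not from the paper): minimizing the full
# KL(p_hat || p_theta) vs. minimizing only the cross-entropy term, i.e.
# KL with the constant entropy term of p_hat dropped.

rng = np.random.default_rng(0)
p_hat = np.array([0.5, 0.3, 0.2])   # fixed "empirical" distribution
x = np.array([1.0, 0.0, -1.0])      # fixed scores per outcome

def softmax(z):
    z = z - z.max()                 # numerical stability
    e = np.exp(z)
    return e / e.sum()

def kl_full(theta):
    """Full KL(p_hat || p_theta), including the entropy term of p_hat."""
    p = softmax(theta * x)
    return np.sum(p_hat * (np.log(p_hat) - np.log(p)))

def cross_entropy(theta):
    """KL with the constant term sum(p_hat * log p_hat) omitted."""
    p = softmax(theta * x)
    return -np.sum(p_hat * np.log(p))

thetas = np.linspace(-3.0, 3.0, 601)
full = np.array([kl_full(t) for t in thetas])
simp = np.array([cross_entropy(t) for t in thetas])

# Both objectives pick the same theta; they differ by a constant offset
# equal to the (parameter-independent) entropy term of p_hat.
assert thetas[full.argmin()] == thetas[simp.argmin()]
print(np.allclose(full - simp, full[0] - simp[0]))  # True
```

So numerically the two objectives seem to have the same argmin, which makes me think the omission is harmless for optimization, but I'd like to confirm this reasoning.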
My question is: why do the authors choose to omit the constants? Does this make the computation less accurate? Does it have an impact on computational performance?