I'm implementing some machine learning code and I ran across this post detailing some information on Bayes' theorem I was looking for:
However, the reply from alto uses arg max. I see this a lot, especially when dealing with Naive Bayes classification. Can anyone explain to me how it works? The Wikipedia article makes sense, except in this context I don't get why you would use it.
Thanks!
In Naive Bayes classification, argmax is useful because you want to find the class $C_i$ that an object $D$ is most likely associated with. To do this, you compute a probability for $D$ belonging to each $C_i$, and you choose the $C_i$ that maximizes this probability. Choosing such a $C_i$ is exactly what argmax does for you.
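Written out, the decision rule described above looks like this (a standard formulation of the Naive Bayes classifier, not a formula quoted from the linked post):

$$\hat{C} = \underset{C_i}{\arg\max}\; P(C_i \mid D)$$

That is, $\max$ would give you the value of the largest probability itself, while $\arg\max$ gives you the *argument* (the class $C_i$) at which that largest probability is attained, which is what you actually want to return from a classifier.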
I didn't mean to make computing argmax sound more complex than it actually is, but I'll talk about it since you asked. In Naive Bayes, there are usually few enough classes to compute argmax by brute force: simply compute the probability for every $C_i$ and return the $C_i$ that yields the largest one. In general (that is, outside of Naive Bayes), computing the argmax of a function is as hard as maximizing it, so for some functions, using a brute-force approach to compute their argmax would take an eternity.
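As a concrete sketch of the brute-force approach, here it is in Python. The class labels and probabilities are made-up numbers purely for illustration; in a real Naive Bayes classifier they would come from your model:

```python
# Hypothetical per-class probabilities P(C_i | D) for one object D.
probs = {"spam": 0.7, "ham": 0.2, "other": 0.1}

# Brute-force argmax: scan every class and keep the one whose
# probability is largest. max() with key=probs.get compares classes
# by their probability but returns the class label, not the value.
best_class = max(probs, key=probs.get)

print(best_class)  # -> spam
```

Note the contrast: `max(probs.values())` would give you `0.7` (the max), while the call above gives you `"spam"` (the argmax).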