I'm trying to understand Maximum Likelihood estimators in the context of general estimation theory. I know that the Bayesian (posterior mean) estimator minimizes mean squared loss, and that the MAP estimator minimizes the all-or-nothing loss (the loss is zero if the estimator picks the correct parameter and one otherwise). Which loss function does the maximum likelihood estimator minimize?
My thought was that it is the negative of the log-likelihood function, but the definition of a loss function involves an estimator $T(X)$ and a parameter $s$. As I see it, the negative log-likelihood does not contain any estimator.
The Kullback-Leibler divergence (between the empirical and the theoretical probability distribution) is the loss function minimized by the MLE, at least according to this derivation, which looks legitimate at first glance.
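
To see why, here is a short sketch of the standard argument (assuming i.i.d. samples $x_1,\dots,x_n$ on a discrete sample space, and writing $\hat p$ for their empirical distribution and $p_\theta$ for the model density):

$$\frac{1}{n}\sum_{i=1}^{n}\log p_\theta(x_i)=\sum_{x}\hat p(x)\log p_\theta(x)=-D_{\mathrm{KL}}\!\left(\hat p \,\middle\|\, p_\theta\right)-H(\hat p),$$

where $H(\hat p)=-\sum_x \hat p(x)\log\hat p(x)$ is the entropy of the empirical distribution and does not depend on $\theta$. Maximizing the average log-likelihood over $\theta$ is therefore equivalent to minimizing $D_{\mathrm{KL}}(\hat p\,\|\,p_\theta)$, which is how the negative log-likelihood becomes a genuine loss once it is compared against the empirical distribution rather than a single parameter value.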