I can recite and use the definition of the MLE, as well as derive it for common joint p.d.f.'s. But I have no idea what the purpose of it is. I tried searching for it, but even Wikipedia did not explain it well and uses a lot of esoteric language. What is a real-life application of finding a maximum likelihood estimator, whether biased or unbiased? Why would a mathematician or statistician care about this?
I am studying from the 5th edition of "Introduction to Mathematical Statistics" by Robert Hogg
In short, MLE is one way of fitting a model to data. This is the real-life application, and it is in fact used in many real-life fitting procedures in practice. (For instance, it is one standard way to fit the ARIMA and GARCH time-series models that are popular in practice.)
We assume our data $(x_1,\ldots,x_n)$ comes from a parametric family of distributions $f_{\theta}(x).$ A very typical model is that the data are independent samples from a normal distribution $N(\mu,\sigma^2)$ of unknown mean $\mu$ and variance $\sigma^2.$
The purpose of estimation is to infer the values of the parameters from the data. One type of estimation is point estimation, where you try to find values for the parameters that fit the data well and are statistically likely to be correct in some sense. The other type is interval estimation, where you find a range of values that the parameters are statistically likely to lie in.
The MLE is a type of point estimator. You say you understand the definition and how the estimator is typically calculated, so I won't go too deep into that. You simply pick the parameters under which your data would be most probable. Hopefully it seems reasonable that this is a good guess for the value of the parameter.
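To make this concrete, here is a minimal sketch (using NumPy and SciPy, with simulated data and made-up parameter values) of the normal model above: we numerically maximize the likelihood, i.e. minimize the negative log-likelihood, and check that the answer agrees with the well-known closed-form MLEs, the sample mean and the (biased) sample variance.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
# Simulated sample from N(mu=5, sigma^2=4); the "true" values are arbitrary
data = rng.normal(loc=5.0, scale=2.0, size=1000)

def neg_log_likelihood(params, x):
    # Parametrize by log(sigma) so sigma stays positive during optimization
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    n = len(x)
    # Negative log-likelihood of N(mu, sigma^2), dropping constants
    return n * np.log(sigma) + np.sum((x - mu) ** 2) / (2 * sigma**2)

result = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(data,))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])

# Closed-form MLEs for the normal: sample mean and biased sample variance
print(mu_hat, data.mean())
print(sigma_hat**2, ((data - data.mean()) ** 2).mean())
```

The numerical optimum should match the closed-form formulas to several decimal places; for most models there is no closed form, and this kind of numerical maximization is how MLE fitting is actually done.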
People like the MLE because it has good properties: in other words, the fact that it is a good guess can be mathematically substantiated. For instance, one typical metric for assessing a point estimator $\hat \theta(x_1,\ldots, x_n)$ is its mean squared error $$ E((\theta-\hat \theta(X_1,\ldots, X_n))^2),$$ which tells you, in a loose sense, how accurate we should expect our estimator to be. One nice property the MLE has is asymptotic efficiency. This roughly means that for very large samples, the MLE approaches the lowest mean squared error that is theoretically possible.
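You can see the mean squared error behaving this way by simulation. Below is a rough sketch (parameter values are made up): for normal data, the MLE of $\mu$ is the sample mean, and its MSE should track the theoretical lower bound $\sigma^2/n$, shrinking as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(1)
mu_true, sigma_true = 5.0, 2.0  # assumed "true" parameters for the simulation

def mse_of_mle(n, trials=20000):
    # The MLE of mu for an i.i.d. normal sample is the sample mean;
    # estimate its MSE by averaging squared errors over many repeated samples
    samples = rng.normal(mu_true, sigma_true, size=(trials, n))
    mu_hats = samples.mean(axis=1)
    return np.mean((mu_hats - mu_true) ** 2)

for n in (10, 100, 1000):
    # For estimating mu here, the theoretical lower bound on the MSE is sigma^2 / n
    print(n, mse_of_mle(n), sigma_true**2 / n)
```

Each printed MSE should be close to $\sigma^2/n$, illustrating both that the error shrinks with $n$ and that the MLE is operating near the best accuracy theoretically possible.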