For example, I am given a line fitting problem
Find pair $(a,b)$ that best fits $y_i = ax_i + b + z_i$, where $z_i$ is iid gaussian noise
Why would we try to find:
$(a^*, b^*)$ = argmax $\ p_{x,y}(x,y)$ = argmax $p_z(y_i - (ax_i + b))$?
Since the probability density function does not give us the probability of finding $x,y$, shouldn't we attempt to argmax the integral of the probability density function?
The integral of a Probability Density Function is the Cumulative Density Function. The maxiumum of any CDF is $1$, and the argument of this for a Gaussian distribution is $+\infty$. So this is clearly not what is required.
What you are looking for is the point around which the noise is most likely to occur. This is the mode of the distribution; and that is the argmax of the probability density function.