I have three questions: What is the interpretation of the value of the probability density function (PDF) at a particular point? How is this value related to probability? What exactly are we doing when we maximize the likelihood?
To better explain my questions:
(i) Consider a continuous random variable $X$ that is normally distributed with $\mu=1.5$ and $\sigma^2=2$. If we evaluate the PDF at a particular point, say $3.4$, using the formula:
$$f(X)=\frac{1}{\sqrt{2\pi \sigma^2}}\, e^{-\frac{\left( X - \mu \right)^2}{2\sigma^2}}~,$$
we get $f(3.4)\approx 0.1144$. How do we interpret this value?
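As a sanity check, this value can be reproduced numerically. A minimal Python sketch, assuming scipy is available (note that `scipy.stats.norm` is parameterized by the standard deviation, not the variance):

```python
from scipy.stats import norm

mu, sigma2 = 1.5, 2.0
sigma = sigma2 ** 0.5  # scipy expects the standard deviation, not the variance

# Density of N(1.5, 2) evaluated at x = 3.4
print(norm.pdf(3.4, loc=mu, scale=sigma))  # ~0.1144
```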
(ii) I previously read that the value $0.1144$ is not the probability that $X$ takes the value $3.4$; indeed, for a continuous random variable, the probability of any single point is zero. How, then, is this value related to the concept of probability?
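One way to see the connection numerically: the probability that $X$ falls in a small interval around $3.4$ is approximately the density times the interval width, $\mathsf P(x \leq X \leq x+\Delta x) \approx f(x)\,\Delta x$. A minimal sketch illustrating this, under the same assumptions as above:

```python
from scipy.stats import norm

mu, sigma = 1.5, 2.0 ** 0.5
x, dx = 3.4, 1e-4

# Probability of a small interval around x, computed from the CDF
p = norm.cdf(x + dx / 2, mu, sigma) - norm.cdf(x - dx / 2, mu, sigma)

print(p / dx)                  # ~0.1144: probability per unit length
print(norm.pdf(x, mu, sigma))  # matches the density
```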
(iii) Consider a sample of size $N=2$ from the continuous random variable $X$, with $X_{1}=2$ and $X_{2}=3.5$. We can use this sample to maximize the log-likelihood:
$$\max_{\mu,\sigma}\ \ln L(\mu,\sigma \mid X_{1},X_{2}), \quad \text{where } \ln L(\mu,\sigma \mid X_{1},X_{2}) = \ln f(X_1) + \ln f(X_2)\,.$$
If $f(X)$ is not exactly a probability, what are we maximizing? Some texts state that "we are maximizing the probability that a model (set of parameters) reproduces the original data." Is this phrasing incorrect?
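For concreteness, here is a minimal sketch of the maximization in (iii), assuming Python with numpy/scipy. It minimizes the negative log-likelihood numerically, recovering the closed-form normal MLEs (the sample mean and the biased standard deviation):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

data = np.array([2.0, 3.5])

def nll(params):
    """Negative log-likelihood of an i.i.d. normal model."""
    mu, sigma = params
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

# Keep sigma strictly positive during the search
result = minimize(nll, x0=[0.0, 1.0], bounds=[(None, None), (1e-6, None)])
print(result.x)  # ~[2.75, 0.75]: sample mean and MLE standard deviation
```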
The probability density function is the derivative of the cumulative distribution function:
$$f_{\small X}(x)=\dfrac{\mathrm d ~~}{\mathrm d x}\mathsf P(X\leq x)\,.$$
Since the CDF is non-decreasing, this derivative is always non-negative.
It may be read as the gradient (slope) of the tangent to the CDF curve; that is, the rate at which probability accumulates as the value of the continuous random variable increases.
So when you maximise the likelihood, you are choosing the parameters that maximise the rate at which probability accumulates in the immediate neighbourhood of the data points.
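This reading can be checked numerically: the parameters found in (iii) place more probability mass in small windows around the observed data than arbitrary alternatives do. A minimal sketch, where `window_prob` is a hypothetical helper and the MLE values are taken from the example above:

```python
import numpy as np
from scipy.stats import norm

data = np.array([2.0, 3.5])
eps = 1e-3  # half-width of a small window around each observation

def window_prob(mu, sigma):
    """Probability mass the model places in small windows around the data."""
    return np.prod(norm.cdf(data + eps, mu, sigma)
                   - norm.cdf(data - eps, mu, sigma))

print(window_prob(2.75, 0.75))  # ~4.2e-07: the MLE concentrates mass near the data
print(window_prob(0.0, 1.0))    # ~1.9e-10: an arbitrary competitor accumulates less
```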