I am attempting to understand the method of least squares for regression. The likelihood of the parameters is defined as $$\mathcal L(\vec\theta)\stackrel{\text{def}}=\prod_{i=1}^mp_Y(\vec y^{(i)}|\vec x^{(i)};\vec\theta)$$
It seems that the squared cost function $\mathcal J_\text{sq}(\vec\theta)$ is derived from $\mathcal L(\vec\theta)$ in the following steps: $$\begin{align}\ln\mathcal L(\vec\theta)&=\ln\prod_{i=1}^mp_Y(\vec y^{(i)}|\vec x^{(i)};\vec\theta)\\&=\ln\prod_{i=1}^m\frac1{\sigma\sqrt{2\pi}}\exp(-\frac12(\frac{\vec y^{(i)}-\hat y_\theta^{(i)}}{\sigma})^2)\\&=m\ln\frac1{\sigma\sqrt{2\pi}}-\frac1{2\sigma^2}\sum_{i=1}^m(\vec y^{(i)}-\hat y_\theta^{(i)})^2\end{align}$$$$\begin{align}&\because f(\cdot)=-\frac{\sigma^2}m((\cdot)-m\ln\frac{1}{\sigma\sqrt{2\pi}})\text{ is decreasing. }\\&\therefore\text{As }\mathcal J_\text{sq}(\vec\theta)=\frac1{2m}\sum_{i=1}^m(\vec y^{(i)}-\hat y_\theta^{(i)})^2\text{ decreases, }\mathcal L(\vec\theta)\text{ increases. }\end{align}$$ I understand every line in the derivation except the second line. Line 2 implies that $$p_Y(\vec y^{(i)}|\vec x^{(i)};\vec\theta)=\frac1{\sigma\sqrt{2\pi}}\exp(-\frac12(\frac{\vec y^{(i)}-\hat y_\theta^{(i)}}{\sigma})^2)\tag{*}$$ because of the normality assumption: $$\vec y^{(i)}|\vec x^{(i)};\vec\theta\sim\mathcal N(\hat y_\theta^{(i)},\sigma^2)$$ In (*), the LHS reads as the probability of the event "$Y=\vec y^{(i)}$ given $X=\vec x^{(i)}$", parametrized by $\vec\theta$, while the RHS is the probability density function (PDF) of $\vec y^{(i)}|\vec x^{(i)};\vec\theta$ evaluated at $\vec y^{(i)}$. So the LHS looks like a probability for the continuous random variable $Y$, conditioned on $X$ and parametrized by $\vec\theta$.
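As a numerical sanity check of the derivation (with hypothetical values for $\theta$, $\sigma$ and a made-up linear model), the log-likelihood computed directly from the Gaussian density should coincide with the affine transform $m\ln\frac1{\sigma\sqrt{2\pi}}-\frac m{\sigma^2}\mathcal J_\text{sq}$ of the squared cost:

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma = 50, 0.5
theta = 2.0                       # hypothetical slope parameter
x = rng.uniform(0, 1, m)
y = theta * x + rng.normal(0, sigma, m)
y_hat = theta * x                 # model predictions for this theta

# Log-likelihood summed directly from the Gaussian density of each term
log_L = np.sum(np.log(1 / (sigma * np.sqrt(2 * np.pi)))
               - 0.5 * ((y - y_hat) / sigma) ** 2)

# Squared cost J_sq and the affine relation
# ln L = m*ln(1/(sigma*sqrt(2*pi))) - (m/sigma^2) * J_sq
J = np.sum((y - y_hat) ** 2) / (2 * m)
log_L_from_J = m * np.log(1 / (sigma * np.sqrt(2 * np.pi))) - (m / sigma ** 2) * J

assert np.isclose(log_L, log_L_from_J)
```

Because the transform is affine with a negative slope, minimizing $\mathcal J_\text{sq}$ is the same as maximizing $\ln\mathcal L$, and hence $\mathcal L$.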
My confusion is this: generally, for a continuous random variable $Y$, probabilities are obtained by integrating the density, $$P[Y\le y]=\int_{-\infty}^yf_Y(t)\,dt$$
Isn't $Y$ continuous? How can we even speak of the probability of the event "$Y=\vec y^{(i)}$ given $X=\vec x^{(i)}$", parametrized by $\vec\theta$? Why isn't the RHS of (*) an integral?
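To make the confusion concrete, here is a small numerical illustration (with hypothetical $\mu$, $\sigma$, $y_0$) of the CDF integral above, and of the fact that the same integral taken over the single point $\{y_0\}$ is zero:

```python
import math
import numpy as np

mu, sigma, y0 = 0.0, 1.0, 0.7     # hypothetical values for illustration

def pdf(t):
    # Gaussian density f_Y(t) for Y ~ N(mu, sigma^2)
    return np.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# P[Y <= y0] = integral of f_Y from -inf to y0, truncated at mu - 10*sigma
t = np.linspace(mu - 10 * sigma, y0, 100_001)
f = pdf(t)
cdf_numeric = float(np.sum((f[:-1] + f[1:]) * np.diff(t) / 2))  # trapezoid rule

# Closed form via the error function, for comparison
cdf_exact = 0.5 * (1 + math.erf((y0 - mu) / (sigma * math.sqrt(2))))
assert math.isclose(cdf_numeric, cdf_exact, rel_tol=1e-6)

# Integrating over a shrinking interval around y0: the probability
# P[y0 - eps <= Y <= y0 + eps] vanishes as eps -> 0, so "P[Y = y0]" is 0.
probs = []
for eps in (1e-1, 1e-3, 1e-5):
    s = np.linspace(y0 - eps, y0 + eps, 1001)
    g = pdf(s)
    probs.append(float(np.sum((g[:-1] + g[1:]) * np.diff(s) / 2)))
assert probs[0] > probs[1] > probs[2]   # shrinks toward 0
```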
UPDATE
You are using an incorrect analogy. What you are really asking about on the LHS is the probability that the sample takes exactly the given values. In other words, if $X_1, \ldots, X_n$ were discrete you could write one term as $$ \mathbb{P}[X_k=x_k] = f(x_k), $$ and even in the case where $\{X_k\}_{k=1}^n$ are continuous, you are looking for the density value $f(x_k)$, not the CDF value $F(x_k)$ that your integral suggests.
OLD ANSWER
The LHS is not really a PMF; nothing here is discrete. The likelihood is a product of finitely many terms because the sample is a finite set, and each element of the sample contributes one factor.
However, each factor is really a probability density, since you are assuming every variable involved is normally distributed. As a result, you end up with a finite product of density values, exactly as your RHS indicates.
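A short sketch of the distinction between $f$ and $F$ (hypothetical $\mu$, $\sigma$, $y_0$): the CDF $F$ gives probabilities of intervals, and the density $f(y_0)$ that enters the likelihood is the limit of such an interval probability divided by the interval's length:

```python
import math

mu, sigma, y0 = 1.0, 0.5, 1.2     # hypothetical values for illustration

def pdf(t):
    # density f_Y(t) for Y ~ N(mu, sigma^2): this is what the likelihood uses
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def cdf(t):
    # CDF F_Y(t): this is what gives actual probabilities of intervals
    return 0.5 * (1 + math.erf((t - mu) / (sigma * math.sqrt(2))))

# P[y0 - eps <= Y <= y0 + eps] goes to 0, but the ratio P/(2*eps)
# converges to the density f_Y(y0):
for eps in (0.1, 0.01, 0.001):
    p = cdf(y0 + eps) - cdf(y0 - eps)
    print(f"eps={eps}: P={p:.6f}, P/(2*eps)={p / (2 * eps):.6f}")

assert math.isclose((cdf(y0 + 1e-6) - cdf(y0 - 1e-6)) / 2e-6, pdf(y0), rel_tol=1e-4)
```

So each factor in the likelihood is a density value $f(y^{(i)})$, not an integral: the integrals of $f$ around each sample point would all be zero, while the density values are finite and comparable across different $\vec\theta$.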