I am attempting to understand the method of least squares for regression. The likelihood of the parameters is defined as $$\mathcal L(\vec\theta)\stackrel{\text{def}}=\prod_{i=1}^mp_Y(\vec y^{(i)}|\vec x^{(i)};\vec\theta)$$
It seems that the squared cost function $\mathcal J_\text{sq}(\vec\theta)$ is derived from $\mathcal L(\vec\theta)$ in the following steps: $$\begin{align}\ln\mathcal L(\vec\theta)&=\ln\prod_{i=1}^mp_Y(\vec y^{(i)}|\vec x^{(i)};\vec\theta)\\&=\ln\prod_{i=1}^m\frac1{\sigma\sqrt{2\pi}}\exp(-\frac12(\frac{\vec y^{(i)}-\hat y_\theta^{(i)}}{\sigma})^2)\\&=m\ln\frac1{\sigma\sqrt{2\pi}}-\frac1{2\sigma^2}\sum_{i=1}^m(\vec y^{(i)}-\hat y_\theta^{(i)})^2\end{align}$$$$\begin{align}&\because f(\cdot)=-\frac{\sigma^2}m((\cdot)-m\ln\frac{1}{\sigma\sqrt{2\pi}})\text{ is decreasing. }\\&\therefore\text{As }\mathcal J_\text{sq}(\vec\theta)=\frac1{2m}\sum_{i=1}^m(\vec y^{(i)}-\hat y_\theta^{(i)})^2\text{ decreases, }\mathcal L(\vec\theta)\text{ increases. }\end{align}$$ I understand every line in the derivation except the second line. Line 2 implies that $$p_Y(\vec y^{(i)}|\vec x^{(i)};\vec\theta)=\frac1{\sigma\sqrt{2\pi}}\exp(-\frac12(\frac{\vec y^{(i)}-\hat y_\theta^{(i)}}{\sigma})^2)\tag{*}$$ because of the normality assumption: $$\vec y^{(i)}|\vec x^{(i)};\vec\theta\sim\mathcal N(\hat y_\theta^{(i)},\sigma^2)$$ In (*), the LHS reads as the probability of the event "$Y=\vec y^{(i)}$ given $X=\vec x^{(i)}$", parametrized by $\vec\theta$, while the RHS is the probability density function (PDF) of $\vec y^{(i)}|\vec x^{(i)};\vec\theta$ evaluated at $\vec y^{(i)}$. So the LHS looks like a probability for the continuous random variable $Y$, conditioned on $X$ and parametrized by $\vec\theta$.
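As a numerical sanity check of the derivation (with hypothetical values for $\theta$, $\sigma$ and a made-up linear model), the log-likelihood computed directly from the Gaussian density should coincide with the affine transform $m\ln\frac1{\sigma\sqrt{2\pi}}-\frac m{\sigma^2}\mathcal J_\text{sq}$ of the squared cost:

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma = 50, 0.5
theta = 2.0                       # hypothetical slope parameter
x = rng.uniform(0, 1, m)
y = theta * x + rng.normal(0, sigma, m)
y_hat = theta * x                 # model predictions for this theta

# Log-likelihood summed directly from the Gaussian density of each term
log_L = np.sum(np.log(1 / (sigma * np.sqrt(2 * np.pi)))
               - 0.5 * ((y - y_hat) / sigma) ** 2)

# Squared cost J_sq and the affine relation
# ln L = m*ln(1/(sigma*sqrt(2*pi))) - (m/sigma^2) * J_sq
J = np.sum((y - y_hat) ** 2) / (2 * m)
log_L_from_J = m * np.log(1 / (sigma * np.sqrt(2 * np.pi))) - (m / sigma ** 2) * J

assert np.isclose(log_L, log_L_from_J)
```

Because the transform is affine with a negative slope, minimizing $\mathcal J_\text{sq}$ is the same as maximizing $\ln\mathcal L$, and hence $\mathcal L$.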
My confusion is this: generally, for a continuous random variable $Y$, probabilities are obtained by integrating the density, $$P[Y\le y]=\int_{-\infty}^yf_Y(t)\,dt$$
Isn't $Y$ continuous? How can we even speak of the probability of the event "$Y=\vec y^{(i)}$ given $X=\vec x^{(i)}$", parametrized by $\vec\theta$? Why isn't the RHS of (*) an integral?
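To make the confusion concrete, here is a small numerical illustration (with hypothetical $\mu$, $\sigma$, $y_0$) of the CDF integral above, and of the fact that the same integral taken over the single point $\{y_0\}$ is zero:

```python
import math
import numpy as np

mu, sigma, y0 = 0.0, 1.0, 0.7     # hypothetical values for illustration

def pdf(t):
    # Gaussian density f_Y(t) for Y ~ N(mu, sigma^2)
    return np.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# P[Y <= y0] = integral of f_Y from -inf to y0, truncated at mu - 10*sigma
t = np.linspace(mu - 10 * sigma, y0, 100_001)
f = pdf(t)
cdf_numeric = float(np.sum((f[:-1] + f[1:]) * np.diff(t) / 2))  # trapezoid rule

# Closed form via the error function, for comparison
cdf_exact = 0.5 * (1 + math.erf((y0 - mu) / (sigma * math.sqrt(2))))
assert math.isclose(cdf_numeric, cdf_exact, rel_tol=1e-6)

# Integrating over a shrinking interval around y0: the probability
# P[y0 - eps <= Y <= y0 + eps] vanishes as eps -> 0, so "P[Y = y0]" is 0.
probs = []
for eps in (1e-1, 1e-3, 1e-5):
    s = np.linspace(y0 - eps, y0 + eps, 1001)
    g = pdf(s)
    probs.append(float(np.sum((g[:-1] + g[1:]) * np.diff(s) / 2)))
assert probs[0] > probs[1] > probs[2]   # shrinks toward 0
```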
UPDATE
You are using an incorrect analogy. What you are really asking about on the LHS is the probability that the sample takes exactly the given values. In other words, if $X_1, \ldots, X_n$ were discrete you could write one term as $$ \mathbb{P}[X_k=x_k] = f(x_k), $$ and even in the case where $\{X_k\}_{k=1}^n$ are continuous, you are looking for the density value $f(x_k)$, not the CDF value $F(x_k)$ that your integral suggests.
OLD ANSWER
The LHS is not really a PMF; nothing here is discrete. The likelihood is a product of finitely many terms because the sample is a finite set, and each element of the sample contributes one factor.
However, each factor is really a probability density, since you are assuming every variable involved is normally distributed. As a result, you end up with a finite product of density values, exactly as your RHS indicates.
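A short sketch of the distinction between $f$ and $F$ (hypothetical $\mu$, $\sigma$, $y_0$): the CDF $F$ gives probabilities of intervals, and the density $f(y_0)$ that enters the likelihood is the limit of such an interval probability divided by the interval's length:

```python
import math

mu, sigma, y0 = 1.0, 0.5, 1.2     # hypothetical values for illustration

def pdf(t):
    # density f_Y(t) for Y ~ N(mu, sigma^2): this is what the likelihood uses
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def cdf(t):
    # CDF F_Y(t): this is what gives actual probabilities of intervals
    return 0.5 * (1 + math.erf((t - mu) / (sigma * math.sqrt(2))))

# P[y0 - eps <= Y <= y0 + eps] goes to 0, but the ratio P/(2*eps)
# converges to the density f_Y(y0):
for eps in (0.1, 0.01, 0.001):
    p = cdf(y0 + eps) - cdf(y0 - eps)
    print(f"eps={eps}: P={p:.6f}, P/(2*eps)={p / (2 * eps):.6f}")

assert math.isclose((cdf(y0 + 1e-6) - cdf(y0 - 1e-6)) / 2e-6, pdf(y0), rel_tol=1e-4)
```

So each factor in the likelihood is a density value $f(y^{(i)})$, not an integral: the integrals of $f$ around each sample point would all be zero, while the density values are finite and comparable across different $\vec\theta$.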