In a simple linear regression the predicted y values are also the “conditional means” at each x value. For each x value, there is a distribution of y values in the population. How exactly do we know each y value on the regression line is the mean of the conditional distribution at each x value?
I’m trying to think of this in the simplest way possible, with 10 x values and 10 y values. If y on the regression line is 5 when x is 1, then one would say “when x is 1, the mean value of y is 5.” How does the line tell us the “mean” of y when we only have one actual y value to work with?
Assume the random variable $Y$ can be modeled as $Y=\beta_0+\beta_1X_1+\dots+\beta_nX_n+\epsilon$, where $\epsilon\sim N(0,1)$ is the random error term (unit variance is just for simplicity) and the $X_i$ are random variables.
Least squares gives estimates of the parameters $\beta_0,\beta_1,\dots,\beta_n$. Treating those estimates as the true values, the random variable $Y$ is $\beta_0+\beta_1X_1+\dots+\beta_nX_n+\epsilon$ with the $\beta_i$ filled in as actual numbers.
Then, conditional on $X_1=x_1,\dots,X_n=x_n$, we have $Y=\beta_0+\beta_1x_1+\dots+\beta_nx_n+\epsilon$. This is normal with mean $\beta_0+\beta_1x_1+\dots+\beta_nx_n$ and variance 1. That is, $E(Y|\textbf x)=\beta_0+\beta_1x_1+\dots+\beta_nx_n$.
But this is exactly the value of the least squares regression line evaluated at $\textbf x$. So the line at any given $\textbf x$ estimates the conditional mean there even if you observed only one (or zero) $y$ values at that exact $\textbf x$: under the linearity assumption, the fit pools information from all the observations rather than relying on the data at a single $x$ value alone.
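To see this concretely, here is a small simulation sketch in Python (the "true" coefficients 2 and 3 are made up for illustration, not from the question): we draw one $y$ per $x$ for ten $x$ values, fit a line by least squares, and compare the fitted value at a single $x$ to the conditional mean of $Y$ at that $x$, which we approximate by drawing many $y$ values there.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population (coefficients chosen for illustration):
# E(Y | x) = 2 + 3x, with error epsilon ~ N(0, 1) as in the answer above.
beta0_true, beta1_true = 2.0, 3.0

# One observed y per x value, as in the question's 10-point example.
x = np.arange(1.0, 11.0)
y = beta0_true + beta1_true * x + rng.normal(0.0, 1.0, size=x.size)

# Least squares fit: np.polyfit returns [slope, intercept] for degree 1.
beta1_hat, beta0_hat = np.polyfit(x, y, 1)

# The population conditional mean at x = 4 is E(Y | x = 4) = 2 + 3*4 = 14.
# Approximate it by simulating many y draws at that single x value.
x0 = 4.0
y_draws = beta0_true + beta1_true * x0 + rng.normal(0.0, 1.0, size=100_000)
cond_mean = y_draws.mean()  # close to 14

# The fitted line at x0 estimates that same conditional mean, even though
# the sample contained only one y value at x = 4.
line_value = beta0_hat + beta1_hat * x0
print(cond_mean, line_value)
```

With only ten points the fitted value will not equal 14 exactly, but it is an estimate of that conditional mean, and it gets better as the sample grows.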