Let's take the case of simple linear regression for example, where we are assuming: $$Y \mid X = x = \beta_0 + \beta_1 x + \epsilon,$$ where $\epsilon$ represents the random noise.
In order to conduct inference, my model assumptions are (1) the existence of a linear relationship between my predictor and my response, (2) that the random noise in my observations of the response has mean 0 and constant variance, and that the noise terms are independent of one another, and (3) that the noise is Gaussian. Furthermore, we know that the residuals $e_i$ from our fitted model are not actually observations of the random noise.
My question is therefore, when checking for the third assumption, why do we only look at a Normal probability plot of the residuals? Why not instead use a Normal probability plot of the observed $Y$ values?
And as a follow-up question, suppose we assume the errors are distributed according to a non-Gaussian distribution. Are there any examples where, if the random noise is non-Gaussian, the conditional response $Y \mid X = x$ could follow a different distribution than the errors? (That is, if $\epsilon \sim \mathcal{P}$ for some general probability distribution $\mathcal{P}$, are there any examples where the shifted variable $Y = \text{const.} + \epsilon$ is not also distributed as $\mathcal{P}$?)
Usually you assume that the conditional distribution of $Y$ given $X$ is normal. Checking the distribution of $\{y_i\}_{i=1}^n$ is checking the unconditional (marginal) distribution of $Y$, which mixes the conditional distributions over the observed values of $X$ and therefore need not be normal even when the errors are. Checking the distribution of $\{e_i\}_{i=1}^n$, on the other hand, amounts to checking the conditional distribution of $Y$, which is what the assumption is actually about.
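A small simulation makes this concrete (my own illustrative setup: a bimodal predictor and parameter values chosen for clarity). Even with perfectly Gaussian noise, the marginal distribution of $Y$ inherits the bimodality of $X$ and fails a normality test, while the residuals pass:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Predictor drawn from a bimodal mixture, so the marginal of Y is clearly non-normal
x = np.concatenate([rng.normal(-3, 0.5, 500), rng.normal(3, 0.5, 500)])
eps = rng.normal(0.0, 1.0, size=x.size)   # Gaussian noise, exactly as the model assumes
y = 2.0 + 3.0 * x + eps                   # true linear model

# OLS fit and residuals
beta1, beta0 = np.polyfit(x, y, 1)
resid = y - (beta0 + beta1 * x)

# Shapiro-Wilk test: residuals look normal, raw y values emphatically do not
p_resid = stats.shapiro(resid).pvalue
p_y = stats.shapiro(y).pvalue
print(p_resid)  # large: consistent with normality
print(p_y)      # essentially zero: the bimodal marginal is rejected
```

So a normal probability plot of the raw $y_i$ can look terrible even when the model is exactly correct; only the residuals speak to assumption (3).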
The requirements for the optimality of the OLS estimators (the Gauss-Markov theorem) are only zero-mean noise with finite, constant variance, and uncorrelated errors. No parametric distributional assumptions are needed. Moreover, normality is a very strong assumption that has to be justified. You can see some examples of non-normal (symmetric) noise in linear regression here: https://stats.stackexchange.com/questions/84000/fitting-a-linear-model-with-non-gaussian-noise
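To illustrate the Gauss-Markov point, here is a quick simulation (my own choice of Laplace noise and parameters, just as an example of a symmetric non-Gaussian error distribution): the OLS slope remains unbiased even though the noise is not Gaussian, because zero mean, constant finite variance, and uncorrelated errors are all that the theorem needs.

```python
import numpy as np

rng = np.random.default_rng(1)
true_b0, true_b1 = 1.0, 2.0
x = np.linspace(0, 10, 50)

# Repeat the OLS fit many times with Laplace (non-Gaussian, symmetric) noise
slopes = []
for _ in range(2000):
    eps = rng.laplace(0.0, 1.0, size=x.size)  # zero-mean, constant variance, uncorrelated
    y = true_b0 + true_b1 * x + eps
    b1, b0 = np.polyfit(x, y, 1)
    slopes.append(b1)

mean_slope = np.mean(slopes)
print(mean_slope)  # close to the true slope 2.0: OLS is still unbiased
```

What normality buys you on top of this is exact finite-sample inference (t- and F-distributions for the test statistics), not the unbiasedness or minimum-variance-among-linear-unbiased property of the estimates themselves.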