I want to understand the pairwise relationships between four types of regression: Bayesian Linear Regression (BLR), Gaussian Process Regression (GPR), Kernel Regression (Nadaraya-Watson), and Kernel Ridge Regression (KRR). I've encountered all of these while reading the book *Gaussian Processes for Machine Learning*, and I would like a clear and concise way to tell them apart.
Here is my understanding so far; feel free to correct me. Bayesian Linear Regression treats the regression coefficients $\beta$ (for which the classical point estimate is the OLS estimator $\hat{\beta} = (X^TX)^{-1}X^Ty$) as a random variable, with a posterior distribution determined jointly by the prior (which we choose) and the data (whose likelihood pulls the posterior toward the OLS estimate). Gaussian Process Regression is then Bayesian Linear Regression where the prior over functions is a Gaussian process (a distribution over function space defined by a mean function and a covariance function).
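To make my understanding of the BLR part concrete, here is a toy sketch (my own illustration, not from the book): with a Gaussian prior $\beta \sim \mathcal{N}(0, \tau^2 I)$ and noise variance $\sigma^2$, the posterior mean of $\beta$ is $(X^TX + (\sigma^2/\tau^2)I)^{-1}X^Ty$, which approaches the OLS estimate as the prior variance $\tau^2$ grows. All the data and parameter values below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

sigma2 = 0.01  # assumed noise variance (illustrative)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Posterior mean under beta ~ N(0, tau^2 I): a ridge-type estimator.
# As tau^2 -> infinity (flat prior), it converges to the OLS estimate.
for tau2 in [0.1, 10.0, 1e6]:
    beta_post = np.linalg.solve(X.T @ X + (sigma2 / tau2) * np.eye(3), X.T @ y)
    print(tau2, np.linalg.norm(beta_post - beta_ols))
```

So, if I understand correctly, the data does not literally "set" $\hat\beta$ to the OLS form; rather, the posterior mean interpolates between the prior mean and the OLS estimate.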
Then, my understanding of kernel regression is that a kernel is a "similarity function" between data points, so an estimator of the form $\hat{y}(x) = \frac{\sum_i K(x, x_i)y_i}{\sum_i K(x, x_i)}$, a weighted average, seems intuitive to use. But kernel ridge regression gives an estimator of the form $K_{x, X}(K_{X,X} + \lambda I)^{-1}y$. This does not seem to be the same as traditional kernel regression (of the N-W variety) with a ridge penalty added. Furthermore, I see that Gaussian Process Regression outputs this same expression as its posterior mean, where $K$ is now the covariance kernel, so somehow GPR (which is a special case of BLR?) is equivalent to KRR?
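To make the comparison concrete, here is a small numerical sketch (my own toy data, with an RBF kernel and arbitrarily chosen lengthscale and $\lambda$) computing all three estimators on the same inputs. The Nadaraya-Watson estimate differs from the KRR estimate, while the GP posterior mean with noise variance $\sigma^2 = \lambda$ is, term for term, the same formula as KRR:

```python
import numpy as np

def rbf(A, B, ell=0.5):
    # Squared-exponential kernel on 1-D inputs (lengthscale ell is arbitrary)
    return np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * ell**2))

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=20)
y = np.sin(X) + 0.1 * rng.normal(size=20)
x_star = np.array([0.3, 1.7])  # test points

K_sX = rbf(x_star, X)
K_XX = rbf(X, X)

# Nadaraya-Watson: normalized weighted average of the y_i
nw = (K_sX @ y) / K_sX.sum(axis=1)

# Kernel ridge regression: K_{x,X} (K_{X,X} + lambda I)^{-1} y
lam = 0.01
krr = K_sX @ np.linalg.solve(K_XX + lam * np.eye(20), y)

# GP posterior mean with noise variance sigma^2 = lambda: identical formula
gp_mean = K_sX @ np.linalg.solve(K_XX + lam * np.eye(20), y)

print(nw, krr, gp_mean)
```

Running this, `gp_mean` and `krr` coincide exactly (they are the same linear solve), while `nw` is a different number, which is exactly the distinction I am confused about.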
The relationship between these four methods is not clear to me, and any clarification would be much appreciated; the more technical detail, the better. I asked this on CrossValidated to no avail, so I am cross-posting here.