Please help me understand a step in calculating the mean $\mu$ of this distribution.
Setting:
- $Y_{1}, \ldots, Y_{n}$ are independent given the pair $\left(\beta_{0}, \beta_{1}\right)$
- $Y_{i}=\beta_{0}+\beta_{1} X_{i}+\epsilon_{i}$, where the $\epsilon_{i}$ are i.i.d. $\mathcal{N}(0,1 / \tau)$ (i.e., with variance $1 / \tau$)
- the $X_{i} \in \mathbb{R}$ are deterministic.

We will think of $\beta_{0}, \beta_{1}$ and $\tau$ as random variables. Suppose that we place an improper prior on $\left(\beta_{0}, \beta_{1}, \tau\right)$: $$ \pi\left(\beta_{0}, \beta_{1}, \tau\right)=1 / \tau $$ Since the right-hand side does not depend on the $\beta$'s, we may take the conditional distribution $\pi\left(\beta_{0}, \beta_{1} \mid \tau\right)$ to be the "uniform" improper prior: $\pi\left(\beta_{0}, \beta_{1} \mid \tau\right)=1$.
Answer the following problems given these assumptions. As a reminder, we let $\mathbb{X}$ be the design matrix, whose $i$th row is the row vector $\left(1, X_{i}\right)$, and let $\mathbf{Y}$ be the column vector $\left(\begin{array}{c}Y_{1} \\ \vdots \\ Y_{n}\end{array}\right)$.

Observe that if $\beta_{0}, \beta_{1}$ and $\tau$ are given, then each $Y_{i}$ is Gaussian: $$Y_{i} \mid\left(\beta_{0}, \beta_{1}, \tau\right) \sim \mathcal{N}\left(\beta_{0}+\beta_{1} X_{i}, 1 / \tau\right).$$ Therefore, the likelihood function of the vector $\left(Y_{1}, \ldots, Y_{n}\right)$ given $\left(\beta_{0}, \beta_{1}, \tau\right)$ is of the form $$ \left(\frac{1}{\sqrt{2 \pi / \tau}}\right)^{n} \exp \left(-\frac{\tau}{2} \sum_{i=1}^{n}\left(y_{i}-\beta_{0}-\beta_{1} X_{i}\right)^{2}\right). $$

It turns out that the distribution of $\left(\beta_{0}, \beta_{1}\right)$ given $\tau$ and $Y_{1}, \ldots, Y_{n}$ is a 2-dimensional Gaussian. In terms of $\mathbb{X}$, $\mathbf{Y}$ and $\tau$, what is its mean and covariance matrix?
Kindly help me figure out $\mu$! I am having a lot of trouble here. Please explain in as much detail as you would to a small child, or a golden retriever!

You are almost there.
$$(\beta-\mu)^T \Sigma^{-1} (\beta-\mu)= \beta^T \Sigma^{-1} \beta - 2\mu^T\Sigma^{-1}\beta + \text{constant},$$ where $\beta = \binom{\beta_0}{\beta_1}$ (using $\beta$ rather than $X$ for the Gaussian variable, to avoid clashing with the design matrix $\mathbb{X}$).
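As a quick numeric sanity check of that completing-the-square expansion, here is a sketch in numpy (all the values and variable names below are arbitrary choices of mine; note we work with the precision matrix $\Sigma^{-1}$ directly, so nothing needs inverting):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary 2-D Gaussian parameters for the check
mu = rng.normal(size=2)
A = rng.normal(size=(2, 2))
Sigma_inv = A @ A.T + 2 * np.eye(2)  # symmetric positive definite precision
beta = rng.normal(size=2)

# Left-hand side: (beta - mu)^T Sigma^{-1} (beta - mu)
lhs = (beta - mu) @ Sigma_inv @ (beta - mu)

# Right-hand side: beta^T Sigma^{-1} beta - 2 mu^T Sigma^{-1} beta + constant,
# where the constant is mu^T Sigma^{-1} mu (it does not involve beta)
const = mu @ Sigma_inv @ mu
rhs = beta @ Sigma_inv @ beta - 2 * mu @ Sigma_inv @ beta + const

assert np.isclose(lhs, rhs)
```

The cross terms $\beta^T\Sigma^{-1}\mu$ and $\mu^T\Sigma^{-1}\beta$ combine into $2\mu^T\Sigma^{-1}\beta$ precisely because $\Sigma^{-1}$ is symmetric.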
Writing $\beta=\binom{\beta_0}{\beta_1}$, the exponent of the likelihood is $-\frac{\tau}{2}(\mathbf{Y}-\mathbb{X}\beta)^T(\mathbf{Y}-\mathbb{X}\beta)$. Expanding and matching the quadratic terms gives $\Sigma^{-1}=\tau\,\mathbb{X}^T\mathbb{X}$, so the covariance matrix is $\Sigma=\left(\tau\,\mathbb{X}^T\mathbb{X}\right)^{-1}$. Matching the linear terms then gives
$$\tau\,\mathbf{Y}^T\mathbb{X} = \mu^T\Sigma^{-1} \implies \mu^T = \mathbf{Y}^T\mathbb{X}\left(\mathbb{X}^T\mathbb{X}\right)^{-1} \implies \mu = \left(\mathbb{X}^T\mathbb{X}\right)^{-1}\mathbb{X}^T\mathbf{Y},$$
which is the usual OLS result (the $\tau$ cancels).
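To double-check the coefficient matching numerically, here is a small sketch on simulated data (`n`, `tau`, and the true coefficients are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n, tau = 50, 2.0

# Deterministic covariates; design matrix with an intercept column (1, X_i)
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([1.0, -0.5])
Y = X @ beta_true + rng.normal(scale=1 / np.sqrt(tau), size=n)

# Posterior precision and mean obtained by matching coefficients
Sigma_inv = tau * X.T @ X                # Sigma = (tau X^T X)^{-1}
mu = np.linalg.solve(X.T @ X, X.T @ Y)   # (X^T X)^{-1} X^T Y, the OLS fit

# Verify the linear-term matching: mu^T Sigma^{-1} = tau Y^T X
assert np.allclose(mu @ Sigma_inv, tau * Y @ X)
```

The assertion holds for any data, since it is an algebraic identity: substituting $\mu = (\mathbb{X}^T\mathbb{X})^{-1}\mathbb{X}^T\mathbf{Y}$ into $\mu^T\Sigma^{-1}$ gives back $\tau\,\mathbf{Y}^T\mathbb{X}$.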