The model is linear regression, that is $$Y=X\beta+\epsilon $$ where $Y$ is the vector of response variable, $X$ is the design matrix, $\beta$ is the vector of regression coefficients and $\epsilon$ is the noise vector. So I want to compute the posterior distribution of $\beta$ and $\sigma^2$ given data: $$p(\beta,\sigma^2|Y)\propto p(Y|\beta,\sigma^2)p(\beta,\sigma^2), $$ where $$p(Y|\beta,\sigma^2)=(2\pi\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}(Y-X\beta)^T(Y-X\beta)\right\}\sim N(X\beta,\sigma^2I)$$ and $$ p(\beta,\sigma^2)\propto p(\beta|\sigma^2)p(\sigma^2).$$ $p(\beta,\sigma^2)$ is the prior distribution of $\beta$ and $\sigma^2$, and $p(\beta|\sigma^2)$ is a normal prior distribution: $$p(\beta|\sigma^2)\propto (\sigma^2)^{-k/2}\exp\left\{-\frac{1}{2\sigma^2}(\beta-\mu_0)^T A_0(\beta-\mu_0)\right\}\sim N(\mu_0,\sigma^2A_0^{-1})$$ and $$p(\sigma^2)\propto (\sigma^2)^{-(a+1)}\exp\left\{-\frac{b}{\sigma^2}\right\}\sim Inv.Gamma(a,b). $$
So the posterior distribution is now as follows: $$p(\beta,\sigma^2|Y)\propto p(Y|\beta,\sigma^2)p(\beta|\sigma^2)p(\sigma^2) \\\propto (2\pi\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}(Y-X\beta)^T(Y-X\beta)\right\} \times\\ (\sigma^2)^{-k/2}\exp\left\{-\frac{1}{2\sigma^2}(\beta-\mu_0)^T A_0(\beta-\mu_0)\right\}\times \\ (\sigma^2)^{-(a+1)}\exp\left\{-\frac{b}{\sigma^2}\right\}\\ = (\sigma^2)^{-\frac{1}{2}(n+k)-(a+1)}\exp\left\{-\frac{1}{2\sigma^2}(Y^TY-Y^TX\beta-\beta^TX^TY+\beta^TX^TX\beta \\+\beta^TA_0\beta-\mu_0^TA_0\beta-\beta^TA_0\mu_0 +\mu_0^TA_0\mu_0+2b)\right\}.$$
My main question is how does one prove that if the prior distribution follows a normal distribution then the posterior distribution also follows a normal distribution, i.e. conjugate prior. For the past few days, I have tried to compute in all possible ways and now, I get stuck to this point. I have also looked at Bayesian Regression Simplifying the posterior, which is similar to this question but the only difference is the prior distribution being non-standard normal.
The above procedure may follow in a similar manner as https://en.wikipedia.org/wiki/Bayesian_linear_regression#Conjugate_prior_distribution, however, I do not want to compute posterior mean and other things but I just simply want to prove the conjugate prior for this regression model.
Any guidance would greatly be helpful for me and I apologise if this question has been asked in the past.
Thank you in advance.