While reading through my notes, I came across a section showing that Ridge Regression has a Bayesian interpretation, together with the following proof:

Now, I was wondering: how would this proof change if the intercept were included? My notes mention that "For ease of notation, we omit the intercept. However, when an intercept is included, an improper prior is placed on the intercept $w_{0}$ (i.e. $w_{0}$ has a uniform distribution on the entire real line)". But I am not entirely sure how to implement this.
Putting a prior on $w_0$ (and assuming $w_0$ is independent of $w_1, \ldots, w_N$) amounts to adding another factor, call it $p(w_0)$, to the product on the RHS of your first expression. In particular, for a uniform (improper) prior over the reals, $p(w_0) \propto 1$, so this factor does not depend on the value of $w_0$: it contributes only an additive constant to the negative log-posterior, and the minimization problem is effectively unchanged. The only difference is that you do not get a regularization term for $w_0$.
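To make this concrete (a sketch under my own assumptions, since I can't see your exact expressions): with a Gaussian likelihood $y \sim \mathcal{N}(w_0\mathbf{1} + X\mathbf{w}, \sigma^2 I)$, a Gaussian prior on the slopes, and the flat improper prior on $w_0$, the MAP problem becomes $\min_{w_0, \mathbf{w}} \|y - w_0\mathbf{1} - X\mathbf{w}\|^2 + \lambda\|\mathbf{w}\|^2$, i.e. ridge with no penalty on the intercept. The snippet below (variable names are my own, purely illustrative) checks numerically that solving this directly is equivalent to the usual trick of centering $X$ and $y$, running ridge on the slopes, and recovering the intercept as $\bar{y} - \bar{x}^\top \mathbf{w}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=n)
lam = 2.0

# Direct MAP solution: augment X with a column of ones and
# penalize only the slope block (flat improper prior on w0).
Xa = np.hstack([np.ones((n, 1)), X])
D = np.diag([0.0] + [1.0] * p)  # zero penalty on the intercept entry
w_aug = np.linalg.solve(Xa.T @ Xa + lam * D, Xa.T @ y)

# Equivalent two-step solution: center, solve ridge on the slopes,
# then recover the intercept from the means.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
w0 = y.mean() - X.mean(axis=0) @ w

print(np.allclose(w_aug, np.concatenate([[w0], w])))  # the two solutions agree
```

The equivalence is exact: setting the derivative of the objective with respect to $w_0$ to zero gives $w_0 = \bar{y} - \bar{x}^\top \mathbf{w}$, and substituting this back reduces the problem to ordinary ridge on the centered data.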