What is the right way to find the derivative of $F$ with respect to $\beta_0$ and solve for $\beta_0$?
$ F = \frac{1}{2} \|X \beta + \mathbf{1}\beta_0 - y\|^2 + \tfrac{C}2\|\beta\|^2 $, where
$X$ is an $(n_{samples} \times n_{features})$ matrix,
$y$ is a vector of length $n_{samples}$,
$\mathbf{1}$ is a column vector of $n_{samples}$ ones.
Here is my solution:
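$ \frac{\partial F}{\partial \beta_0} = \mathbf{1}^T (X \beta + \mathbf{1} \beta_0 - y) = 0 $
$ \mathbf{1}^T X \beta + \mathbf{1}^T \mathbf{1} \beta_0 - \mathbf{1}^T y = 0 $
Since $ \mathbf{1}^T \mathbf{1} = n_{samples} $:
$ \mathbf{1}^T X \beta + n_{samples} \beta_0 - \mathbf{1}^T y = 0 $
$ \beta_0 = \frac{1}{n_{samples}} \left( \mathbf{1}^T y - \mathbf{1}^T X \beta \right) $
$ \beta_0 = \frac{1}{n_{samples}} \mathbf{1}^T (y - X \beta) $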
What you can do is to adapt this into a known form.
Define $ \hat{X} = \left[ \boldsymbol{1}, X \right] $ and $ \hat{\beta} = \left[ {\beta}_{0}, {\beta}^{T} \right]^{T} $.
Also define $ D $ as the identity matrix with its first diagonal entry set to zero, $ {D}_{11} = 0 $, so that the intercept $ {\beta}_{0} $ is not penalized.
Then your problem can be written as:
$$ z = \arg \min_{\hat{\beta}} \frac{1}{2} \left\| \hat{X} \hat{\beta} - y \right\|_{2}^{2} + \frac{C}{2} \left\| D \hat{\beta} \right\|_{2}^{2} $$
Now this is easily solved by:
$$ z = \left( \hat{X}^{T} \hat{X} + C {D}^{T} D \right)^{-1} \hat{X}^{T} y $$
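For completeness, here is a sketch of where this closed form comes from, assuming $ \hat{X}^{T} \hat{X} + C {D}^{T} D $ is invertible. Setting the gradient of the objective with respect to $ \hat{\beta} $ to zero gives the normal equations:

$$ \hat{X}^{T} \left( \hat{X} \hat{\beta} - y \right) + C {D}^{T} D \hat{\beta} = 0 \quad \Longrightarrow \quad \left( \hat{X}^{T} \hat{X} + C {D}^{T} D \right) \hat{\beta} = \hat{X}^{T} y $$

Because the intercept is not penalized, the first row of this system is just $ \boldsymbol{1}^{T} \left( X \beta + \boldsymbol{1} {\beta}_{0} - y \right) = 0 $, which is the stationarity condition in $ {\beta}_{0} $ from the question.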
Now $ {\beta}_{0} = {z}_{1} $ and $ \beta = \left[ {z}_{2}, {z}_{3}, \ldots \right]^{T} $.
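As a numerical sanity check, the closed form can be sketched in NumPy. The function name `ridge_with_intercept`, the synthetic data, and the value of `C` are illustrative assumptions, not part of the problem statement. The last two lines check that the residuals sum to zero, which is the stationarity condition in $ {\beta}_{0} $, and that $ {\beta}_{0} $ equals the mean of $ y - X \beta $.

```python
import numpy as np

def ridge_with_intercept(X, y, C):
    """Minimize 0.5*||X b + 1 b0 - y||^2 + 0.5*C*||b||^2 in closed form."""
    n, p = X.shape
    X_hat = np.hstack([np.ones((n, 1)), X])  # prepend the column of ones
    D = np.eye(p + 1)
    D[0, 0] = 0.0                            # leave the intercept unpenalized
    A = X_hat.T @ X_hat + C * (D.T @ D)
    z = np.linalg.solve(A, X_hat.T @ y)      # solve the normal equations
    return z[0], z[1:]                       # (beta_0, beta)

# Synthetic data, made up purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 4.0 + 0.1 * rng.normal(size=50)

beta0, beta = ridge_with_intercept(X, y, C=2.0)
residual = X @ beta + beta0 - y

# Stationarity in beta_0: the residuals sum to zero (up to round-off).
print(np.isclose(residual.sum(), 0.0, atol=1e-8))
# Equivalently, beta_0 is the mean of y - X beta.
print(np.isclose(beta0, np.mean(y - X @ beta)))
```

Using `np.linalg.solve` rather than forming the inverse explicitly is a common choice for numerical stability; the result is the same $ z $ as in the formula above.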