The linear ridge regression loss function: $$ J(\beta)=\sum_{i=1}^n(x_i^T\beta-y_i)^2+\lambda\sum_{j=1}^p\beta_j^2= \Vert X\beta-Y \Vert^2 + \lambda\beta^T\beta \text{ (matrix form)} $$ where the $x_i$'s are the input vectors, the $y_i$'s are the outputs (observations), $\beta$ is the vector of coefficients, and the $\beta_j$'s are the elements of $\beta$, has the solution: $$ \hat{\beta}=(X^TX+\lambda I)^{-1}X^TY $$
On the other hand, in my textbook, it is said that by setting the derivative of $J(\beta)$ to $0$, we can obtain the solution $\hat{\beta}$ of the form: $$ \hat{\beta}=\Sigma_{i=1}^n \alpha_ix_i \tag{*} $$ where: $$ \alpha_i=\frac{-1}{\lambda}(x_i^T\beta-y_i) $$
How do we obtain (*)?
Minimizing $J(\beta)$ is equivalent to minimizing $\tilde{J}(\beta) = \frac{1}{2}\sum_{i=1}^n(x_i^T\beta-y_i)^2 + \frac{1}{2}\lambda\beta^T\beta$, since scaling by $\frac{1}{2}$ does not change the minimizer. Its gradient is $\nabla \tilde{J}(\beta)=\sum_{i=1}^n(x_i^T\beta-y_i)x_i + \lambda \beta$. Setting $\nabla \tilde{J}(\beta)$ to zero gives $\sum_{i=1}^n(x_i^T\beta-y_i)x_i + \lambda \beta = 0$, that is, $\beta = -\frac{1}{\lambda}\sum_{i=1}^n(x_i^T\beta-y_i)x_i$, a fixed-point equation for the coefficients $\beta$. Evaluated at the minimizer $\hat{\beta}$, this reads $\hat{\beta} = \sum_{i=1}^n \alpha_i x_i$ with $\alpha_i = -\frac{1}{\lambda}(x_i^T\hat{\beta}-y_i)$, which is exactly (*).
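A quick numeric sanity check (a minimal sketch with randomly generated data, all names hypothetical): compute the closed-form ridge solution $\hat{\beta}=(X^TX+\lambda I)^{-1}X^TY$, form the $\alpha_i$'s from the fixed-point equation, and confirm that $\sum_i \alpha_i x_i$ reproduces $\hat{\beta}$:

```python
import numpy as np

# Hypothetical small problem: n observations, p features
rng = np.random.default_rng(0)
n, p, lam = 20, 5, 0.7
X = rng.normal(size=(n, p))
Y = rng.normal(size=n)

# Closed-form (primal) ridge solution: (X^T X + lambda I)^{-1} X^T Y
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

# Dual coefficients alpha_i = -(1/lambda)(x_i^T beta_hat - y_i)
alpha = -(X @ beta_hat - Y) / lam

# beta_hat as a linear combination of the input vectors: sum_i alpha_i x_i = X^T alpha
beta_dual = X.T @ alpha
print(np.allclose(beta_hat, beta_dual))  # True
```

The last line is just the fixed-point equation rearranged, so the two expressions agree to machine precision.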