On page 106 of Simon Haykin's "Neural Networks and Learning Machines", the book defines a cost function and differentiates it to obtain the optimal $\mathbf w$: $$ \mathscr E(\mathbf w) = \frac{1}{2}\sum_{i=1}^N(d_i - \mathbf w^T \mathbf x_i)^2 + \frac{\lambda}{2} \Vert \mathbf w \Vert^2 $$ Finally, $\mathbf w$ is written in the following form: $$ \begin{aligned} \hat {\mathbf w}_{\mathrm {MAP}} (N) & = [\hat R_{xx} (N) + \lambda \mathbf I]^{-1}\, \hat r_{dx} (N) \\\\ \hat R_{xx} (N) & = - \sum_{i=1}^N \sum_{j=1}^N \mathbf x_i \mathbf x_j^T \quad (\mathbf x_i \mathbf x_j^T \text{ is an outer product}) \\\\ \hat r_{dx} (N) & = - \sum_{j=1}^N \mathbf x_i d_i \end{aligned} \tag{1} $$
But in my own derivation: $$ \begin{aligned} \mathscr E (\mathbf w) & = \frac{1}{2} \sum_{i=1}^N (d_i^2 - 2 d_i \mathbf w^T \mathbf x_i + \mathbf w^T \mathbf x_i \mathbf x_i^T \mathbf w) + \frac{\lambda}{2} \mathbf w^T \mathbf w \\\\ \Rightarrow \mathscr E' (\mathbf w) & = \frac{1}{2} \sum_{i=1}^N (-2 d_i \mathbf x_i + 2 \mathbf x_i \mathbf x_i^T \mathbf w) + \lambda \mathbf w \\\\ & = - \sum_{i=1}^N d_i \mathbf x_i + \sum_{i=1}^N \mathbf x_i \mathbf x_i^T \mathbf w + \lambda \mathbf w \\\\ & = - \sum_{i=1}^N d_i \mathbf x_i + \Big(\sum_{i=1}^N \mathbf x_i \mathbf x_i^T + \lambda \mathbf I\Big) \mathbf w \end{aligned} $$
Setting $\mathscr E' (\mathbf w) = 0$ gives: $$ \hat {\mathbf w}_{\mathrm {MAP}} (N) = \Big[\sum_{i=1}^N \mathbf x_i \mathbf x_i^T + \lambda \mathbf I\Big]^{-1} \sum_{i=1}^N d_i \mathbf x_i \tag{2} $$
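As a numerical sanity check (my own NumPy sketch, not from the book; the sizes `N`, `m` and the value of `lam` are arbitrary), formula (2) does minimize $\mathscr E(\mathbf w)$ — the gradient vanishes at $\hat{\mathbf w}$:

```python
import numpy as np

# Verify that formula (2) minimizes
# E(w) = (1/2) sum_i (d_i - w^T x_i)^2 + (lambda/2) ||w||^2
rng = np.random.default_rng(0)
N, m, lam = 50, 3, 0.5
X = rng.standard_normal((N, m))   # row i is x_i^T
d = rng.standard_normal(N)

# Formula (2): w = [sum_i x_i x_i^T + lambda I]^{-1} sum_i d_i x_i
A = X.T @ X + lam * np.eye(m)     # sum of outer products plus lambda I
b = X.T @ d                       # sum_i d_i x_i
w_hat = np.linalg.solve(A, b)

def E(w):
    r = d - X @ w
    return 0.5 * r @ r + 0.5 * lam * w @ w

# Gradient of E at w_hat should be (numerically) zero,
# and any perturbed point should have a higher cost.
grad = -X.T @ (d - X @ w_hat) + lam * w_hat
assert np.allclose(grad, 0)
assert all(E(w_hat) <= E(w_hat + 1e-3 * rng.standard_normal(m))
           for _ in range(10))
print("formula (2) minimizes E(w)")
```

Since $\mathscr E$ is strictly convex for $\lambda > 0$, the zero-gradient point is the unique minimizer.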
Are (1) and (2) equal? Or did my derivation go wrong?
Update:
This formula is similar to the Wiener–Hopf equations.
Update 2:
In "Matrix Formulation of the Wiener–Hopf Equations" (Section 2.4 of Haykin's "Adaptive Filter Theory"), the Wiener solution is expressed as
$$ \mathbf w_0 = \mathbf R^{-1} \mathbf p,\quad \mathbf R = \mathbb E[\mathbf u \mathbf u^T] $$
This is closer to the form I wrote.
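To illustrate the connection (again my own sketch with arbitrary random data): with $\lambda = 0$, formula (2) is exactly the sample-average version of $\mathbf w_0 = \mathbf R^{-1} \mathbf p$, because the $1/N$ factors in the estimates of $\mathbf R$ and $\mathbf p$ cancel:

```python
import numpy as np

# With lambda = 0, formula (2) equals the sample Wiener solution
# w0 = R^{-1} p, where R = E[u u^T] and p = E[u d] are replaced by
# their sample averages (the 1/N factors cancel).
rng = np.random.default_rng(1)
N, m = 200, 4
U = rng.standard_normal((N, m))   # row i is the input vector u_i^T
d = rng.standard_normal(N)

# Sample estimates of R and p
R = (U.T @ U) / N
p = (U.T @ d) / N
w_wiener = np.linalg.solve(R, p)

# Formula (2) with lambda = 0
w_formula = np.linalg.solve(U.T @ U, U.T @ d)

assert np.allclose(w_wiener, w_formula)
print("sample Wiener solution matches formula (2) at lambda = 0")
```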