Linear Estimation by Kailath, Sayed and Hassibi (Section 3.2.1) derives that the linear least mean squares estimator (LLMSE) $\hat{x}=K_0y$ satisfies the equation $K_0 R_y = R_{xy}$, where $x$ and $y$ are complex vector-valued zero-mean random variables (column vectors), $*$ denotes the complex conjugate transpose, and $R_x=E [xx^*]$, $R_y=E [yy^*]$, and $R_{xy}=E [xy^*]$ are the covariance matrices. The proof simply says to differentiate with respect to $aK$ (where $a$ is a row vector): $$a E[(x-Ky)(x-Ky)^*]a^* = a(R_x - R_{xy} K^* - KR_{yx} + KR_yK^*)a^*$$ and set the derivative to zero at $K=K_0$ to arrive at $K_0 R_y = R_{xy}$.
The simplified scalar case is clear. Unfortunately, I don't know how to do this vector differentiation. For example, what is $\frac{\partial}{\partial (aK)}$ of $aR_{xy}K^*a^*$? Any help is greatly appreciated.
I think there is some misunderstanding. There is another book by one of the authors of the book you mentioned: A. H. Sayed, Adaptive Filters, Wiley, NJ, 2008. Perhaps it is easier to read than the book you mentioned. I am sure there are many books/notes covering the derivation of the LMMSE estimator.
Anyway, the optimization problem for the LMMSE weight should read as follows: \begin{align} \text{minimize}_{K} \operatorname{Tr}(R_x - R_{xy} K^H - KR_{yx} + KR_yK^H), \end{align} where $(\cdot)^H$ denotes the complex conjugate transpose, whereas $(\cdot)^*$ denotes the complex conjugate.
To this end, we need to find an optimal weight matrix $K$. We have to use Wirtinger calculus for the gradient computation, i.e., treat $K$ and $K^*$ independently.
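To see what "treat $K$ and $K^*$ independently" buys you, here is a minimal numerical sketch in the scalar case (NumPy, my own illustrative example, not from either book): $f(z) = zz^* = |z|^2$ is real-valued and not complex-differentiable, yet its Wirtinger derivative with respect to $z^*$ is simply $z$, and for real-valued $f$ the first-order change is $df \approx 2\operatorname{Re}\{(\partial f/\partial z^*)\, \overline{dz}\}$:

```python
import numpy as np

# Wirtinger calculus treats z and its conjugate z* as independent variables.
# f(z) = z z* = |z|^2 is real-valued but not holomorphic;
# its Wirtinger derivative with respect to z* is z, and for real f:
#     df ≈ 2 Re{ (∂f/∂z*) conj(dz) }.
rng = np.random.default_rng(0)
z = rng.standard_normal() + 1j * rng.standard_normal()
dz = 1e-6 * (rng.standard_normal() + 1j * rng.standard_normal())

f = lambda w: (w * np.conj(w)).real
df_exact = f(z + dz) - f(z)
df_wirtinger = 2 * np.real(z * np.conj(dz))  # using ∂f/∂z* = z

print(abs(df_exact - df_wirtinger))  # agreement up to O(|dz|^2)
```

The matrix derivation below is the same idea, with the scalar product replaced by a trace inner product.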
Before finding the gradient (and setting it to zero), here is some notation:
Let $A:B := \operatorname{Tr}(A^TB)$ denote the Frobenius inner product, so that $f := \operatorname{Tr}(R_x - R_{xy} K^H - KR_{yx} + KR_yK^H) \equiv I: \left(R_x - R_{xy} K^H - KR_{yx} + KR_yK^H\right)$.
For simplicity, we express the differential in terms of $dK^*$ only and ignore the $dK$ term (since $f$ is real-valued, the $dK$ term carries the conjugate of the same information). \begin{align} df &= d\left(I:R_x - R_{xy} K^H - KR_{yx} + KR_yK^H \right) \\ &= \left(I:0 - R_{xy} dK^H - 0 + KR_ydK^H \right) \\ &= \left(I^T: - dK^*R_{xy}^T + dK^* (KR_y)^T \right) \\ &= \left(\left[-R_{xy} + KR_y \right]: dK^* \right) \end{align}
Thus, the gradient is \begin{align} \frac{\partial f}{\partial K^*} = -R_{xy} + KR_y. \end{align}
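As a sanity check, the gradient can be verified numerically. The sketch below (NumPy; shapes and sample sizes are my own choices) builds valid covariance matrices from random complex data and confirms that a small perturbation $dK$ changes $f$ by $2\operatorname{Re}\langle \partial f/\partial K^*, dK\rangle$ up to second order:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 4  # assumed dimensions: x is n-dimensional, y is m-dimensional

def crand(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Sample covariances from random complex data (guaranteed PSD).
X, Y = crand(n, 50), crand(m, 50)
R_x = X @ X.conj().T / 50
R_y = Y @ Y.conj().T / 50
R_xy = X @ Y.conj().T / 50
R_yx = R_xy.conj().T

def f(K):
    # The trace objective; its argument is Hermitian, so the trace is real.
    M = R_x - R_xy @ K.conj().T - K @ R_yx + K @ R_y @ K.conj().T
    return np.trace(M).real

K = crand(n, m)
dK = 1e-6 * crand(n, m)

G = -R_xy + K @ R_y                              # claimed gradient ∂f/∂K*
df_exact = f(K + dK) - f(K)
df_grad = 2 * np.real(np.sum(G * np.conj(dK)))   # df = 2 Re⟨G, dK⟩

print(abs(df_exact - df_grad))  # residual is O(|dK|^2)
```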
Now, set the gradient to zero and solve for $K$ (assuming that $R_y$ is invertible), writing $K_0$ for the minimizer:
$$K_0 = R_{xy} R_y^{-1} .$$
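A quick numerical check of the final answer (again a sketch with my own NumPy setup, using sample covariances of random complex data): $K_0 = R_{xy}R_y^{-1}$ satisfies the normal equation $K_0 R_y = R_{xy}$, and any perturbation of $K_0$ can only increase the trace objective, since $f(K_0 + \Delta) - f(K_0) = \operatorname{Tr}(\Delta R_y \Delta^H) \ge 0$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 4  # assumed dimensions: x is n-dimensional, y is m-dimensional

def crand(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Sample covariances; with 200 samples, R_y is invertible almost surely.
X, Y = crand(n, 200), crand(m, 200)
R_x = X @ X.conj().T / 200
R_y = Y @ Y.conj().T / 200
R_xy = X @ Y.conj().T / 200
R_yx = R_xy.conj().T

K0 = R_xy @ np.linalg.inv(R_y)

# K0 satisfies the normal equation K0 R_y = R_xy.
assert np.allclose(K0 @ R_y, R_xy)

def f(K):
    M = R_x - R_xy @ K.conj().T - K @ R_yx + K @ R_y @ K.conj().T
    return np.trace(M).real

# Perturbing K0 never decreases the objective: K0 is the minimizer.
for _ in range(5):
    assert f(K0 + 1e-3 * crand(n, m)) >= f(K0)
```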