Given a system $$y = h x + n$$
where $x$ is the unknown to be estimated, $y$ is the observed data, $h$ is known, and $n \sim \mathcal{CN}(0,\sigma^2)$ is noise. All quantities are complex scalars.
I am trying to prove that the linear minimum mean square error (MMSE) estimator $$w = \frac{h^*}{h^*h+\sigma^2} \tag{1}$$ minimizes the mean square error $\mathrm{MSE}=\mathbb{E}[(wy-x)^*(wy-x)]$, where $\mathbb{E}$ is the expectation operator and $^*$ denotes complex conjugation.
I have difficulty computing the partial derivative $\frac{\partial \mathrm{MSE}}{\partial w}$, and I wonder whether this MSE is analytic in $w$.
Indeed,
\begin{align} \frac{\partial \mathrm{MSE}}{\partial w} &= \mathbb{E}\left[\lim_{\Delta \to 0} \frac{((w+\Delta)y-x)^*((w+\Delta)y-x) - (wy-x)^*(wy-x)}{\Delta}\right]\\ &=\mathbb{E}\left[\lim_{\Delta \to 0} \frac{y^*\Delta^*(wy-x) + (wy-x)^*y\Delta+y^*\Delta^*y\Delta}{\Delta}\right]\\ &=\mathbb{E}\left[\lim_{\Delta \to 0} y^*(wy-x)\frac{\Delta^*}{\Delta} + (wy-x)^*y+ y^*\Delta^*y\right] \end{align}
The function $f(z)=z^*$ is not analytic: if $z = x + jy$ then $f(z)=x-jy$, and writing $\Delta z = \Delta x + j \Delta y$, $$f'=\lim_{\Delta z \to 0} \frac{f(z + \Delta z) - f(z)}{\Delta z} = \lim_{\Delta x \to 0,\; \Delta y \to 0} \frac{\Delta x - j \Delta y}{\Delta x + j \Delta y}$$
Approaching along the imaginary axis ($\Delta x = 0$, $\Delta y \to 0$), the quotient tends to $-1$.
Approaching along the real axis ($\Delta y = 0$, $\Delta x \to 0$), it tends to $+1 \neq -1$, so $f(z)$ has no complex derivative.
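A quick numerical sketch makes this direction dependence concrete (the sample point $z$ and step sizes are arbitrary choices of mine):

```python
# Numerical illustration that the difference quotient of f(z) = z*
# depends on the direction of Delta: conj(dz)/dz has no single limit
# as dz -> 0.
import numpy as np

z = 1.0 + 2.0j
for dz in [1e-8, 1e-8j, (1 + 1j) * 1e-8]:
    quotient = (np.conj(z + dz) - np.conj(z)) / dz
    print(dz, quotient)
# real axis: +1, imaginary axis: -1, diagonal: -1j -- no single limit
```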
Could anyone please tell me where I went wrong in trying to prove $(1)$?
From a real point of view, you want to minimise a real-valued function on $\mathbb{R}^2$. To find critical points, you therefore need both real partial derivatives to vanish: writing $w = u + iv$, you need $\frac{\partial \operatorname{MSE}}{\partial u} = 0$ and $\frac{\partial \operatorname{MSE}}{\partial v} = 0$. In the form in which $\operatorname{MSE}$ is given, however, it is more convenient to use the Wirtinger derivatives
$$\frac{\partial}{\partial w} = \frac{1}{2}\biggl( \frac{\partial}{\partial u} - i\frac{\partial}{\partial v}\biggr) \qquad\text{and}\qquad \frac{\partial}{\partial w^{\ast}} = \frac{1}{2}\biggl( \frac{\partial}{\partial u} + i\frac{\partial}{\partial v}\biggr)$$
to locate the critical points. The critical points of a real-differentiable function on $\mathbb{C}$ are precisely the points where both Wirtinger derivatives vanish. For real-valued functions it does however suffice to consider only one of the Wirtinger derivatives, because the identity
$$\frac{\partial g}{\partial w^{\ast}} = \biggl(\frac{\partial g^{\ast}}{\partial w}\biggr)^{\ast}$$
becomes
$$\frac{\partial f}{\partial w^{\ast}} = \biggl(\frac{\partial f}{\partial w}\biggr)^{\ast}$$
for real-valued $f$, and so the two Wirtinger derivatives vanish at the same points.
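The identity can be checked numerically by building the Wirtinger derivatives from finite-difference partials in $u$ and $v$; here is a small sketch using an arbitrary real-valued example $f(w) = |w|^2 + \operatorname{Re}(w)$ of my own choosing:

```python
# Numerical check of df/dw* = (df/dw)* for a real-valued f,
# using the definitions d/dw = (d/du - i d/dv)/2 and
# d/dw* = (d/du + i d/dv)/2 with central finite differences.
import numpy as np

def f(w):
    # arbitrary real-valued example: f(w) = |w|^2 + Re(w)
    return np.abs(w) ** 2 + w.real

def wirtinger(f, w, eps=1e-6):
    du = (f(w + eps) - f(w - eps)) / (2 * eps)            # df/du
    dv = (f(w + 1j * eps) - f(w - 1j * eps)) / (2 * eps)  # df/dv
    return 0.5 * (du - 1j * dv), 0.5 * (du + 1j * dv)     # df/dw, df/dw*

w = 0.3 - 0.7j
d_w, d_wstar = wirtinger(f, w)
print(d_w, np.conj(d_wstar))  # equal up to finite-difference error
```

Analytically, $f = w w^{\ast} + \tfrac12(w + w^{\ast})$ gives $\frac{\partial f}{\partial w} = w^{\ast} + \tfrac12$, which the sketch reproduces.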
Aside: the Wirtinger derivatives $\frac{\partial}{\partial w}$ and $\frac{\partial}{\partial w^{\ast}}$ are not partial derivatives in the usual sense, although they formally behave like them; in general they cannot be obtained as limits of difference quotients, except at points where one of the two vanishes.
Since $\operatorname{MSE}$ is real-valued and (typically) not constant, it's not an analytic function, so we can't use difference quotients to determine $\frac{\partial \operatorname{MSE}}{\partial w}$.
To compute the Wirtinger derivatives here, we note that
$$\frac{\partial w}{\partial w} = 1 = \frac{\partial w^{\ast}}{\partial w^{\ast}} \qquad\text{and}\qquad \frac{\partial w^{\ast}}{\partial w} = 0 = \frac{\partial w}{\partial w^{\ast}}.$$
Using the linearity of expectations, we expand
$$\mathbb{E}[(wy-x)^{\ast}(wy-x)] = w^{\ast} w\mathbb{E}[y^{\ast} y] - w^{\ast} \mathbb{E}[y^{\ast} x] - w\mathbb{E}[x^{\ast} y] + \mathbb{E}[x^{\ast} x]$$
and thus find
$$\frac{\partial\operatorname{MSE}}{\partial w^{\ast}} = w\mathbb{E}[y^{\ast} y] - \mathbb{E}[y^{\ast} x],$$
so the critical point is
$$w = \frac{\mathbb{E}[y^{\ast} x]}{\mathbb{E}[y^{\ast} y]}$$
(unless $\mathbb{E}[y^{\ast} y] = 0$, of course, in which case $\operatorname{MSE}$ is constant in $w$).
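As a sanity check, a Monte Carlo sketch confirms that the empirical MSE increases when $w$ is perturbed away from this ratio. The setup below (unit-power circularly symmetric Gaussian $x$, independent noise, and the particular values of $h$ and $\sigma^2$) is my own assumption, not given in the question:

```python
# Monte Carlo sketch: w = E[y* x] / E[y* y] minimises the empirical MSE.
# Assumed setup: unit-power circularly symmetric Gaussian x,
# independent circularly symmetric Gaussian noise n.
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
h, sigma2 = 0.8 - 0.6j, 0.5

x = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
n = np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
y = h * x + n

# empirical version of w = E[y* x] / E[y* y]
w_opt = np.mean(np.conj(y) * x) / np.mean(np.abs(y) ** 2)

def mse(w):
    return np.mean(np.abs(w * y - x) ** 2)

# the empirical MSE is quadratic in w and minimised exactly at w_opt,
# so any perturbation increases it
for d in [1e-2, -1e-2, 1e-2j, -1e-2j]:
    assert mse(w_opt + d) > mse(w_opt)
```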
It remains to evaluate the two expectations.
$$\mathbb{E}[y^{\ast} y] = h^{\ast} h \mathbb{E}[x^{\ast} x] + h^{\ast} \mathbb{E}[x^{\ast} n] + h \mathbb{E}[n^{\ast} x] + \mathbb{E}[n^{\ast} n]$$
and
$$\mathbb{E}[y^{\ast} x] = h^{\ast} \mathbb{E}[x^{\ast} x] + \mathbb{E}[n^{\ast} x].$$
If $x$ and $n$ are uncorrelated, these simplify to
$$\mathbb{E}[y^{\ast} y] = h^{\ast}h \mathbb{E}[x^{\ast} x] + \sigma^2 \qquad\text{and}\qquad \mathbb{E}[y^{\ast} x] = h^{\ast} \mathbb{E}[x^{\ast} x]$$
and the minimiser is
$$w = \frac{h^{\ast}}{h^{\ast} h + \frac{\sigma^2}{\mathbb{E}[x^{\ast} x]}}.$$
This is different from $(1)$ unless $\mathbb{E}[x^{\ast} x] = 1$.
If $x$ and $n$ are correlated, the expression for the minimiser is more complicated.
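In the uncorrelated case, the closed form above can also be checked against a Monte Carlo minimiser with deliberately non-unit signal power. The concrete values of $h$, $\sigma^2$, and $\mathbb{E}[x^{\ast}x]$ below are my own assumptions for illustration:

```python
# Sketch comparing the Monte Carlo minimiser with the closed form
# w = h* / (h* h + sigma^2 / E[x* x]) when E[x* x] != 1.
# Assumed setup: independent circularly symmetric Gaussians x and n.
import numpy as np

rng = np.random.default_rng(1)
N = 400_000
h, sigma2, px = 1.2 + 0.5j, 0.3, 4.0   # px = E[x* x], deliberately != 1

x = np.sqrt(px / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
n = np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
y = h * x + n

w_mc = np.mean(np.conj(y) * x) / np.mean(np.abs(y) ** 2)  # E[y* x] / E[y* y]
w_cf = np.conj(h) / (np.abs(h) ** 2 + sigma2 / px)        # closed form
print(abs(w_mc - w_cf))  # small: the two agree up to Monte Carlo error
```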