While reading lecture notes on statistical signal processing, I came across the following formulas, which I do not understand:
$$(I)~~Ŷ[k] = \textbf{a}^{H}[k]\textbf X[k],$$ where $\textbf a$ and $\textbf X$ are vectors of identical length. In the following, the dependence on $k$ is dropped for brevity.
$$ \begin{equation} \begin{aligned} (II)~~J(a) & = \mathcal E\{|Y - Ŷ|^2\} \\ & = \mathcal E \{(Y - \textbf a^H \textbf X)(Y^* - \textbf X^H \textbf a) \} \\ & = ~~... \end{aligned} \end{equation} $$
$\mathcal E \{\cdot \}$ is the expected value operator, which I included for completeness. Just "think it away"; I guess it is not needed to answer the question.
Question: How did the author derive the second line of (II) from the first line? Is there a binomial formula for matrices? Why is the second Y complex conjugated? Why is the first bracket of the second line of (II) not just repeated a second time after the first? Why are the vectors in the second bracket complex conjugated and transposed?
Background: J(a) is the cost function of a minimum mean square error estimator. It is minimized by choosing the estimator's parameter vector $\textbf a$ so that J(a) becomes as small as possible. The predicted signal Ŷ is then as close as possible to the actual signal Y in the mean-square-error sense.
Thanks for your support. I hope someone is more experienced with this than I am.
For real vectors, $x = \pmatrix{x_1 & \ldots & x_n}$, compare $$ \| x\|^2 $$ with $$ x x^t $$ and you'll see that they're both $x_1^2 + \ldots + x_n^2$.
For complex vectors, you have to throw in a conjugate on the right-hand term, so that $$ \| z \|^2 = z \bar{z}^t = z_1 \bar{z}_1 + \ldots + z_n \bar{z}_n. $$
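A quick numerical check of this identity, as a minimal sketch in plain Python. The vector `z` is an arbitrary example, not something from the lecture notes:

```python
# Check that z * conj(z)^T equals ||z||^2 for a complex row vector z.
# The entries of z are arbitrary illustrative values.
z = [1 + 2j, 3 - 1j, -2j]

# z times its conjugate transpose: sum of z_i * conj(z_i)
z_zbar_t = sum(zi * zi.conjugate() for zi in z)

# squared Euclidean norm: sum of |z_i|^2
norm_sq = sum(abs(zi) ** 2 for zi in z)

# without the conjugate, the result is in general not even real
z_z_t = sum(zi * zi for zi in z)

print("z * conj(z)^T =", z_zbar_t)
print("||z||^2       =", norm_sq)
print("z * z^T       =", z_z_t)
```

The first two printed values agree up to rounding, while the third is a genuinely complex number. That is why the conjugate is needed.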
That's the first thing you need.
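The second is that matrix multiplication distributes over addition, $$ (A + B)(C + D) = AC + AD + BC + BD, $$ and in particular $$ (A + B )(A^t + B^t) = A A^t + A B^t + B A^t + B B^t. $$ In the special case where $A$ and $B$ are both $1 \times n$ vectors, the two middle terms are scalars and are equal (with conjugate transposes in place of plain transposes, they are complex conjugates of each other).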
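The expansion for $1 \times n$ vectors can be checked numerically. This is a minimal sketch in plain Python; the helper names `dot_t` and `dot_h` and the vectors `A` and `B` are made up for illustration:

```python
def dot_t(u, v):
    """Row vector u times the plain transpose of row vector v (a scalar)."""
    return sum(ui * vi for ui, vi in zip(u, v))

def dot_h(u, v):
    """Row vector u times the conjugate transpose of row vector v (a scalar)."""
    return sum(ui * vi.conjugate() for ui, vi in zip(u, v))

# Arbitrary complex 1 x 2 example vectors
A = [1 + 1j, 2 - 3j]
B = [0.5j, -1 + 2j]
S = [a + b for a, b in zip(A, B)]  # A + B

# (A + B)(A^t + B^t) versus the four-term expansion
lhs_t = dot_t(S, S)
rhs_t = dot_t(A, A) + dot_t(A, B) + dot_t(B, A) + dot_t(B, B)

# The same with conjugate transposes
lhs_h = dot_h(S, S)
rhs_h = dot_h(A, A) + dot_h(A, B) + dot_h(B, A) + dot_h(B, B)

print("transpose:           ", lhs_t, rhs_t)
print("conjugate transpose: ", lhs_h, rhs_h)
print("middle terms (t):    ", dot_t(A, B), dot_t(B, A))
print("middle terms (H):    ", dot_h(A, B), dot_h(B, A))
```

With plain transposes the two middle terms print as the same number. With conjugate transposes they print as a complex-conjugate pair, so their sum is real.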
That should let you work out what's going on in the derivation above.
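Spelled out for the scalar error $e = Y - \textbf a^H \textbf X$, the step might look as follows. This is a sketch that assumes $^H$ denotes the conjugate transpose, so that $\textbf a^H \textbf X$ is a $1 \times 1$ quantity; the symbol $e$ is introduced here only for illustration:

$$ \begin{aligned} |e|^2 & = e \, e^* \\ & = (Y - \textbf a^H \textbf X)(Y - \textbf a^H \textbf X)^* \\ & = (Y - \textbf a^H \textbf X)\left(Y^* - (\textbf a^H \textbf X)^*\right) \\ & = (Y - \textbf a^H \textbf X)(Y^* - \textbf X^H \textbf a) \end{aligned} $$

The last line uses the fact that the conjugate of a scalar equals its conjugate transpose, so $(\textbf a^H \textbf X)^* = (\textbf a^H \textbf X)^H = \textbf X^H \textbf a$. One way to continue is to multiply out with the distributive rule, which gives $Y Y^* - Y \textbf X^H \textbf a - \textbf a^H \textbf X Y^* + \textbf a^H \textbf X \textbf X^H \textbf a$.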
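I confess that, in what you typed, I first thought the second $X$ and $a$ should also be complex conjugated. If $X^H$ denotes the complex conjugate transpose of $X$ for you, though, the conjugation is already there: $\textbf a^H \textbf X$ is a scalar, so its conjugate is $(\textbf a^H \textbf X)^H = \textbf X^H \textbf a$.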