How do I evaluate this derivative in a well-known maximum likelihood derivation?


I have been trying for some considerable time to derive a very widely used result in the field of (wireless) direction-finding, namely the Maximum Likelihood direction estimate (based on a simple, standard signal model).

After much searching, the best references I could find were this review article (behind a paywall) and, to a lesser extent, this one (free). However, I still get stuck on one of the differentiation steps. This may be simple enough that no background is required, but I'll show my working:


We have a matrix $\boldsymbol{X}\in\mathbb{C}^{p\times M}$ comprising $M$ discrete measurements from an array of $p$ antennas. According to the signal model, $\boldsymbol{X}$ has the following structure: $$\boldsymbol{X}=\boldsymbol{A}\left(\underline{\theta}\right)\boldsymbol{S}+\boldsymbol{N}\tag{1}$$ where $\boldsymbol{A}\left(\underline{\theta}\right)=[\underline{a}(\theta_1), \dots,\underline{a}(\theta_q)]$ and the value of the "steering vector" $\underline{a}(\theta)\in\mathbb{C^{p\times1}}$ is known for any $\theta\in\mathbb{R}$, but $\underline{\theta}=[\theta_1,\dots,\theta_q]^T$ is unknown (to be estimated). The matrix $\boldsymbol{S}\in\mathbb{C}^{q\times M}$ is deterministic and unknown (and we don't care about finding out its values). The random noise $\boldsymbol{N}$ has various nice properties to simplify the analysis (see assumptions below):
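To keep the notation concrete, here is a minimal numerical sketch of the model (1). The half-wavelength uniform linear array steering vector is my own assumption, purely for illustration; any known form of $\underline{a}(\theta)$ works the same way:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, M = 8, 2, 100              # antennas, sources, snapshots

def steering(theta):
    # Assumed steering vector: half-wavelength uniform linear array
    # (unit-modulus phase progression across the p elements).
    return np.exp(1j * np.pi * np.arange(p) * np.sin(theta))

true_theta = [0.2, -0.5]                                   # directions, radians
A = np.column_stack([steering(t) for t in true_theta])     # p x q, A(theta)
S = rng.standard_normal((q, M)) + 1j * rng.standard_normal((q, M))  # q x M
sigma2 = 0.01
# Circularly-symmetric complex Gaussian noise with covariance sigma2 * I
N = np.sqrt(sigma2 / 2) * (rng.standard_normal((p, M))
                           + 1j * rng.standard_normal((p, M)))
X = A @ S + N                                              # equation (1)
print(X.shape)                                             # → (8, 100)
```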

Assumptions

The columns $\underline{n}(t_i)$ of $\boldsymbol{N}$ are i.i.d. zero-mean circularly-symmetric complex Gaussian vectors with covariance $\sigma^2\boldsymbol{I}$, i.e. the noise is spatially and temporally white, with unknown power $\sigma^2$.

Based on this model, the joint PDF of the columns of $\boldsymbol{X}$ is apparently:$$f(\boldsymbol{X})=\prod_{i=1}^M\frac{1}{\pi \det(\sigma^2\boldsymbol{I})}\exp\left(-\frac{1}{\sigma^2}\lVert \underline{x}(t_i)-\boldsymbol{A}\left(\underline{\theta}\right)\underline{s}(t_i)\rVert^2\right) \tag{2}$$

(Based on other reading, though, I can't see why the $\pi$ isn't $\pi^p$, which would make the normalizing factor $(\pi\sigma^2)^{-p}$.) Nonetheless, the log-likelihood function (ignoring constant terms) is:$$\mathcal{L}=-Mp\ln\sigma^2-\frac{1}{\sigma^2}\sum_{i=1}^M \lVert \underline{x}(t_i)-\boldsymbol{A}\left(\underline{\theta}\right)\underline{s}(t_i) \rVert^2 \tag{3}$$

Fixing $\underline{\theta}$ and $\underline{s}$ and maximizing with respect to $\sigma^2$: $$ \hat{\sigma}^2=\frac{1}{Mp}\sum_{i=1}^M \lVert \underline{x}(t_i)-\boldsymbol{A}\left(\underline{\theta}\right)\underline{s}(t_i) \rVert^2 \tag{4}$$ Substituting this back into (3), we get (ignoring constant terms, and the monotonic $\ln(\cdot)$): $$ -\sum_{i=1}^M \lVert \underline{x}(t_i)-\boldsymbol{A}\left(\underline{\theta}\right)\underline{s}(t_i) \rVert^2 \tag{5}$$
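For completeness, here is how I believe the step from (3) to (4) goes: set the derivative of $\mathcal{L}$ with respect to $\sigma^2$ to zero,

$$\frac{\partial \mathcal{L}}{\partial \sigma^2}=-\frac{Mp}{\sigma^2}+\frac{1}{\sigma^4}\sum_{i=1}^M \lVert \underline{x}(t_i)-\boldsymbol{A}\left(\underline{\theta}\right)\underline{s}(t_i) \rVert^2=0,$$

which rearranges to (4). Substituting (4) back into (3) gives $\mathcal{L}=-Mp\ln\hat{\sigma}^2-Mp$, so after dropping the constant $-Mp$ and the monotone $\ln(\cdot)$, maximizing $\mathcal{L}$ is equivalent to maximizing (5).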


The next step is to maximize this expression (5) with respect to $\underline{s}$. I'm a bit confused as to what this even means, since (5) contains $\underline{s}(t_i)$, not $\underline{s}$. The only interpretation I can think of is to treat the $\left\{\underline{s}(t_i)\right\}$ as independent variables, so that differentiating with respect to some $\underline{s} \triangleq \underline{s}(t_j)$ would, I think, give: $$ \frac{\partial}{\partial \underline{s}(t_j)}\underline{s}(t_i)=\begin{cases} \mathbf{I}, & i=j \\ \mathbf{0}, & i\neq j \end{cases} $$ and then the $\sum_{i=1}^M$ would essentially disappear. However, I'm not sure whether this is correct, or whether the assumption of independence even makes sense.
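As a sanity check on the decoupling idea (my own experiment, not from the references): term $i$ of the sum in (5) depends only on its own $\underline{s}(t_i)$, so perturbing $\underline{s}(t_j)$ should change only term $j$, and the sum over $i$ collapses when differentiating with respect to $\underline{s}(t_j)$. Numerically (with arbitrary made-up $\boldsymbol{A}$, $\boldsymbol{X}$, $\boldsymbol{S}$):

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, M = 6, 2, 5
A = rng.standard_normal((p, q)) + 1j * rng.standard_normal((p, q))
X = rng.standard_normal((p, M)) + 1j * rng.standard_normal((p, M))
S = rng.standard_normal((q, M)) + 1j * rng.standard_normal((q, M))

def terms(S):
    # One ||x(t_i) - A s(t_i)||^2 term per snapshot, as in (5)
    R = X - A @ S
    return np.sum(np.abs(R) ** 2, axis=0)

base = terms(S)
j = 2
S2 = S.copy()
S2[:, j] += 0.3                      # perturb only s(t_j)
changed = terms(S2)
# Which terms moved? Only term j:
print(np.nonzero(~np.isclose(base, changed))[0])   # → [2]
```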

The reference given for solving this step is this paper (good typesetting, but behind a paywall) or this paper (the same paper for free, but worse typesetting). However, it is written in mathematical language that I can't make sense of.

Can anyone help me to get past this step?