How to apply non-linear least squares


I'm trying to implement an example of estimating the angle between a target $\textbf{x}$ and a sensor $x_{p}$, following the example in this book. There are three available measurements of the angle, in radians, as follows:

$$ z = \begin{bmatrix} z(1) \\ z(2) \\ z(3) \end{bmatrix} = \textbf{h}(\textbf{x}, x_{p}) + w $$

where $\textbf{x} = [x \ y]$ is the position of the target, $x_{p} = [x_{px} \ x_{py}]$ is the position of the sensor and $w$ is the noise. The actual function of this sensor is

$$ \textbf{h}(\textbf{x}, x_{p}) = \tan^{-1} \frac{y - x_{py}}{x - x_{px}} $$

The linearized version of $\textbf{h}(\cdot)$ is obtained through the Jacobian $J$ as follows:

$$ J = \frac{\partial h(\textbf{x},x_{p})}{\partial \textbf{x}} = \begin{bmatrix} \frac{\partial h(\textbf{x},x_{p_{1}})}{\partial x} & \frac{\partial h(\textbf{x},x_{p_{1}})}{\partial y} \\ \frac{\partial h(\textbf{x},x_{p_{2}})}{\partial x} & \frac{\partial h(\textbf{x},x_{p_{2}})}{\partial y} \\ \frac{\partial h(\textbf{x},x_{p_{3}})}{\partial x} & \frac{\partial h(\textbf{x},x_{p_{3}})}{\partial y} \\ \end{bmatrix} $$
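For reference, here is a small sketch (Python/NumPy; the function name and the sample target/sensor values are mine) of how the rows of $J$ evaluate, using $\partial h/\partial x = -(y - x_{py})/d^2$ and $\partial h/\partial y = (x - x_{px})/d^2$, where $d^2$ is the squared target-sensor distance:

```python
import numpy as np

def bearing_jacobian(target, sensors):
    """Jacobian of h(x) = arctan((y - p_y)/(x - p_x)) w.r.t. the
    target position: one row [dh/dx, dh/dy] per sensor (shape N x 2)."""
    x, y = target
    J = np.empty((len(sensors), 2))
    for n, (px, py) in enumerate(sensors):
        dx, dy = x - px, y - py
        d2 = dx**2 + dy**2          # squared target-sensor range
        J[n] = [-dy / d2, dx / d2]  # partial derivatives of arctan
    return J

# Hypothetical example: target at (1, 1), one sensor at the origin
print(bearing_jacobian((1.0, 1.0), [(0.0, 0.0)]))  # row [-0.5, 0.5]
```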

Assuming the noise variances are all equal, the Least Squares formula is

$$ \hat{x} = (J^{T}J)^{-1} J^{T} z $$

In my simulation,

z =
    0.7548
    0.9402
    2.2379
J =
   -0.0621    0.0552
   -0.0552    0.0621
    0.2500    0.2500
xbar =  <--- wrong
   -2.7493
   11.7078

As you can see, $\hat{x}$ is not only wrong, but its dimension also seems weird: I was expecting it to be 3x1, but I'm getting 2x1.


Edited: a picture illustrating the problem was attached here.

First, let us review some concepts of LS (Least Squares) estimation. Let $K$ be a positive integer and let $\mathbf{1}_K = [1 \ 1 \ \cdots \ 1]^T \in \mathbb{R}^{K \times 1}$ be the ($K$-dimensional) all-ones vector, where $^T$ denotes transpose. Consider a single sensor making $K$ noisy observations of the same angle $\phi$. Then, collecting the observations in $\mathbf{y} \in \mathbb{R}^{K \times 1}$, you have

$$ \mathbf{y} = \phi \mathbf{1}_K + \mathbf{w}, \quad \mathbf{w} \sim (\mathbf{0}, \ \sigma^2 \mathbf{I}_K) $$

where $\mathbf{I}_K$ is the $K \times K$ identity matrix and $\mathbf{w} \in \mathbb{R}^{K \times 1}$ is a noise term whose specific distribution we do not care about; we only assume it is zero-mean and uncorrelated, with equal variances. Apart from the zero-mean hypothesis, these assumptions are not crucial for LS (they are made for simplicity). The "best" (in the LS sense) estimate of the angle $\phi$ is

$$ (*) \quad \hat{\phi}_{LS} = (\mathbf{1}_K^T \mathbf{1}_K)^{-1} \mathbf{1}_K^T \mathbf{y} $$

Under the additional hypothesis that $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_K)$, $(*)$ is also the ML (Maximum Likelihood) estimate of $\phi$. Equation $(*)$ is the familiar sample-mean estimator, since it can be rewritten as

$$ \hat{\phi} = \frac{1}{K} \sum_{k=1}^K y_k $$

where $y_k$ is the $k$-th component of $\mathbf{y}$. The sample mean can be used even with an unknown covariance matrix $\mathbf{C}$: only the zero-mean hypothesis (the typical scenario with AOA, Angle Of Arrival) matters. If the mean is nonzero but known, you can subtract it and rewrite the problem with zero-mean noise; if it is nonzero and unknown, not much can be done and the results will be biased. With $N$ sensors, the target position can then be estimated by triangulation.
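As a quick numerical check, the equivalence between $(*)$ and the sample mean can be verified in a few lines (a sketch; the true angle, noise level, and $K$ are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 1000
phi_true = 0.8                                  # true angle in radians
y = phi_true + 0.1 * rng.standard_normal(K)     # K noisy observations

# LS estimate (1_K^T 1_K)^{-1} 1_K^T y via the normal equations
ones = np.ones((K, 1))
phi_ls = np.linalg.lstsq(ones, y, rcond=None)[0][0]

print(phi_ls, y.mean())  # the two values coincide: LS = sample mean
```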

Your problem, however, is not this one. The formula that uses the Jacobian matrix estimates the target position directly; still, some things must be made clear.

For what you wrote to make sense, the vector $\mathbf{z} \in \mathbb{R}^{N \times 1}$ should collect the $N$ estimated angles $\hat{\phi}_n$, $n=1,\dots,N$. Each angle estimate is a sample mean over $K$ observations of the same angle, i.e.,

$$ \mathbf{y}_n = \phi_n \mathbf{1}_K + \mathbf{w}_n, \quad n=1,\dots,N $$

Then, letting $\mathbf{z} \triangleq [\hat{\phi}_1 \ \hat{\phi}_2 \ \cdots \ \hat{\phi}_N]^T \in \mathbb{R}^{N \times 1}$, it makes sense to write

$$ \hat{\mathbf{x}} = (\mathbf{J}^T \mathbf{J})^{-1} \mathbf{J}^T \mathbf{z} $$

where $\hat{\mathbf{x}} \in \mathbb{R}^{2 \times 1}$ is the estimated target position and $\mathbf{J} \in \mathbb{R}^{N \times 2}$ is the Jacobian matrix. A simple dimensional check shows that $\hat{\mathbf{x}}$ is a $2 \times 1$ vector: $\mathbf{J}^T \mathbf{J}$ is $2 \times 2$ and $\mathbf{J}^T \mathbf{z}$ is $2 \times 1$. Since the problem is in 2D, a $2 \times 1$ position estimate is exactly what you should expect, not a $3 \times 1$ vector.
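Plugging the numbers from the question into this formula (a NumPy sketch) reproduces the reported $\hat{\mathbf{x}}$ up to the rounding of the printed values, and confirms the $2 \times 1$ dimension:

```python
import numpy as np

z = np.array([0.7548, 0.9402, 2.2379])      # N = 3 angle estimates
J = np.array([[-0.0621, 0.0552],
              [-0.0552, 0.0621],
              [ 0.2500, 0.2500]])           # N x 2 Jacobian

# (J^T J)^{-1} J^T z, solved via the normal equations
x_hat = np.linalg.solve(J.T @ J, J.T @ z)

print(x_hat.shape)  # (2,): a 2-D position estimate, as expected
print(x_hat)        # approximately the xbar printed in the question
```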

What about the errors? They come from a combination of factors. First, the noise variance $\sigma^2$ is indeed a limiting factor on performance; its impact can be reduced by increasing $K$ (if possible, this is always a good idea). Second, the LS procedure uses only the first derivative of the function (i.e., a first-order truncation of the Taylor series), so it is an approximation. Increasing $N$ (i.e., using more sensors) will improve performance statistically, but approximations remain approximations. You can check how problematic your approximation is by generating no noise at all, letting the algorithm do its job, and then plotting the errors (or the ECDF, Empirical Cumulative Distribution Function, or any other metric that suits your case, e.g. the RMS, Root Mean Square, error).
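One way to run this noise-free sanity check is sketched below. The sensor layout and the true target are my own choices; note also that, since $\mathbf{h}$ is non-linear, the sketch applies the standard iterated (Gauss-Newton) form of the update, $\hat{\mathbf{x}} \leftarrow \hat{\mathbf{x}} + (\mathbf{J}^T \mathbf{J})^{-1} \mathbf{J}^T (\mathbf{z} - \mathbf{h}(\hat{\mathbf{x}}))$, which uses the residual rather than $\mathbf{z}$ itself:

```python
import numpy as np

def h(x, sensors):
    """Noise-free bearings from each sensor to the target x."""
    return np.array([np.arctan2(x[1] - py, x[0] - px) for px, py in sensors])

def jacobian(x, sensors):
    """Jacobian of h w.r.t. the target position (shape N x 2)."""
    J = np.empty((len(sensors), 2))
    for n, (px, py) in enumerate(sensors):
        dx, dy = x[0] - px, x[1] - py
        d2 = dx**2 + dy**2
        J[n] = [-dy / d2, dx / d2]
    return J

sensors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]  # hypothetical layout
x_true = np.array([4.0, 7.0])
z = h(x_true, sensors)                            # sigma^2 = 0: no noise

x_est = np.array([5.0, 5.0])                      # initial guess
for _ in range(20):                               # Gauss-Newton iterations
    J = jacobian(x_est, sensors)
    x_est = x_est + np.linalg.solve(J.T @ J, J.T @ (z - h(x_est, sensors)))

print(np.abs(x_est - x_true).max())  # residual error with noise turned off
```

With the noise switched off, any error left after convergence is attributable to the linearization and the stopping rule alone, which is exactly the diagnostic described above.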

The absolute limit on the performance of any unbiased estimator is given by the Cramér-Rao Lower Bound (CRLB), which you may be able to calculate under standard assumptions. More information about the CRLB can be found in the literature.