How to set up a matrix system to compute the best coefficients


Suppose we're given a non-linear spring with the following relationship between the applied weight ($x$) and displacement ($y$): $y = ax + bx^3$.

I've done a sequence of $m$ tests measuring the displacement corresponding to each weight, giving the data $\{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\}$.

The spring is additionally calibrated by choosing the coefficients $a$ and $b$ that minimize the sum of the squares of the errors $\varepsilon_i = y_i - (ax_i + bx_i^3)$, for $i = 1, 2, \dots, m$.

I'm trying to set up a matrix system to compute the best coefficients $a$ and $b$, but I'm confused as to what this could be. Why does this matrix system always have a solution that minimizes the sum of the squared errors?

Best answer:

Okay, your goal is to find $a, b$ to minimize the errors $\epsilon_i$, so that your model of your spring is as close to the real world as possible.

If you set up your matrix to be $$A = \begin{pmatrix} x_1 & x_1^3 \\ x_2 & x_2^3 \\ \vdots & \vdots \\ x_m & x_m^3 \end{pmatrix}$$ and multiply it with your unknown vector $\begin{pmatrix} a \\ b \end{pmatrix}$, you get a vector with entries $ax_i + bx_i^3$. Setting this equal to your $y$-vector gives $ax_i + bx_i^3 = y_i$. Ideally this equality would hold for all $i$, but it probably will not. Thus you set $\epsilon_i = y_i - (ax_i + bx_i^3)$ and you want the sum of the squares $$\epsilon_1^2 + \epsilon_2^2 + \dots + \epsilon_m^2$$ to be as small as possible.
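As a concrete sketch, here is how the design matrix $A$ might be built in NumPy. The sample data is hypothetical, generated from $y = 2x + 0.5x^3$ purely for illustration:

```python
import numpy as np

# Hypothetical measurements: x_i = applied weight, y_i = displacement,
# generated from y = 2x + 0.5x^3 for illustration.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
y = 2.0 * x + 0.5 * x**3

# Design matrix: one row (x_i, x_i^3) per measurement.
A = np.column_stack([x, x**3])
print(A.shape)  # (5, 2)
```

Each row of `A` corresponds to one test, and the two columns match the two unknown coefficients $a$ and $b$.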

This can be translated into the language of linear algebra: you want to find a vector $x$ such that the squared norm $\|y - Ax\|^2 = \epsilon_1^2 + \dots + \epsilon_m^2$ is minimized. We can expand the norm to get: $$\|y - Ax\|^2 = (y-Ax)^T(y-Ax) = y^Ty - y^TAx - x^TA^Ty + x^TA^TAx$$ Note that $y^TAx = x^TA^Ty$, since both are scalars and $(y^TAx)^T = x^TA^Ty$. Differentiating with respect to $x$ and setting the result to zero gives $$-2A^Ty + 2(A^TA)x = 0,$$ which, after dividing by 2, can be rewritten as $(A^TA)x = A^Ty$. These are the normal equations, and they can be solved by regular means.
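The normal equations can be solved directly in a few lines. Continuing with the same hypothetical noise-free data, the exact coefficients $a = 2$, $b = 0.5$ should be recovered:

```python
import numpy as np

# Hypothetical noise-free data from y = 2x + 0.5x^3.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
y = 2.0 * x + 0.5 * x**3

A = np.column_stack([x, x**3])

# Normal equations: (A^T A) c = A^T y, where c = (a, b).
coeffs = np.linalg.solve(A.T @ A, A.T @ y)
print(coeffs)  # approximately [2.0, 0.5]
```

With noisy measurements the same code returns the least-squares estimates of $a$ and $b$ instead of the exact values.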

Another way to think about this is that we want to find the vector $Ax$ in the column space of $A$ that is as close as possible to $y$. Decompose $y = w + e$, where $w$ is in the column space of $A$ and $e$ is orthogonal to the column space of $A$. Since $w$ is in the column space, there exists a vector $x$ such that $w = Ax$, and we want to know $x$. Since $e$ is orthogonal to the column space of $A$, which is the row space of $A^T$, and the orthogonal complement of the row space of $A^T$ is the null space of $A^T$, we get $A^Te = 0$. We can then calculate: $$0 = A^Te = A^T(y - w) = A^Ty - A^Tw = A^Ty - A^TAx$$ from which we can calculate $x$. Note that this is the same equation as before.

Of course, it is best to solve these kinds of problems using a suitable matrix decomposition, e.g. the QR decomposition, where $A = QR$ with $Q^TQ = I$ and $R$ upper triangular. We then get: $$A^TA = (QR)^T(QR) = R^TQ^TQR = R^TR$$ so the normal equations become $R^TRx = R^TQ^Ty$, and since $R$ is invertible when $A$ has full column rank, this reduces to $Rx = Q^Ty$. This is really easy to solve by back substitution since $R$ is triangular, and it avoids forming $A^TA$ explicitly as in $(A^TA)x = A^Ty$.
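The QR route can be sketched with NumPy's reduced QR factorization, again on the hypothetical data from above:

```python
import numpy as np

# Hypothetical noise-free data from y = 2x + 0.5x^3.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
y = 2.0 * x + 0.5 * x**3

A = np.column_stack([x, x**3])

# Reduced QR factorization: A = QR, Q^T Q = I, R upper triangular.
Q, R = np.linalg.qr(A)

# Solve R c = Q^T y; R is 2x2 and triangular, so this is back substitution.
c = np.linalg.solve(R, Q.T @ y)
print(c)  # approximately [2.0, 0.5]
```

In practice one would simply call `np.linalg.lstsq(A, y)`, which handles rank-deficient cases as well, but the explicit QR step above shows what happens under the hood.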