How/why does matrix multiplication work to do a linear fit?


Some background: I have a B.S. in physics. I have taken linear algebra. I do work that involves doing image analysis in IDL.

One thing I have to do a lot is fit a linear equation $(y=mx+b)$ to the number of electronic counts accumulated in each pixel over a series of frames. So, usually I'm working with something like an array that's $4096\times 4096\times 100$. Until now I've been just running a for loop where I input the hundred values for each pixel and a time vector into a linfit or regress function which spits out the linear fit. I then write the slope of that fit to a new $4096\times 4096 $ image which shows the rate each pixel accumulates counts/charge. It works, but a for loop that runs $16$ million times is slow in IDL.

Now, of course this is not a new problem and I found a nice procedure written by someone at the Space Telescope Science Institute that does the same thing as above but with really fast matrix multiplication. I've tested it and it gives the results I expect. Problem is that I don't understand how it works, and so can't modify it to fit my needs.

As far as I can tell the program is using this linear regression using matrices:

Linear Regression in Matrix Form

Unfortunately, I struggle with matrices; thus, I can't make any sense of what's happening.

There's the added difficulty that to do this in IDL the $4096\times 4096\times N$ array is reshaped into a $(4096\cdot 4096)\times N$ array (that is, roughly $16.7$ million rows by $N$ columns), and I'm not quite sure why.
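A minimal sketch of what that reshape buys you, written in NumPy rather than IDL (the array sizes and variable names here are made up for illustration): once each pixel's time series is a row of a 2-D array, a single matrix operation can touch every pixel at once, with no per-pixel loop.

```python
import numpy as np

# Hypothetical stand-in for the detector data: a real cube would be
# (4096, 4096, 100); a tiny cube keeps the sketch fast to run.
ny, nx, nframes = 4, 4, 100
cube = np.random.default_rng(0).normal(size=(ny, nx, nframes))

# Reshape so each ROW is one pixel's time series: (ny*nx, nframes).
# A single matrix product against this array fits all pixels at once.
pixels = cube.reshape(ny * nx, nframes)
print(pixels.shape)  # (16, 100)
```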

So, can anyone explain what this matrix math is doing in extremely simple terms? For bonus points, avoid terms like "null", "kernel", and "space", and try not to just post a long series of formulas, as my eyes will probably glaze over. To preempt those who might tell me to study more linear algebra: I have, and I aced the undergrad class. I have no problem with the mechanics. I just don't understand the underlying motivation and concepts.

Here is a link to the code: ramp_basiclinfit.pro


There are 2 best solutions below


I know exactly what you need. Here it is the answer to your problems: https://www.youtube.com/watch?v=dkfY0OKH12g


The answer is given on page 4, "Parameter Estimation - Least Squares".

You want to solve a system of linear equations, which can be written in matrix form as $$ X u = y \quad (*) $$ This is compact notation for $m$ linear equations $\sum_j x_{ij} u_j = y_i$, $i = 1, \dots, m$, in $n$ variables $u_j$. It can be interpreted as a linear transformation $X$ acting on a vector $u$, mapping it to the vector $y$.
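As a concrete instance (a sketch with made-up numbers): for a straight-line fit $y = mt + b$, each row of $X$ holds one time sample $t_i$ together with a $1$, and the unknown vector is $u = (m, b)$.

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])         # time vector, one entry per frame
X = np.column_stack([t, np.ones_like(t)])  # design matrix: row i is (t_i, 1)

m, b = 2.0, 5.0                            # made-up "true" slope and intercept
y = X @ np.array([m, b])                   # X u reproduces y_i = m*t_i + b
print(y)  # [ 5.  7.  9. 11.]
```

Each row of the product is exactly one of the $m$ scalar equations: $m \cdot t_i + b \cdot 1 = y_i$.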

In your case the equations of the linear regression problem can be grouped such that they form a matrix equation like equation $(*)$. The vector of unknowns $u$ would hold the coefficients of the solution of the linear regression. Solving that system for $u$ would give the coefficients for a perfect fit.

It can happen that an exact solution $u$ does not exist, for various reasons (with more data points than unknowns, noisy data generally won't lie exactly on a line). Instead one looks for an approximate solution vector $\beta$, which produces an image vector $X \beta = \hat{y}$ that may differ from the target vector $y$.

So we have a difference $$ \epsilon = y - \hat{y} = y - X \beta $$ which we want to have as small as possible.

Thus one minimizes the expression \begin{align} d(\beta) &= \lVert \epsilon \rVert_2^2 \\ &= \sum_i \epsilon_i^2 \\ &= \sum_i (y - X \beta)_i^2 \\ &= (y - X \beta)^T (y - X \beta) \end{align} which is the squared length of the error vector (its squared $2$-norm) and depends on the choice of $\beta$. This sum of squared errors is what the term "least squares" refers to.
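A quick numerical check of this idea (the data values are made up, and NumPy's `lstsq` is used here as a reference solver): the least-squares $\beta$ makes $d(\beta)$ at least as small as any other candidate, including the "true" parameters that generated the noisy data.

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([t, np.ones_like(t)])
y = np.array([5.1, 6.9, 9.2, 10.8])        # noisy samples of y = 2t + 5

def d(beta):
    """Squared length of the error vector eps = y - X beta."""
    eps = y - X @ beta
    return eps @ eps

beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# The least-squares solution beats both the noise-free "true" parameters
# and a nearby perturbation of itself.
assert d(beta_ls) <= d(np.array([2.0, 5.0]))
assert d(beta_ls) <= d(beta_ls + 0.1)
```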

For a local extremum of $d(\beta)$, the gradient $\partial d / \partial \beta$ has to vanish (the equivalent of $f'(x) = 0$ in one dimension). And this happens if $\beta$ is a solution of another linear system, obtained by multiplying both sides of the matrix equation $(*)$ from the left with the transposed matrix $X^T$: $$ X^T X \beta = X^T y $$ This new system has the matrix $A = X^T X$, a square $n \times n$ matrix, and the image vector $b = X^T y$. The really nice property of the new system $$ A \beta = b $$ is that $A = X^T X$ is invertible, provided the columns of $X$ are linearly independent (which holds for a line fit whenever the time samples are not all identical). There are algorithms to compute the inverse matrix $A^{-1}$ for a given matrix $A$, so we can solve for $\beta$: $$ \beta = A^{-1} b = (X^T X)^{-1} X^T y $$
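Putting this together for the imaging case in a NumPy sketch (this is my own illustration, not the STScI IDL code; shapes and names are assumptions): the matrix $(X^T X)^{-1} X^T$ depends only on the time vector, so it can be computed once and applied to every pixel's time series in a single matrix product. That is why flattening the cube to (pixels $\times$ frames) pays off.

```python
import numpy as np

rng = np.random.default_rng(1)
ny, nx, nframes = 8, 8, 20
t = np.arange(nframes, dtype=float)

# Simulated ramp data: each pixel accumulates counts at its own rate,
# plus a little read noise.
rates = rng.uniform(0.5, 2.0, size=(ny, nx))          # true slope per pixel
cube = rates[:, :, None] * t + rng.normal(0, 0.01, (ny, nx, nframes))

X = np.column_stack([t, np.ones_like(t)])             # (nframes, 2)
solver = np.linalg.inv(X.T @ X) @ X.T                 # (2, nframes), built ONCE

# Rows of `pixels` are individual time series; one product fits them all.
pixels = cube.reshape(ny * nx, nframes)               # (npix, nframes)
beta = pixels @ solver.T                              # (npix, 2): slope, intercept
slope_image = beta[:, 0].reshape(ny, nx)              # rate map, counts per frame

assert np.allclose(slope_image, rates, atol=0.01)     # recovers the true rates
```

Each row of `beta` is exactly $(X^T X)^{-1} X^T y$ for that pixel's $y$; the reshape at the end turns the column of fitted slopes back into a rate image. In production code one would typically prefer `np.linalg.solve` or `lstsq` over forming the explicit inverse, for numerical stability, but the explicit form mirrors the formula above.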