Using overdetermined linear system of equations and least squares approximation for parameter estimation


I have data on reaction times $R$ under different conditions, e.g. different people, different lighting and other variables. For simplicity, assume the reaction time depends only on the person and the lighting. Let $R_{ij}$ be the reaction time of person $i$ under light $j$. Let $P_i$ be the person's "true" reaction time in seconds and let $L_j$ be the contribution of light $j$ to the reaction time (either positive or negative) in seconds. Assume $n$ people, $k$ lights and $m$ equations / relations / data samples, where $m > n+k$ (so the system is overdetermined).

I would have a model of $$R_{ij} = P_i + L_j$$
I can write it in matrix form as $$R = Ax$$ where $A$ is a coefficient matrix (ones and zeros) and $x$ is a vector with all parameters stacked $$(P_1, P_2, ..., P_n, L_1, L_2, ... , L_k )^T$$ I would like to estimate the parameters in $x$, preferably with corresponding errors / standard deviations / confidence intervals, so that I can say things like "person $5$'s true reaction time is $x$" or "light $3$ contributes $y$ milliseconds to the reaction time". I can solve this system using least squares, i.e. find an $x$ that approximately solves the system. The problem is that when I do this with e.g. numpy.linalg.lstsq, I get parameters that have crazy values but still solve the system approximately. For example, I could have $P_i = 125345542.931235$ and $L_j = -125345542.531235$, but adding them still gives a reasonable reaction time of $0.4$. It still solves the problem, but obviously the parameters do not have the interpretation I want them to have, since I know them to be much smaller.
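To make the setup concrete, here is a minimal sketch of the matrix form on simulated data. The sizes, the "true" values and the noise level are all made up for illustration and are not from my real data. It shows that $A$ has rank $n+k-1$ rather than $n+k$, because adding a constant $c$ to every $P_i$ and subtracting it from every $L_j$ leaves $Ax$ unchanged, which would explain why huge values of opposite sign can still fit. The last step uses one possible convention, which is an assumption on my part and not the only choice: treat light $1$ as the reference by fixing $L_1 = 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 3  # hypothetical numbers of people and lights

# Hypothetical "true" parameters in seconds, used only to simulate data.
P_true = np.array([0.40, 0.35, 0.50, 0.45])
L_true = np.array([0.00, 0.05, -0.03])

# Three noisy measurements per (person, light) pair, so m = 36 > n + k.
pairs = [(i, j) for i in range(n) for j in range(k) for _ in range(3)]
m = len(pairs)

# Each row of A has a 1 in the column of P_i and a 1 in the column of L_j.
A = np.zeros((m, n + k))
for row, (i, j) in enumerate(pairs):
    A[row, i] = 1.0
    A[row, n + j] = 1.0
R = A @ np.concatenate([P_true, L_true]) + rng.normal(0.0, 0.01, size=m)

# A has n + k columns, but its rank is one less.
rank = np.linalg.matrix_rank(A)

# Shifting every P_i by +c and every L_j by -c does not change A @ x.
x_ls, *_ = np.linalg.lstsq(A, R, rcond=None)
shift = np.concatenate([np.full(n, 1e3), np.full(k, -1e3)])
same_fit = np.allclose(A @ x_ls, A @ (x_ls + shift))

# Assumed convention: fix L_1 = 0 by dropping its column from A.
A_ref = np.delete(A, n, axis=1)
x_ref, *_ = np.linalg.lstsq(A_ref, R, rcond=None)
P_hat = x_ref[:n]
L_hat = np.concatenate([[0.0], x_ref[n:]])

print("rank of A:", rank, "with", n + k, "columns")
print("shifted solution fits equally well:", same_fit)
print("P_hat:", P_hat.round(3))
print("L_hat:", L_hat.round(3))
```

With the reference-light convention, each $P_i$ reads as the reaction time of person $i$ under light $1$, and each $L_j$ as the difference between light $j$ and light $1$. Other constraints (for example requiring the $L_j$ to sum to zero) would give a different but equally valid parametrization.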

How can I fix this or do it another way?