I am trying to reproduce the analysis of some experimental data, but I need some help.
I have a $4\times 4$ table $n_{k,l}$ indicating how often each combination occurred. The total number of occurrences (measurements) is roughly $N=10^6$.
To check for independence (using a Pearson $\chi^2$ test), the first task is to find the probability that a certain $l$ occurs. I expected this to be straightforward: $p(l=i) = \sum_k n_{ki}/N$. But that does not seem to be the right answer. My understanding is that I must not assume independence here, because I would be using the very assumption I want to investigate later.
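For concreteness, the straightforward estimate I had in mind, together with the Pearson statistic it would feed into, can be sketched as follows. The counts are made-up placeholders (not my data), and the variable names are only for illustration:

```python
# Hypothetical 4x4 count table n[k][l]; placeholder numbers, not the real data.
n = [
    [120, 90, 40, 50],
    [80, 110, 60, 50],
    [30, 70, 130, 70],
    [70, 30, 70, 130],
]

rows, cols = len(n), len(n[0])
N = sum(sum(row) for row in n)  # total number of measurements

# Marginal estimates: p(k) from row sums, p(l = i) = sum_k n[k][i] / N from column sums.
p_k = [sum(n[k]) / N for k in range(rows)]
p_l = [sum(n[k][i] for k in range(rows)) / N for i in range(cols)]

# Expected counts if k and l were independent, and the Pearson chi^2 statistic.
expected = [[N * p_k[k] * p_l[l] for l in range(cols)] for k in range(rows)]
chi2 = sum(
    (n[k][l] - expected[k][l]) ** 2 / expected[k][l]
    for k in range(rows)
    for l in range(cols)
)
dof = (rows - 1) * (cols - 1)  # degrees of freedom for an r x c table

print(p_l, chi2, dof)
```

This is only the naive marginal estimate, not the least-squares estimate described next.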
The original analysis notes that a least-squares estimate was performed, with the following reasoning:
The $n_l$ could be seen as $n_{m,n}$ with $m,n \in \{1,2\}$, meaning the index actually takes the values $l\in\{11,12,21,22\}$. Also, there are the constraints $\sum_m p_{mn} = \sum_n p_{mn} = 1$. According to the author, this leads to an overdetermined system (4 equations, 2 variables), which makes a least-squares estimate necessary.
My questions are:
- What exactly is this overdetermined system? This may sound trivial and probably arises due to my ignorance of the procedure the author wanted to apply if he hadn't faced this problem.
- Would there be a difference without these constraints?
- How do I perform this least squares estimate to get the $p_l$?
I know how to do a least squares fit for a system $y=Ax$, but I really cannot see the analogy between those two problems.
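To show what I mean by a least-squares fit of $y=Ax$, here is a minimal sketch for an overdetermined system with 4 equations and 2 unknowns, solved via the normal equations $A^T A x = A^T y$. The matrix `A`, the vector `y`, and the helper `least_squares_2` are made up for illustration; they are not the author's system, which is exactly what I cannot identify:

```python
# Hypothetical overdetermined system: 4 equations, 2 unknowns (placeholder values).
A = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [1.0, -1.0],
]
y = [0.6, 0.4, 1.0, 0.1]


def least_squares_2(A, y):
    """Minimise ||A x - y||^2 for two unknowns via the normal equations."""
    # Entries of the symmetric 2x2 matrix A^T A.
    a11 = sum(r[0] * r[0] for r in A)
    a12 = sum(r[0] * r[1] for r in A)
    a22 = sum(r[1] * r[1] for r in A)
    # Entries of the vector A^T y.
    b1 = sum(r[0] * yi for r, yi in zip(A, y))
    b2 = sum(r[1] * yi for r, yi in zip(A, y))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("columns of A are linearly dependent")
    # Solve the 2x2 system with Cramer's rule.
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


x = least_squares_2(A, y)
residuals = [r[0] * x[0] + r[1] * x[1] - yi for r, yi in zip(A, y)]
print(x, residuals)
```

In practice I would use a library routine such as `numpy.linalg.lstsq` instead of the hand-written solver.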
Thank you very much for your help. I hope I have made my problem clear enough.
I would already be happy with some hints and terms I should search for.