I’ve got a pretty interesting problem and I can’t figure out how to address it. I can’t even find a similar problem on the internet.
I have a cross tabulation / contingency table like this:

| | 72251100 | 72255090 | 72259200 | Total |
|---|---|---|---|---|
| ITAPOLIS | | | | 828,339 |
| PIRACICABA | | | | 1,543,919 |
| BIRIGUI | | | | 536,795 |
| Total | 1,365,134 | 1,175,953 | 367,966 | |
I know the margins, but I don’t know the cells.
Besides that, I know that a few cells have the value equal to zero:

| | 72251100 | 72255090 | 72259200 | Total |
|---|---|---|---|---|
| ITAPOLIS | | 0 | | 828,339 |
| PIRACICABA | | | | 1,543,919 |
| BIRIGUI | | | 0 | 536,795 |
| Total | 1,365,134 | 1,175,953 | 367,966 | |
The question is:
How do I calculate the expected value (or weighted mean) of this contingency table, given that I have two zeros?
The algorithm that you want is called Iterative Proportional Fitting.
You'll probably want to initialize it using the same value for each of the unknown elements and a zero in each cell that is known to be zero, e.g. $$A_0 = \pmatrix{1&0&1\\1&1&1\\1&1&0}$$ The other inputs are the marginal totals:
$\qquad$ one vector $(x)$ for the row totals and another $(y)$ for the column totals
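For this table, with rows in the order ITAPOLIS, PIRACICABA, BIRIGUI and columns in the order 72251100, 72255090, 72259200, the two vectors are as follows. Both sum to the same grand total, $2,909,053$, which the algorithm needs in order to match both margins at once:

$$x = \pmatrix{828,339 \\ 1,543,919 \\ 536,795}, \qquad y = \pmatrix{1,365,134 \\ 1,175,953 \\ 367,966}$$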
After about a dozen iterations you'll arrive at $$A_{12} = \pmatrix{628,650 & 0 & 199,689 \\ 529,763 & 845,879 & 168,277 \\ 206,721 & 330,074 & 0}$$ Changing $A_0$ will generate a different solution, but the one above is the maximum-entropy solution.
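As a quick arithmetic check, the rows of $A_{12}$ add up to $x$, its columns add up to $y$, and the two cells that were zero in $A_0$ are still zero:

$$\eqalign{
628,650 + 0 + 199,689 &= 828,339 \\
529,763 + 845,879 + 168,277 &= 1,543,919 \\
206,721 + 330,074 + 0 &= 536,795 \\
628,650 + 529,763 + 206,721 &= 1,365,134 \\
0 + 845,879 + 330,074 &= 1,175,953 \\
199,689 + 168,277 + 0 &= 367,966
}$$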
Here is the pseudo-code for the algorithm
where
`x`, `y` are the target row/column sum vectors and `u` is the all-ones vector. The operators `.*` and `./` denote elementwise multiplication/division, `*` is matrix multiplication, and an apostrophe `x'` denotes the transpose of `x`. Note that any zeros in the initial $A$ matrix will propagate through to the solution matrix, because of those elementwise multiplications.
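For anyone who wants to run it, here is a minimal pure-Python sketch of the same idea: alternately rescale the rows to match `x` and the columns to match `y`. The function name `ipf`, the iteration cap `max_iter` and the relative tolerance `tol` used as a stopping rule are my own illustrative choices, not part of any library. Because every update is a multiplication, the zero cells of `a0` stay zero:

```python
def ipf(a0, row_totals, col_totals, max_iter=1000, tol=1e-9):
    """Iterative proportional fitting: rescale rows, then columns, of a0
    until its row and column sums match the target margins."""
    a = [[float(v) for v in row] for row in a0]
    for _ in range(max_iter):
        # Row step: scale each row so that it sums to its target total.
        for i, row in enumerate(a):
            s = sum(row)
            if s > 0:
                a[i] = [v * row_totals[i] / s for v in row]
        # Column step: scale each column so that it sums to its target total.
        for j, target in enumerate(col_totals):
            s = sum(row[j] for row in a)
            if s > 0:
                for row in a:
                    row[j] *= target / s
        # Columns match right after the column step; stop once rows match too.
        if all(abs(sum(row) - t) <= tol * t for row, t in zip(a, row_totals)):
            break
    return a


rows = [828_339, 1_543_919, 536_795]   # ITAPOLIS, PIRACICABA, BIRIGUI
cols = [1_365_134, 1_175_953, 367_966] # 72251100, 72255090, 72259200
a0 = [[1, 0, 1],
      [1, 1, 1],
      [1, 1, 0]]
fitted = ipf(a0, rows, cols)
for row in fitted:
    print([round(v) for v in row])
```

With this `a0` the printed matrix should agree with $A_{12}$ above up to rounding; putting different nonzero values in `a0` gives a different table with the same margins.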