Expected value of cross tabulation / contingency table

I’ve got a pretty interesting problem and I can’t figure out how to address it. I can’t even find a similar problem on the internet.

I have a cross tabulation / contingency table like this:

              72251100    72255090    72259200        Total
ITAPOLIS             ?           ?           ?      828,339
PIRACICABA           ?           ?           ?    1,543,919
BIRIGUI              ?           ?           ?      536,795
Total        1,365,134   1,175,953     367,966

I know the margins, but I don’t know the cells.

Besides that, I know that two of the cells are equal to zero:

              72251100    72255090    72259200        Total
ITAPOLIS             ?           0           ?      828,339
PIRACICABA           ?           ?           ?    1,543,919
BIRIGUI              ?           ?           0      536,795
Total        1,365,134   1,175,953     367,966

The question is:

How do I calculate the expected value (or weighted mean) of this contingency table, given that two of its cells are zero?

There is 1 best solution below

The algorithm that you want is called Iterative Proportional Fitting.

You'll probably want to initialize it using the same value for each of the unknown elements and zero for each element known to be zero, e.g. $$\eqalign{ &A_0 = \pmatrix{1&0&1\\1&1&1\\1&1&0} }$$ The other inputs are the marginal totals:
$\qquad$ one vector $(x)$ for the row totals and another $(y)$ for the column totals.

After about a dozen iterations you'll arrive at $$A_{12} = \pmatrix{628{,}650 & 0 & 199{,}689 \\ 529{,}763 & 845{,}879 & 168{,}277 \\ 206{,}721 & 330{,}074 & 0}$$ Changing the nonzero entries of $A_0$ will generally produce a different table with the same margins, but the above (from a uniform start) is the maximum-entropy solution.
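As a quick sanity check on the numbers quoted above, the following Python sketch adds up the rows and columns of the fitted table and compares them with the margins from the question. The variable names are illustrative only; the figures are copied from the question and from $A_{12}$.

```python
# Margins from the question.
row_totals = [828_339, 1_543_919, 536_795]
col_totals = [1_365_134, 1_175_953, 367_966]

# The fitted table quoted above, rounded to whole units.
fitted = [
    [628_650, 0, 199_689],
    [529_763, 845_879, 168_277],
    [206_721, 330_074, 0],
]

row_sums = [sum(row) for row in fitted]
col_sums = [sum(col) for col in zip(*fitted)]

print(row_sums == row_totals)              # rows reproduce the row totals?
print(col_sums == col_totals)              # columns reproduce the column totals?
print(sum(row_totals) == sum(col_totals))  # both margins share one grand total?
```

The last comparison matters on its own: if the row totals and column totals did not add up to the same grand total, no table could satisfy both sets of margins.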


Here is the pseudo-code for the algorithm:

do
   B  =  A .* (x*u') ./ (A*u*u')     % rescale each row of A to sum to x
   A  =  B .* (u*y') ./ (u*u'*B)     % rescale each column of B to sum to y
until norm(A-B) < tolerance

where x, y are the target row/column sum vectors (as column vectors) and u is the all-ones vector. The operators .* and ./ denote elementwise multiplication/division, * is matrix multiplication, and an apostrophe, as in x', denotes the transpose.

Note that any zeros in the initial $A$ matrix will propagate through to the solution matrix, because of those elementwise multiplications.
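For readers who prefer something directly runnable, here is a minimal pure-Python sketch of the same row/column rescaling, applied to the numbers in the question. The function name `ipf`, its parameters, and the stopping rule (largest remaining row-sum error, rather than `norm(A-B)`) are choices made for this illustration, not part of the pseudo-code above. It also assumes every row and column of the starting matrix has at least one nonzero entry, since otherwise a division by zero occurs.

```python
def ipf(seed, row_totals, col_totals, tol=1e-6, max_iter=1000):
    """Iterative proportional fitting on a list-of-lists seed matrix."""
    a = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        # Row step: scale each row so that it sums to its target.
        for i, target in enumerate(row_totals):
            s = sum(a[i])
            a[i] = [v * target / s for v in a[i]]
        # Column step: scale each column so that it sums to its target.
        col_sums = [sum(col) for col in zip(*a)]
        a = [[v * t / s for v, t, s in zip(row, col_totals, col_sums)]
             for row in a]
        # Columns are now exact; stop once the rows are close enough too.
        worst = max(abs(sum(row) - t) for row, t in zip(a, row_totals))
        if worst < tol:
            break
    return a


# Ones for unknown cells, zeros for the cells known to be zero.
seed = [[1, 0, 1],
        [1, 1, 1],
        [1, 1, 0]]
row_totals = [828_339, 1_543_919, 536_795]
col_totals = [1_365_134, 1_175_953, 367_966]

table = ipf(seed, row_totals, col_totals)
for row in table:
    print([round(v) for v in row])
```

The two zero cells stay exactly zero because they are only ever multiplied by scale factors, and the rounded result should agree with the matrix quoted in the answer up to rounding.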