Implementing value constraints on the iterative proportional fitting procedure

I have recently been working on some population modelling projects, one of which requires trying to reconstruct population matrices $A = \{a_{ij}\}$, where entries $a_{ij}$ represent counts of individuals belonging to population groups $i$ and $j$ (for instance $i$ might index over a partition of age groups and $j$ might index over a partition of salary ranges).

These problems are usually given with a pair of "margin totals", $\textbf{v} = \{v_i\}$, $\textbf{w} = \{w_j\}$ which are row sums and column sums:

$$v_i = \sum_{j} a_{ij} \hspace{0.2in} \text{and} \hspace{0.2in} w_j = \sum_{i} a_{ij}$$

See for an example. Methods such as IPFP can then be used to iteratively generate a matrix $M = \{m_{ij}\}$ (using a "seeding" matrix) such that the margin constraints are satisfied:

$$v_i = \sum_{j} m_{ij} \hspace{0.2in} \text{and} \hspace{0.2in} w_j = \sum_{i} m_{ij}$$
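For concreteness, here is a minimal sketch of the basic procedure in Python with NumPy. The function name `ipfp` and the parameters `tol` and `max_iter` are my own choices for illustration, and the sketch assumes a strictly positive seed and margins with the same grand total.

```python
import numpy as np

def ipfp(seed, row_totals, col_totals, tol=1e-10, max_iter=1000):
    """Alternately rescale rows and columns of `seed` to match the margins.

    Assumes a strictly positive seed and consistent margin totals.
    """
    m = np.asarray(seed, dtype=float).copy()
    v = np.asarray(row_totals, dtype=float)
    w = np.asarray(col_totals, dtype=float)
    if not np.isclose(v.sum(), w.sum()):
        raise ValueError("row and column totals must share the same grand total")
    for _ in range(max_iter):
        m *= (v / m.sum(axis=1))[:, None]  # row step: match the v_i
        m *= (w / m.sum(axis=0))[None, :]  # column step: match the w_j
        # Columns are exact after the column step, so only rows need checking.
        if np.abs(m.sum(axis=1) - v).max() < tol:
            break
    return m

# Illustrative margins (made-up numbers): 3 age groups, 2 salary ranges.
v = np.array([30.0, 50.0, 20.0])
w = np.array([60.0, 40.0])
M = ipfp(np.ones((3, 2)), v, w)
```

With a uniform seed the fitted matrix is just the "independence" table $v_i w_j / \sum_k v_k$, which is why the choice of seed matters so much for what follows.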

I am looking for suggestions of literature or lines of research which might permit more complex constraints to be considered, in addition to the margin constraints, throughout the fitting procedure.

For instance, assume we are trying to find/approximate the (hidden) matrix $A = \{a_{ij}\}$ with given margin constraints, where values represent counts of individuals of age group $i$ earning in the salary range $j$. If we know further information, for example that the distribution of salaries depends on age by some relation, can this be incorporated into the iterative procedure such that the columns of the fitted matrix $M = \{m_{ij}\}$ abide by this relation while also satisfying the margin constraints?
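The one mechanism I am aware of is the seed itself: each IPFP step multiplies a whole row or column by a scalar, so the cross-product (odds) ratios of the seed carry over unchanged to the fitted matrix. The sketch below illustrates this with a hypothetical seed in which salary band tends to rise with age group. All numbers and the helper names `ipfp` and `odds_ratio` are made up for illustration.

```python
import numpy as np

def ipfp(seed, v, w, n_iter=500):
    """Plain IPFP with a fixed number of sweeps (assumes a positive seed)."""
    m = np.array(seed, dtype=float)
    for _ in range(n_iter):
        m *= (v / m.sum(axis=1))[:, None]  # row step
        m *= (w / m.sum(axis=0))[None, :]  # column step
    return m

def odds_ratio(m, i, j, k, l):
    """Cross-product ratio of the 2x2 sub-table on rows (i, k), columns (j, l)."""
    return (m[i, j] * m[k, l]) / (m[i, l] * m[k, j])

# Hypothetical prior structure: mass concentrated near the diagonal.
seed = np.array([[6.0, 3.0, 1.0],
                 [3.0, 6.0, 3.0],
                 [1.0, 3.0, 6.0]])
v = np.array([40.0, 35.0, 25.0])  # made-up age-group totals
w = np.array([30.0, 45.0, 25.0])  # made-up salary-range totals

M = ipfp(seed, v, w)
print(odds_ratio(seed, 0, 0, 1, 1), odds_ratio(M, 0, 0, 1, 1))
```

This only encodes association structure up to row and column scaling, though, so it does not directly enforce a relation on the fitted columns themselves.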

My initial thoughts were that this might be expressible using Lagrange multipliers, but on closer inspection I am not sure the iterative process is linearly representable, not to mention the difficulty in expressing the Lagrange multipliers so that they remain meaningful through an iterative process.
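For reference, the standard way I have seen the margin-only problem written with multipliers is as a minimum relative-entropy problem against the seed. Here $S = \{s_{ij}\}$ denotes the seeding matrix, and $\lambda_i$, $\mu_j$, $\alpha_i$, $\beta_j$ are symbols introduced only for this sketch:

$$\mathcal{L} = \sum_{i,j} m_{ij} \log\frac{m_{ij}}{s_{ij}} - \sum_{i} \lambda_i \Big( \sum_{j} m_{ij} - v_i \Big) - \sum_{j} \mu_j \Big( \sum_{i} m_{ij} - w_j \Big)$$

Setting $\partial \mathcal{L} / \partial m_{ij} = 0$ gives

$$\log\frac{m_{ij}}{s_{ij}} + 1 - \lambda_i - \mu_j = 0 \hspace{0.2in} \Longrightarrow \hspace{0.2in} m_{ij} = \alpha_i \, s_{ij} \, \beta_j, \hspace{0.2in} \alpha_i = e^{\lambda_i - 1}, \hspace{0.1in} \beta_j = e^{\mu_j}$$

so the row and column steps of IPFP can be read as alternately solving for the $\alpha_i$ and the $\beta_j$. Under the same assumptions, a single additional linear constraint $\sum_{i,j} c_{ij} m_{ij} = t$ with its own multiplier $\theta$ would change the stationary form to $m_{ij} = \alpha_i \, s_{ij} \, \beta_j \, e^{\theta c_{ij}}$. What I cannot see is how to handle relations that are not linear in the $m_{ij}$.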

Beyond this, I naively feel that only constraints that can be expressed so as to simplify the initial problem could be considered, which in and of itself is very "hacky". For instance, returning to our example above, if we know that the relation between age and salary is completely linear according to the groupings, so that the probability that an individual in age group $i$ earns salary $j$ is given by:

$$P(\text{salary}(x) = j \mid \text{age}(x) = i) = \delta_{ij}$$

where $\delta_{ij}$ is the Kronecker delta, this reduces the problem to finding a diagonal matrix, which given the constraints (so long as they are "reasonable", in this case meaning $v_i = w_i$) is straightforward. But for even slightly more complex constraints, this becomes vastly more difficult.

Any suggestions for existing literature related to this kind of problem, or avenues of research which might bear fruit in finding methods to deal with such problems would be greatly appreciated.