wanted: a lucid demonstration that these $m+n$ bilinear equations have a solution

109 Views Asked by At

I shall first state the problem, and if it arouses any interest, I may gradually add a few notes, to provide a little context for those who, like myself, appreciate such background information. let me say once that all quantities are positive reals.

DATA an $m \times n$ matrix $X=\{x_{ij}\}$. an $m$-vector $P=\{p_{i}\}$ and an $n$-vector $Q=\{q_{j}\}$.

CONDITION $$\sum_{i=1}^m p_i = T = \sum_{j=1}^n q_j $$

PROBLEM

(a) find $m+n$ numbers $\lambda_i$ and $\rho_j$ satisfying: $$ \lambda_i\sum_{j=1}^n \rho_j x_{ij} = p_i \\ \rho_j \sum_{i=1}^m \lambda_ix_{ij} = q_j \\ $$

(b) show, in a lucid and clear manner, that under the stated conditions a solution always exists, and that for a given $X,P,Q$ it is unique apart from a multiplicative constant - obviously if ($\lambda, \rho$) is a solution, so is ($\alpha^{-1}\lambda, \alpha \rho$).

NOTES

this problem is of considerable practical importance - for example in certain areas of study connected with spatial diffusion in social systems.

it has a simple and elegant algorithmic solution which was perhaps discovered more by accident than design.

the convergence of the algorithm and the essential uniqueness of the solution have been demonstrated, but the only such proof I have seen runs to several pages of rather formidable algebraic reasoning. whilst I greatly admire those with the fortitude to engage successfully in such a hammer-and-chisel approach, the convergence of the algorithm is so impressive that one feels there may be some possibility of a more elegant and lucid treatment.

some years ago I heard a rumour that the result has been treated as a problem in pure mathematics, perhaps in the early 1960's. I have not been able to chase up a reference, however, and would be very grateful if anyone can provide a link to such work.

the desire to provide a proper setting for this question has been one of the chief motives for my wishing to study mathematics more seriously. MSE is a great learning environment and I am grateful to many people who have set me right about certain false assumptions. it is a steep learning curve, but I have already made considerable progress with this formal side of things, which, even in the absence of the required solution, is aesthetically satisfying. I have been led to the study of certain finite-dimensional Banach algebras which do seem the natural setting for what, in the end, is a fixed-point theorem.

there are obvious possibilities to look towards generalizations.

for example one could look at the infinite-dimensional case - say in a sequence space. it is possible that the same problem has been treated in the context of integral equations, and that some more general theorem of this type exists, so that the finite-dimensional case would be a simple corollary.

I noticed many years ago, when I was exploring the issue via a computer implementation, that convergence could be obtained in some cases when the initial matrix X had complex entries. however, without an appropriate theoretical framework, I thought it better to leave such exploration aside until I had a better grasp of the basic situation.

perhaps the most interesting generalization is to the case where we have three or more marginal constraints.

in the hope of possibly finding an applicable general theorem which can be used to obtain the desired result as a corollary, I have tentatively reformulated the problem as a differential equation here, but so far have had no feedback at all on this from the MSE community.

I have described in brief my evolving terminology here, in the context of a serious question about the norm restriction on multiplication in a banach algebra. again, no useful feedback has as yet been forthcoming.

1

There are 1 best solutions below

5
On BEST ANSWER

Since $p$ and $\rho$ are so similar typographically, I took the liberty of using $\mu_j$ instead of $\rho_j$.

For brevity, let $C^n$ denote the (closed) non-negative orthant in $\mathbf{R}^n$ with the origin removed, $C_+^n$ the (open) positive orthant (a.k.a. the interior of $C^n$), $\Delta^n$ the (closed) $(n - 1)$-simplex $$ \{(x_j) \text{ in }\mathbf{R}^n: \text{$x_j \geq 0$ for all $i$, and $\sum_j x_j = T$}\}, $$ and $\Delta_+^n = \Delta^n \cap C_+^n$ the interior.

For each $\mu$ in $C^n$, define $\lambda$ in $C_+^m$ by $$ \lambda_i = \frac{p_i}{\sum_{k} x_{ik} \mu_k}. $$ Since the $x_{ik}$ are all positive, and $\mu_k \geq 0$ for all $k$, and at least one $\mu_k$ is is non-zero, the denominator is positive for each $i$. Since $p_i > 0$ for all $i$, $\lambda_i > 0$ for all $i$. In words, for each $\mu$ in $C^n$, there exists a unique $\lambda$ in $C_+^m$ satisfying the first set of equations, $$ \lambda_i \sum_{j=1}^n x_{ij} \mu_j = p_i, $$ and the dependence is continuous in $\mu$.

Define a rational map $F:C^n \to C^n$ by taking the $j$th component of $F(\mu)$ to be $$ F(\mu)_j = \mu_j \sum_{i=1}^m \frac{x_{ij} p_i}{\sum_k x_{ik} \mu_k}. $$ By the preceding observations, $F$ is continuous. Moreover, (i) $F$ is constant along rays through the origin (i.e., $F(t\mu) = F(\mu)$ for all $t > 0$); (ii) $F$ maps each face of $C^n$ into itself (i.e., the hyperplane $\{\mu_j = 0\}$ maps into itself) and similarly for intersections of faces; and (iii) $F$ maps $C_+^n$ to itself (if $\mu_j > 0$ for all $j$, then $F(\mu)_j > 0$ for all $j$).

Both sets of equations $$ \lambda_i \sum_{j=1}^n x_{ij} \mu_j = p_i,\qquad \mu_j \sum_{i=1}^m x_{ij} \lambda_i = q_j, $$ are solved if there exists a $\mu$ in $C_+^n$ such that $q = F(\mu)$.

Existence: By following $F$ with a radial scaling, we obtain a continuous map $f:\Delta^n \to \Delta^n$. Explicitly, if $F(\mu) = y = (y_k)$, then $$ f(\mu) = \frac{T}{\sum_{k} y_k}\, y. $$ In particular, $f(\mu) = F(\mu)$ if and only if $\sum_k y_k = T$, if and only if $y \in \Delta^n$.

Since $F$ maps each intersection of faces of $C^n$ to itself, $f$ maps each sub-simplex of $\Delta^n$ to itself. By this MO post, $f$ is surjective. But $q \in \Delta^n$, so $f(\mu) = q$ for some $\mu$ in $\Delta^n$. The preceding argument shows $F(\mu) = q$ as well.

Uniqueness (up to scaling): For each $j$, the $j$th component function of $F$ may be written $$ F(\mu)_j = \sum_{i=1}^m \frac{x_{ij} p_i}{\sum_k x_{ik} (\mu_k/\mu_j)}. $$ The right-hand side is strictly decreasing as a function of the $(n - 1)$ ratios $\mu_k/\mu_j$ with $k \neq j$. Consequently, if $F(\mu) = F(\mu')$, then for each $j$, we have $\mu_k/\mu_j = \mu'_k/\mu'_j$ for all $k$. It follows that $\mu$ and $\mu'$ are proportional as vectors in $C^n$.