How to deduce the psition mapping of entries of a matrix?

899 Views Asked by At

I would be thankful if any peer shed light on me. Assume that the mapping of a set is unknown. By knowing n number of E element sets and the transformed sets with positioned elements, How can I mathmatically deduce the position mapping? In other words, what is the relationship between n and E to find the position mapping.

Lets consider an example for the problem clarification. Suppose, a function maps 3 by 3 arrays, like

$$\left[\begin{array}{cc}1&2&3\\4&5&6\\7&8&9\end{array}\right] to \left[\begin{array}{cc}1&9&4\\3&8&6\\7&5&2\end{array}\right],$$ or $$\left[\begin{array}{cc}1&1&2\\4&1&6\\2&8&9\end{array}\right] to \left[\begin{array}{cc}1&9&4\\2&8&6\\2&1&1\end{array}\right]. $$ How many such pairs do I require to find the exact mapping. Can I find a relationship between the number of known pairs (n) and the number elements (E)?

P.S. The elements of set can be any value chosen from a finite set, such as {0, 1, ..., 100}.

1

There are 1 best solutions below

4
On BEST ANSWER

In order to uniquely determine a permutation of the locations (entries) of a matrix or list, it is necessary to have enough examples to determine for each "source" location the "target" location to which it is mapped.

In the case that all entries are assigned distinct values, the permutation is uniquely determined by a single example. So the interest lies in using a collection of examples, all of which have repeated values, to uniquely determine the underlying permutation.

Each example involves two matrices or lists, a before "source" and an after "target" version with assigned values to entries. Assume the locations are enumerated by positive integers $k = 1,\ldots,n$ and the example pairs by $r$, so that the permutation may be identified with a permutation $\rho$ on these $n$ locations.

Clearly the mapping of location $k$ is uniquely determined if the sets:

$$J_r(k) = \{ j | \text{target location }j \text{ has same value as source location }k \text{ in example }r \}$$

over all examples intersect in a singleton $\cap_r J_r(k) = \{\rho(k)\}$, and it is sufficient to determine permutation $\rho$ if this is true for all $k$.

Two further questions then appear:

(1) Is this condition also necessary to determine $\rho$ ?

(2) With what efficiency can the mapping $\rho$ be determined from sufficient examples?

The answer to (1) is yes, as may be seen by considering a collection of examples in which some pair of locations is always assigned equal source values; such examples can never resolve between their mapped locations, because the target values are again equal. The answer to (2) depends on assumptions about available examples, but we will point out a connection to the Minimum Test Set Problem toward the end.

However as stated the Question emphasizes finding a relationship between the number of examples pairs needed to find $\rho$ and the counts $n$ of locations and $|E| \gt 1$ of the assigned values $E$ in the locations. It is possible for two example pairs to be related by a permutation on the values $E$ such that both examples give the same information regarding possible source and target locations. So a useful upper bound on the "worst case" number of required examples will entail some restriction that avoids this possible redundancy.

A "best case" in connection with lower bounds on examples can be sharply stated.

Theorem: Given $|E|$ values and $n$ locations, it is possible to choose a set of $\lceil \log_{|E|} n \rceil$ source arrangements so that any permutation $\rho$ on locations is uniquely determined once $\rho$ is applied to get the respective target arrangements. Fewer than this many examples cannot uniquely determine any $\rho$.

Proof: Define $\lceil \log_{|E|} n \rceil$ sources from the radix $|E|$ "digits" expansions of $k-1$ in respective locations $k = 1,\ldots,n$. Taken sequentially these values uniquely label each of the $n$ locations, and therefore $\rho$ is uniquely determined by finding the target locations which exactly match the source labellings.

If fewer examples are used, say $r \lt \log_{|E|} n$, then by counting the possible sequences of $E$ values for each location, we find there are less than $n$ available. Thus by a pigeonhole argument at least two locations would get the same source values in all examples. It follows for any permutation $\rho$ we would be unable to distinguish between the mapped target locations of those. QED

Example: Let's consider the $3\times 3$ matrix case used to illustrate the Question.

If $|E| = 1$ we will make no progress toward determining the permutation, since the only source/target example has all entries assigned equal values.

If $|E| = 2$ then we can determine the permutation with $\lceil \log_2 9 \rceil = 4$ examples (pairs of source/target matrices). One way to see this is to construct the matrix with 4-bit binary expansions for the nine values 0 to 8:

$$ \begin{pmatrix} 0000 & 0001 & 0010 \\ 0011 & 0100 & 0101 \\ 0110 & 0111 & 1000 \end{pmatrix}$$

Splitting this into four binary source matrices based on bit positions, applications of the permutation $\rho$ to these produces four binary target matrices, which when recombined using positional bits reveals the mapped locations of the original nine values 0 to 8.

If $|E| = 3$ then a similar treatment requires only $\lceil \log_3 9 \rceil = 2$ radix 3 positions:

$$ \begin{pmatrix} 00 & 01 & 02 \\ 10 & 11 & 12 \\ 20 & 21 & 22 \end{pmatrix} $$

So split this matrix into two source matrices (having entries 0,1,2), and apply the permutation to both. Recombining them as base 3 values gives the permuted values 0 to 8, as required to determine the permutation.

Until one gets $|E| \ge 9$, more than one example is necessary to deduce the permutation, as per Pigeonhole Principle some value has to be used more than once in an example.

Minimum Test Set Problem

There is a "classic" NP-hard problem which asks to determine a minimum size subset of a collection of sets such that every pair of points of some domain can be distinguished, i.e. by one set from the collection (a test) that contains one point but not the other.

This is related to the current question in that each example gives, for every source value in $E$ used in the source, a set of locations that are possible mappings. We are asking that for each source location, a choice of examples will narrow the possible corresponding target to a single possibility, and thus the requirement amounts to having a "test set" chosen from those "tests" created by equal source values in some example.

As mentioned earlier computing the exact minimum for this is known to NP-hard, but approximations within a factor $1 + O(\ln n)$ are polynomial computable. For details see the 2002 paper Branch-and-Bound Algorithms for the Test Cover Problem.

Determining the Permutation using All Examples

In practice the minimum set of examples may not be as important as the efficiency of determining the permutation mapping (or indeed, showing that not enough information is given by a collection of examples). We present an approach for determining the mapping if there is sufficient information and for detecting when there isn't.

Let $S_i$ be the pairs of locations and values represented by the source of the $i^{th}$ example, $i=1,\dots,r$, and let $T_i$ be the similar sets of pairs in the target. Construct $\mathscr{S}$ by pairing each source location with a sequence $(e_1,\ldots,e_r)$ of the values assigned to that source location in all $r$ examples, and similarly for $\mathscr{T}$ using the values assigned in the target locations.

Now sorting $\mathscr{S}$ on the sequence portions of its pairs will reveal if there exists two or more locations having the same values assigned in all examples, and thus that there is insufficient date to uniquely determine the permutation mapping $\rho$.

But if the sequence portions are distinct, then the $n$ locations all have a unique composite key in this sequence of values which can be used to determine the permutation. In fact sorting $\mathscr{T}$ similarly on sequence portions gives the correspondence with $\mathscr{S}$ since the same $n$ distinct sequences now appear in both orderings. The correspondence between source and target locations is now obvious.

Most discussions of sorting complexity assume a constant length of keys, but as the problem here is specifically about the number of examples used, it is appropriate to treat the length of keys (and thus cost of comparisons) as proportional to $r$ the number of examples. Thus the overall complexity of this approach to determine the permutation is $O(r\;n \ln n)$.