Intersection of integer sets

749 Views Asked by At

This is probably a trivial question for mathematicians but I am not seeing how to approach the following problem:

Imagine two sets defined by:

set1 = (x: x<stop1 and x = start1, ..., start1+n*step1, ...; n element of set of integers)
set2 = (x: x<stop2 and x = start2, ..., start2+m*step2, ...; m element of set of integers)

Where:

  • all variables are integers
  • step != 0

Example of what I mean by this:

for start = 3, stop = 20, step = 5:
set =  [3, 8, 13, 18]

I am looking for:

  • whether the two sets intersect
  • what start, stop, step are for the intersection set
2

There are 2 best solutions below

3
On

Mathematical Model:

You could model the sets $S_1$ and $S_2$ like this \begin{align} I_1 &= [a_1, b_1) \quad (a_1, b_1 \in \mathbb{N})\\ I_2 &= [a_2, b_2) \quad (a_2, b_2 \in \mathbb{N}) \\ Z_1 &= \{ x = a_1 + n \Delta x_1 \mid n \in \mathbb{Z} \} \quad (\Delta x_1 \in \mathbb{N}) \\ Z_2 &= \{ x = a_2 + m \Delta x_2 \mid m \in \mathbb{Z} \} \quad (\Delta x_2 \in \mathbb{N}) \\ S_1 &= Z_1 \cap I_1 \subseteq I_1 \\ S_2 &= Z_2 \cap I_2 \subseteq I_2 \end{align} Then for an element of the intersection $S_1 \cap S_2$ we have the condition $$ a_1 + n \Delta x_1 = a_2 + m \Delta x_2 \iff \\ (\Delta x_1) n - (\Delta x_2) m = a_2 - a_1 \quad (*) $$ This is a linear Diophantine equation $$ a x + b y = c \quad (**) $$ plus the limiting condition $$ a_1 + n \Delta x_1 \in I_1 \cap I_2 $$ A linear Diophantine equation $(**)$ has either no or infinite many solutions. It has solutions if $\gcd(a,b) \mid c$.

Applying the Solution of the Diophantine Equation:

Translating this to the case of equation $(*)$ we have the criterion $$ \gcd(\Delta x_1, \Delta x_2) \mid (a_2 - a_1) \quad (\#) $$ In the case of solutions, we can find a particular solution by using the extended Euclidean algorithm to find integers $s$, $t$ with the property $$ s \Delta x_1 + t \Delta x_2 = \gcd(\Delta x_1, \Delta x_2) \quad (\#\#) $$ Then $$ (n_0, m_0) = \left( \frac{a_2-a_1}{\gcd(\Delta x_1, \Delta x_2)} s, -\frac{a_2-a_1}{\gcd(\Delta x_1, \Delta x_2)} t \right) $$ is a solution of equation $(*)$. Then all solutions are $$ (n, m) = \left( n_0 + k \frac{\Delta x_2}{\gcd(\Delta x_1, \Delta x_2)}, m_0 + k \frac{\Delta x_1}{\gcd(\Delta x_1, \Delta x_2)}, \right) \quad (k \in \mathbb{Z}) $$

Determining the Unbounded Intersection:

So we can use equation $(\#)$ to find out, if the intersection can have solutions. If it can, then we use the $s$ obtained by $(\#\#)$ to calculate $n_0$ and use the first components of the above equation to calculate all $n$ and all elements $$ x = a_1 + n \Delta x_1 = a_1 + \left( \frac{a_2-a_1}{\gcd(\Delta x_1, \Delta x_2)} s + k \frac{\Delta x_2}{\gcd(\Delta x_1, \Delta x_2)} \right) \Delta x_1 \quad (k \in \mathbb{Z}) \quad (+) $$ of the unbounded intersection $Z_1 \cap Z_2$.

These then have to be tested against the boundaries of $I_1$ and $I_2$ to determine $S_1 \cap S_2$. $$ a_1 \le x < b_1 \wedge a_2 \le x < b_2 $$ So it might a good idea to look even before $(\#)$ at $I_1 \cap I_2$ first, where we need $$ a_1 < b_2 \wedge a_2 < b_1 \quad (++) $$ for a non-empty intersection $I_1\cap I_2$.

Start, Stop and Step of the Intersection:

From the above we take that the step of the intersection set is $$ \frac{\Delta x_1 \Delta x_2}{\gcd(\Delta x_1, \Delta x_2)} \quad (\$) $$ Start is found via the smallest feasible $k$, let us call it $k_1$ such that $$ \left( a_i - a_1 - \frac{a_2-a_1}{\gcd(\Delta x_1, \Delta x_2)} s \Delta x_1 \right)\frac{\gcd(\Delta x_1, \Delta x_2)}{\Delta x_1\Delta x_2} \le k $$ thus $$ k_1 = \max_{i \in \{1,2\}} \left\lceil \left( a_i - a_1 - \frac{a_2-a_1}{\gcd(\Delta x_1, \Delta x_2)} s \Delta x_1 \right)\frac{\gcd(\Delta x_1, \Delta x_2)}{\Delta x_1\Delta x_2} \right\rceil $$

The element $x$ just before stop is found via the largest feasible $k$, let us call it $k_2$ such that $$ k < \left( b_i - a_1 - \frac{a_2-a_1}{\gcd(\Delta x_1, \Delta x_2)} s \Delta x_1 \right)\frac{\gcd(\Delta x_1, \Delta x_2)}{\Delta x_1\Delta x_2} $$ thus $$ k_2 = \min_{i \in \{1,2\}} \left\lfloor \left( b_i - a_1 - \frac{a_2-a_1}{\gcd(\Delta x_1, \Delta x_2)} s \Delta x_1 \right)\frac{\gcd(\Delta x_1, \Delta x_2)}{\Delta x_1\Delta x_2} \right\rfloor $$ or one less if the argument of floor was already an integer.

Plugging the found $k_1$ and $k_2$ into equation $(+)$ then gives start and stop values of the intersection.

Summary: We gave two criteria, equations $(\#)$ and $(++)$ to determine if there is a non-empty intersection at all.

We further listed the solution process which allows to determine all solutions enumerated by an integer parameter $k$ and gave conditions how to find the parameters $k_1$ for the smallest and $k_2$ for the largest feasible solution, which can be used via equation $(+)$ to get start and stop values.

We also listed the step value of the intersection, see equation $(\$)$.

The method looks complicated, but has the big advantage of being fast, the sizes of the sets do not really matter, so I would consider it $O(1)$.

Implementation:

I implemented a sample program in Ruby, to find out if this works as planned. You can find the source code here.

I am happy to not have to correct the above formulas.

The biggest bug I was running into during implementation was not using floating point arithmetics for the calculation of $k_1$ and $k_2$. The rest only needs integer arithmetics.

Testing went against a brute force implementation of the set intersection.

Over one million random test cases passed, so I think it mostly works. See an excerpt from a log here.

The output should help to understand this approach.

0
On

If I understand correctly you are considering the sets

$$A= \{x| x= a+n*s_A, \space n\in\mathbb{Z}\}$$ and $$B= \{x| x= b+n*s_B, \space m\in\mathbb{Z}\}$$

with the given boundaries (in other words intersected with $[start, stop]$).

To solve $A\cap B$, you need to solve the diophantine equation

$$s_A*n - s_B*m = b-a$$

See for example here how this can be done: Linear Diophantine Equation Then take every $a+n*s_A$, for the $n$'s that are in a solution of the diophantine equation. Lastly intersect with the interval. Of course the way to actually do this is to start with the first non-negative $n$ and go on until $a+n*s_A$ becomes larger than the smaller of the boundaries.