(Interview Q, Algorithm) Elements in two unsorted arrays that sum to a particular integer


I'm preparing for technical interviews, and I stumbled upon this question while googling. It is a bit long, so I will paraphrase. There are two inputs: two n-element integer arrays X and Y, and a value a. The two arrays are unsorted, and the algorithm should return 1 if there are indices l, r with X[l] + Y[r] = a, and 0 if there are no such elements. What is the worst-case behaviour?

What I think should be correct is: 1) sort both arrays in non-decreasing order, 2) let l = 0 and r = arraysize - 1

3) while l < r

   if (X[l] + Y[r] ==a) return 1  //edited
   else if(X[l] + Y[r] < a) l++
   else r--

4) if no such elements, return 0

I'm not really sure about big-O notation (I'm self-taught), but if we use merge sort for the sorting step, that part would be O(n log n).

Can anyone please clarify this?

5 Answers

Answer (score 2)

Yes, your sorts are $O(n \log n)$, and we don't care about constant factors like $2$. Scanning through the sorted arrays is of order $n$, so the sorting dominates.

Answer (score 0)

I think the idea of sorting each array is fine.

First, a minor typo. You have "x[l]-y[r]" where you mean "x[l]+y[r]".

Second, this can be reduced to finding when x[i] == y[j] by replacing each y[i] by a-y[i].

Third, I'm not sure about the adjustment of l and r. I would like to see some assertions that could be used to prove termination and correctness.

Finally, I don't think that "l < r" is the proper termination condition, because a match x[l] + y[r] == a could occur anywhere.

Rather, I think the termination tests should occur where l and r change, in the forms

"if ( ++l >= n ) return 0"

and

"if ( --r < 0 ) return 0".

Answer (score 0)

We define the matrix $S$ by $$ s_{ij} = x_i + y_j \quad (i, j \in \{ 1, \dotsc, n \}) $$

The worst case is visiting all indices $i,j$ to test if $s_{ij} = a$. This would need $n^2$ tests.

Having the arrays / vectors $x, y$ sorted by value, e.g. $$ i \le i' \Rightarrow x_i \le x_{i'} $$ would allow us to use the transitivity of the order relations to exclude rectangle shaped areas from testing: $$ a < x_i + y_j \le x_{i'} + y_{j'} \quad (i \le i', j \le j') \\ U_{ij} = \{ (i', j') \mid i \le i', j \le j' \} \\ x_{i'} + y_{j'} \le x_i + y_j < a \quad (i' \le i, j' \le j) \\ L_{ij} = \{ (i', j') \mid i' \le i, j' \le j \} $$

Algorithm

A bisection scheme for $\{1,\dotsc,n\} \times \{1,\dotsc,n\}$ seems to be a good choice, so we start about in the middle of the diagonal $$ i := \lceil n / 2 \rceil \\ j := \lceil n / 2 \rceil $$ and test $s_{ij}$ (which we calculate from $x_i$ and $y_j$ on demand right now, not all in advance):

  • If $s_{ij}=a$ we are done,
  • if $s_{ij} < a$ we exclude $L_{ij}$ and our new search region is the upper half of the diagonal,
  • if $s_{ij} > a$ we exclude $U_{ij}$ and the new search region is the lower half diagonal.

We are either successful or end up in a search region that consists of a single element, which means there is no diagonal element with value $a$.

The interesting information is which regions were excluded during the bisection search:

  • We might have ended up at the borders of the diagonal with $U_{11}$ or $L_{nn}$ which means we can stop the search, every element of $S$ has been excluded, there will be no find.
  • Or we have ended somewhere in between, with $L_{qq}$ and $U_{rr}$ and $r = q + 1$. This means two rectangular blocks have not been excluded; both are smaller than $n^2$ in size, and on each of them we have to perform the described method again.

This gives a recursive algorithm; we only have to add the base case that no bisection is needed when the search region is $1 \times 1$. It terminates either with success or with no find at all.
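As a rough illustration, here is a Python sketch of the bisection scheme (the function name and the index bookkeeping for rectangular, not just square, regions are my own reading of the description above); $x$ and $y$ must already be sorted ascending:

```python
def pair_sum_bisect(x, y, a):
    # Search the implicit matrix S[i][j] = x[i] + y[j] for the value a,
    # computing entries on demand rather than materializing S.
    def search(i0, i1, j0, j1):
        if i0 > i1 or j0 > j1:           # empty region: no find
            return False
        d = min(i1 - i0, j1 - j0)        # diagonal positions k = 0 .. d
        lo, hi = 0, d
        while lo <= hi:                  # bisect along the diagonal
            k = (lo + hi) // 2
            s = x[i0 + k] + y[j0 + k]
            if s == a:
                return True
            if s < a:
                lo = k + 1
            else:
                hi = k - 1
        # Every diagonal cell up to hi has sum < a (excludes an L block),
        # every cell from lo = hi + 1 on has sum > a (excludes a U block).
        # Two rectangular blocks of the region remain; recurse on each.
        return (search(i0, i0 + hi, j0 + lo, j1) or
                search(i0 + lo, i1, j0, j0 + hi))
    return search(0, len(x) - 1, 0, len(y) - 1)
```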

Complexity

Total effort is the two sorts of the vectors, each of $O(n \log n)$ and the effort for the recursive algorithm.

Depending on the data, the worst case might exclude only about half of the matrix per recursion level. This would lead to about $k$ levels with $2^k = n^2$, i.e. $k = 2 \log_2 n$, thus $O(\log n)$ levels. Each level needs about $O(\log n)$ tests, so the search effort seems to be $O((\log n)^2)$.

So we end up with an $O(n \log n)$ method, dominated by the sorting.

Example

$$ x = (1, 2, 3, 3)^\top \\ y = (2, 4, 5, 6)^\top $$ This gives $$ S = \left( \begin{array}{cccc} 3 & 5 & 6 & 7 \\ 4 & 6 & 7 & 8 \\ 5 & 7 & 8 & 9 \\ 5 & 7 & 8 & 9 \end{array} \right) $$

Search for $a = 2$:

Start at $(2,2)$ $$ \left( \begin{array}{cccccc} 3 & 5 & & 6 & & 7 \\ 4 & 6 & - & 7 & - & 8 \\ & | & & & & \\ 5 & 7 & & 8 & & 9 \\ & | & & & & \\ 5 & 7 & & 8 & &9 \end{array} \right) $$ we can exclude $U_{22}$. Next inspecting $(1,1)$ $$ \left( \begin{array}{ccccccc} 3 & - & 5 & - & 6 & - & 7 \\ | & & & & & & \\ 4 & & 6 & & 7 & & 8 \\ | & & & & & & \\ 5 & & 7 & & 8 & & 9 \\ | & & & & & & \\ 5 & & 7 & & 8 & & 9 \end{array} \right) $$ This test leads to excluding $U_{11}$, which corresponds to the whole matrix $S$. Thus no element with value $a=2$ there. Only two elements of sixteen tested.

Search for $a=7$:

Start at $(2,2)$, $L_{22}$, then $U_{33}$. $$ \begin{pmatrix} 3& &5& &6& &7\\ & &|& & & & \\ 4&-&6& &7& &8\\ & & & & & & \\ 5& &7& &8&-&9\\ & & & &|& & \\ 5& &7& &8& &9 \end{pmatrix} $$ Then recursion for the remaining lower matrix block $$ \begin{pmatrix} 5 & 7 \\ 5 & 7 \end{pmatrix} $$ and the remaining upper matrix block $$ \begin{pmatrix} 6 & 7 \\ 7 & 8 \end{pmatrix} $$ The first matrix would have a find at the second step, the second matrix would descend in another recursion for the remaining blocks $(7)$ and $(7)$ and then, of course, a direct find each.

This needs testing only four elements of sixteen, if the lower remaining block is visited first during recursion, and one recursion step.

Search for $a=4$:

$U_{22}$, then $L_{11}$. $$ \begin{pmatrix} 3&|&5& &6& &7\\ -&+& & & & & \\ 4& &6&-&7&-&8\\ & &|& & & & \\ 5& &7& &8& &9\\ & &|& & & & \\ 5& &7& &8& &9 \end{pmatrix} $$ Recursion for $$ \begin{pmatrix} 4 \\ 5 \\ 5 \end{pmatrix} $$ and $$ (5,6,7) $$ For the first sub problem $U_{21}$ then find. For the second $U_{11}$ no find.

This again needs testing only four elements of sixteen, if the lower remaining block is visited first during recursion, and one recursion step.

Answer (score 5)

Your algorithm won't work. Consider $L = [0, 1]$, $R = [3, 5]$, $a = 3$. You'll skip over the $0 + 3$ possibility.

In fact, no small change is going to fix that, because you (probably) are making assumptions about monotonicity that aren't true (hard to say, since you didn't exactly give your reasoning). Let's say you put $L$ in order and $R$ in reverse order. The pointwise sum of $L + R$ isn't necessarily increasing or decreasing; it could be fluctuating.

The simple solution is: since you've already spent $n \log n$ to sort the arrays, just iterate through one and binary-search the other:

sort X
sort Y
foreach m in X
  binary search for p in Y such that m + p = a
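That pseudocode, as a minimal Python sketch (function name mine) using the standard library's bisect module:

```python
import bisect

def pair_sum_binary_search(X, Y, a):
    X, Y = sorted(X), sorted(Y)           # O(n log n) each
    for m in X:                           # O(n) iterations ...
        i = bisect.bisect_left(Y, a - m)  # ... of an O(log n) search
        if i < len(Y) and Y[i] == a - m:
            return 1
    return 0
```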

That's a simple $n \log n$. There's probably some trick to make the algorithm $O(n)$ in a contrived sort of way, but those algorithms always depend on some sort of $O(1)$ list access, which is quite dishonest outside the context of state machines, so $n \log n$ is the theoretical lower bound.

Answer (score 0)

I'll assume that we are talking about randomized algorithms for a RAM machine (which is a good approximation of a modern computer).

Your algorithm is elegant and looks correct to me, but it is not optimal, since there is a faster $O(n)$ algorithm. Linear time is optimal, since in general you have to look at all the elements of each list.

First, let's suppose that $a = 0$ (we'll remove this assumption later). In this case the problem reduces to computing the intersection of two sets, because $x + y = a = 0$ iff $x = -y$, and we can negate all of the elements of one of the sets before computing the intersection. The intersection can then be performed in expected $O(n)$ time using various methods based on hashing.

Now, we just need to reduce to the case where $a = 0$. This is accomplished by subtracting $a / 2$ from each element of both sets (working over the rationals, or after doubling every element and $a$, to avoid rounding when $a$ is odd).
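In code, the reduction can be folded directly into the hash lookup: instead of shifting the sets, test whether $a - x$ is present. A minimal Python sketch of the same expected-$O(n)$ idea (function name mine):

```python
def pair_sum_hash(X, Y, a):
    # Expected O(n): hash one array, then test membership of a - x.
    # "a - x in ys" is exactly the set-intersection check after the
    # shift described above, done without modifying either set.
    ys = set(Y)
    return 1 if any(a - x in ys for x in X) else 0
```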