Height of all skyscraper

611 Views Asked by At

There is a bunch of skyscapers, each have a height, which is a positive integer. You are given at the start the total sum of their height. Now everyday you can make one measurement, which will tell you how many skyscapers there are which have height at least $k$, for some $k$ of your choice. You are allowed to choose that $k$ whatever knowledge you have obtained so far, such that measurement results from previous days. You can stop once you know with perfect certainty the height of each skyscrapers. (also, you are not given the number of skyscrapers at the start, but asymptotically, that does not matter since you can always pick $k=1$ on the first day)

So my question is, what is the algorithm for choosing the measurement each day such that the number of days in take in worst case is asymptotically optimal, and what is that number of days asymptotically.

Example:

Let's say that the final result would be $3,1$ (there are $2$ skyscrapers, and the height of the skyscapers are $3$ and $1$).

First you know the total height is $4$. You pick $k=1$ on the first day, you now know there is $2$ skyscrapers of height at least $1$ (which means there are $2$ skyscapers). The next day you pick $k=2$, and you learn there is only $1$ skyscapers of height at least $2$. So exactly one of them have height $1$, and the other one must have the remaining height, which is $3$. Hence you are done after $2$ days.

EDIT: I made some edits to the question to clarify certain points.

3

There are 3 best solutions below

1
On

Update: Here is a modified version of the first algorithm that is $\mathcal{O}(S^{2/3})$ (in fact probably faster), partially inspired by the comment by Q the Platypus:

Test $\lceil S/2\rceil$, $\lceil S/4\rceil$, $\lceil S/8\rceil$, $\dotsc$,$1$, which takes $\mathcal{O}(\log{}S)$ tests.

Then, for each interval $[\lceil S/2^i\rceil, \lceil S/2^{i+1}\rceil]$, binary search it the number of times there is a number in there. When doing later binary searches within the same interval, use information from earlier binary searches in that interval.

To find the time complexity, we first find the number of steps it takes to search for $n$ numbers in the interval $[\lceil S/2^i\rceil, \lceil S/2^{i+1}\rceil]$.

There are $\lfloor S/2^{i+1}\rfloor$ numbers in that interval, so the first binary search takes $\lfloor\log_2\lfloor S/2^{i+1}\rfloor\rfloor$ steps. But in doing so, we have essentially divided up those $\lfloor S/2^{i+1}\rfloor$ numbers into intervals of size $\lfloor S/2^{i+2}\rfloor$, $\lfloor S/2^{i+3}\rfloor, \dotsc, 1$. So the next one will take one less step, the next two will take one more less step, the next 4 will take one more less step, etc.

So for $n$ numbers, it will take at most $\lfloor\log_2\lfloor S/2^{i+1}\rfloor\rfloor+\lfloor\log_2\lfloor S/2^{i+2}\rfloor\rfloor+\lfloor\log_2\lfloor S/2^{i+3}\rfloor\rfloor+\lfloor\log_2\lfloor S/2^{i+3}\rfloor\rfloor+\dotsb$ steps where there are $n$ terms being added.

To find an upper bound on the time complexity, note that the expression for the number of steps based on the number of "most steps" derived one step above can sometimes increase when splitting one number from the interval $[\lceil S/2^i\rceil, \lceil S/2^{i+1}\rceil]$ into two numbers from the interval $[\lceil S/2^{i+1}\rceil, \lceil S/2^{i+2}\rceil]$. So, what partitions of $S$ can have a number split to increase or at least not decrease this value (The reason we consider this will become clear after this computation)?

When doing such a split, the number of numbers to search for increases by 1, so the number of searches increases by 1. But the number of steps in each search decreases by 1, assuming no other numbers are in the interval. The number of steps it takes for the last number in the original interval is at most $\lfloor\log_2\lfloor S/2^{i+1}\rfloor\rfloor$, so the maximum possible value of the number of steps increases by $-1+\lfloor\log_2\lfloor S/2^{i+1}\rfloor\rfloor-1=\lfloor\lfloor\log_2\lfloor S/2^{i+1}\rfloor\rfloor-2\geq\lfloor\log_2 (S/2^{i+1})\rfloor-3=\lfloor\log_2 S\rfloor-i-4$. (Note that I am bounding above the sum of the expressions of the form derived above, which is an upper bound on the number of steps, so this step is valid.)

To account for numbers in the interval already, note that at most $2^{i+1}$ numbers are in the interval $[\lceil S/2^{i+1}\rceil, \lceil S/2^{i+2}\rceil]$, or else the sum would be more than $S$. So the numbers already in the interval decrease the number of steps in the search for the first new number by at most $\lceil \log_2 2^{i+1}\rceil+1=i+2$, and the second by at most $\lceil \log_2 (2^{i+1}+1)\rceil+1=i+3$, in total $2i+5$.

In total, the change (which is an increase if positive) in the number of steps is at least $\lfloor\log_2 S\rfloor-i-4-(2i+5)=\lfloor\log_2 S\rfloor-3i-9$, which is positive if $\lfloor\log_2 S\rfloor\geq 3i+9$, which occurs if $i\leq \frac{\lfloor\log_2 S\rfloor}{3}-3$.

This is useful because this means that, as long as the partition has numbers in intervals with $i\leq\frac{\lfloor\log_2 S\rfloor}{3}-3$, we can increase the upper bound on the number of steps by lowering the numbers, so to calculate an upper bound of this upper bound, it is enough to consider partitions where all numbers are in intervals with $i>\frac{\lfloor\log_2 S\rfloor}{3}-3$. The numbers in these intervals are at most $\lceil S/2^{\frac{\lfloor\log_2 S\rfloor}{3}-3}\rceil\leq 1+S/2^{\frac{\lfloor\log_2 S\rfloor}{3}-3}\leq1+S/2^{\frac{\log_2 S}{3}-\frac{10}{3}}=1+2^{10/3}S/2^{\frac{\log_2 S}{3}}=1+2^{10/3}S^{2/3}$.

Note that in computing the upper bound on the number of steps in each interval, we never add a term that involves checking a number twice, so the number of steps in such an interval is bounded above by the number of numbers in that interval (there may be some off by one errors here, but there's no way it's more than 1 per number in the interval, so it will not affect the order of the time complexity). So, this is bounded above by the number of steps it takes to try every number in the interval. So, because there are at most $1+2^{10/3}S^{2/3}$ numbers to test, the time complexity is at most $\mathcal{O}(1+2^{10/3}S^{2/3})=\mathcal{O}(S^{2/3})$. There might be an arithmetic mistake here somewhere, but the general idea is correct.

This argument can be improved by saying that more partitions have potential to be split, for example by seeing how many more intervals we can ignore if we see which intervals increases the upper bound by splitting a number into two numbers in a lower interval unless there is only 1 number in that interval, and this approach may lower the exponent.


I will propose two algorithms:

Let $S$ be the sum of the heights of the skyscrapers, and let $n$ be the number of skyscrapers.

1) Here is an algorithm that I cannot figure out the time complexity of:

Binary search for the height of the highest skyscraper, taking $\mathcal{O}(\log S)$. Then subtract this height to get the sum of the rest of the skyscrapers, and repeat the algorithm with them. To speed things up, in each iteration, only binary search through heights less than the previous highest height. Note that this does not use $n$ at all.

2) This algorithm is $\mathcal{O}(n\log \frac{S}{n}+n-\log S)$:

Binary search heights from 1 to $S/n$ to find the lowest skyscraper (there must be one with height at most $S/n$ because the average of the heights is $S/n$). Subtract this height, and repeat until 1 skyscraper left, which we don't have to do binary search for.

To compute the time complexity, we have

$$\log \frac{S}{n}+\log \frac{S}{n-1}+\log \frac{S}{n-2}+\cdots+\log\frac{S}{2}=\log\frac{S^{n-1}}{n!}\approx\log\left(\left(\frac{Se}{n}\right)^n\bigg/S\right)=\mathcal{O}(n\log \frac{S}{n}+n-\log S).$$

(I used Stirling's approximation for the $\approx$.)

Note: If $n=\mathcal{O}(S)$, the time complexity becomes $\mathcal{O}(S)$, which can of course be achieved by just checking all heights from 1 to $S$.

2
On

So my question is, what is the algorithm for choosing the measurement each day such that the number of days in take in worst case is asymptotically optimal.

This is actually trivial (it is a case of "everything is finite so just check everything"), and probably not the question you intend to ask. Let $N_k$ be the number of skyscrapers of height at least $k$.

Just use a minimax tree. On odd turns $2t-1$ you choose $k_t$. On even turns $2t$, the opponent chooses the response $N_{k_t}$, the response constrained only by having to be consistent with all known information (there has to be at least 1 multiset of heights that satisfies this all previously answered $N_k$, and satisfies the total height requirement).

You play the minimax to minimize the depth until only 1 possible multiset is left. The opponent plays the minimax to maximize the depth. This is an algorithm that chooses the minimum number of queries to $N_k$. It isn't just asymptotically optimal worst case number of guesses, it is absolutely optimum.

Perhaps the question you meant to ask is: "what is an asymptotically optimal algorithm for minimizing the worst case number of guesses" ?

4
On

I believe $\lfloor n/2 \rfloor$ ($n$ is the total height) is a necessary and sufficient number of days to find the skyscraper heights.

In $\lfloor n/2 \rfloor$ days, you can find all the heights.

The algorithm is: Query $k$ for $k=1,\ldots,\lfloor n/2 \rfloor$. Let $n_1, \ldots, n_{\lfloor n/2 \rfloor}$ denote the resulting measurements. The number $n_1$ is the number of skyscrapers. The difference $n_{i+1}-n_i$ is the number of skyscrapers of height $i$, so we have obtained the number of skyscrapers of heights $1, 2, \ldots, \lfloor n/2 \rfloor-1$.

  • If $n_{\lfloor n/2 \rfloor}=0$, then there are no taller skyscrapers, so we are done.
  • If $n_{\lfloor n/2 \rfloor}=1$, then there is only one remaining skyscraper, and its height is just the unaccounted-for height.
  • If $n_{\lfloor n/2 \rfloor}=2$, then there must be two skyscrapers remaining, because three or more would exceed our height budget. The two skyscrapers will have heights like $\lfloor n/2 \rfloor$ or $\lfloor n/2 \rfloor+1$, depending on whether the total height is odd or even, and whether the total number of skyscrapers is 2 or 3. I'll ignore those annoying details.

For any algorithm, there exists at least two sets of skyscraper heights that are indistinguishable before $\lfloor n/2 \rfloor$ days.

I couldn't prove this formally, but here's a start.

First, notice that the thing we're looking for --- the set of skyscraper heights --- is an integer partition of $n$: it's a set of positive integers that add up to $n$.

For example, suppose $n=6$. Consider the following table, where each row is an integer partition of $6$ and each column is a query $k$ we might perform:

enter image description here

The $i,j$ entry is the measurement you'd get if the true skyscraper heights were described by partition $i$, and you asked how many skyscrapers are at least $j$ tall.

Taking a set of columns from this table corresponds to taking a set of measurements of the skyscrapers.

Therefore we want to take the smallest number of columns (measurements) for which every row of the resulting sub-table is unique. Armed with those measurements, we can see which row in the sub-table has those measurements, and that row corresponds to the skyscraper heights.

I wrote a Mathematica program to brute-force search all subsets of columns to find the smallest subset whose rows are unique, for $n=2,\ldots,12$, and I found that the smallest subsets of columns corresponded to $k=1,\ldots,\lfloor n/2 \rfloor$. I'll post the code if you like.