Given a tape containing 1,050,000 twenty-bit integers, how can you find one that appears at least twice? (Explain, first, why this must happen.) You may assume: a. two tape drives b. one tape drive.
This problem is somehow similar to the previous one.
a. [Solution in the book, page 165] Binary search finds an element that occurs at least twice by recursively searching the subinterval that contains more than half of the integers. My original solution did not guarantee that the number of integers is halved in each iteration, so the worst-case run time of its log2 N passes was proportional to N log N. Jim Saxe of Carnegie-Mellon University reduced that to linear time by observing that the search can avoid carrying too many duplicates. When his search knows that a duplicate must be in a current range of M integers, it will only store M + I integers on its current work tape; if more integers would have gone on the tape, his program merely discards them. Although his method frequently ignores input variables, its strategy is conservative enough to ensure that it finds at least one duplicate.
b. We can do a traversal of the tape at most 4 times, first checking to see how many times does each 5-bit binary word appear in the last five bits of each word. At least one of them (call it “edcba”) must have at least 1050000/32 > 2^15 appearances (this happens due to Dirichlet’s/Pigeonhole Principle) Scan the tape a second time, looking only at the words ending in “edcba”. This time, count for how many words does each 5-bit word appears as a “9-through-5” position substring. Obviously, also from Pigeonhole Principle, there must exist one 5-bit word “jihgf” having its counter strictly larger than 2^15 / 32 = 2^10. (Obviously, then, the number of words ending in “jihgfedcba” on the tape is the actual value of this counter.) Scan the tape two more times, looking at the other 5-long groups of bits of each of the “remaining” words, namely bits “14-through-10”, and “19-through-15”.(Note that we scan the tape completely every time, but we only check fewer words for substrings).