Solutions to Chapter 11 | System Design and Memory Limits<br />
Follow Up: What if we have only 10 MB memory?<br />
It’s possible to find a missing integer with just two passes over the data set. We can divide<br />
the integers into blocks of some size (we’ll discuss how to decide on a size later). Let’s just assume<br />
that we divide the integers into blocks of 1000. So, block 0 represents the numbers<br />
0 through 999, block 1 represents the numbers 1000 through 1999, etc. Since the range of ints is finite, we<br />
know that the number of blocks needed is finite.<br />
In the first pass, we count how many ints are in each block. That is, if we see 552, we know<br />
that it is in block 0, so we increment counter[0]. If we see 1425, we know that it is in block<br />
1, so we increment counter[1].<br />
At the end of the first pass, we’ll be able to quickly spot a block that is missing a number. If<br />
our block size is 1000, then any block which has fewer than 1000 numbers must be missing a<br />
number. Pick any one of those blocks.<br />
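The first pass described above can be sketched as follows. This is only an illustration under stated assumptions: the names BlockCounter, countPerBlock and firstDeficientBlock are hypothetical, an in-memory array stands in for the input file, and all values are assumed to be non-negative and smaller than blockSize * numBlocks.<br />

```java
// Sketch of the first pass: count how many values land in each block.
// Class and method names are illustrative, not taken from the original solution.
final class BlockCounter {

    /** Returns counters where counters[b] is the number of values that fall in block b. */
    static int[] countPerBlock(int[] values, int blockSize, int numBlocks) {
        int[] counters = new int[numBlocks];
        for (int v : values) {
            // Integer division maps a value to its block: 552 / 1000 == 0, 1425 / 1000 == 1.
            counters[v / blockSize]++;
        }
        return counters;
    }

    /** Returns the index of the first block holding fewer than blockSize values, or -1 if none. */
    static int firstDeficientBlock(int[] counters, int blockSize) {
        for (int b = 0; b < counters.length; b++) {
            if (counters[b] < blockSize) {
                return b; // this block must be missing at least one number
            }
        }
        return -1;
    }
}
```

For example, counting the values 552, 1425, 7 and 1999 with a block size of 1000 gives two entries in each of blocks 0 and 1, so either block is a valid choice for the second pass.<br />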
In the second pass, we’ll actually look for which number is missing. We can do this by creating<br />
a simple bit vector of size 1000. We iterate through the file, and for each number that<br />
should be in our block, we set the appropriate bit in the bit vector. By the end, we’ll know<br />
which number (or numbers) is missing.<br />
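A minimal sketch of the second pass, again with an in-memory array standing in for the file. BlockScanner and findMissingInBlock are hypothetical names, and the sketch assumes non-negative values and a block range small enough that block * blockSize + blockSize does not overflow an int.<br />

```java
// Sketch of the second pass: mark every value that belongs to the chosen block,
// then report the first value in that block whose bit was never set.
final class BlockScanner {

    /** Returns the smallest value in the given block that is absent from values, or -1 if the block is full. */
    static int findMissingInBlock(int[] values, int block, int blockSize) {
        int start = block * blockSize;               // first value covered by the block
        byte[] bits = new byte[(blockSize + 7) / 8]; // one bit per value in the block

        for (int v : values) {
            if (v >= start && v < start + blockSize) {
                int offset = v - start;
                bits[offset / 8] |= (byte) (1 << (offset % 8));
            }
        }

        for (int offset = 0; offset < blockSize; offset++) {
            if ((bits[offset / 8] & (1 << (offset % 8))) == 0) {
                return start + offset; // bit never set, so this value is missing
            }
        }
        return -1;
    }
}
```

Values outside the chosen block are skipped, so the bit vector only ever needs one bit per value in a single block.<br />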
Now we just have to decide what the block size is.<br />
A quick answer is 2^20 values per block. We will need an array with 2^12 block counters (2^14 bytes at 4 bytes each) and<br />
a bit vector of 2^17 bytes. Both of these can comfortably fit in 10*2^20 bytes.<br />
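The arithmetic behind these sizes can be checked with a short sketch. FootprintCheck and its methods are hypothetical names, and the sketch assumes 4-byte counters and a range of 2^32 possible values.<br />

```java
// Sketch: memory needed by the two structures for a given block size.
final class FootprintCheck {

    /** Bytes for the counter array: one 4-byte int per block. */
    static long counterBytes(long rangeSize, long blockSize) {
        return (rangeSize / blockSize) * 4;
    }

    /** Bytes for the bit vector: one bit per value in a single block. */
    static long bitVectorBytes(long blockSize) {
        return blockSize / 8;
    }
}
```

With rangeSize = 2^32 and blockSize = 2^20, counterBytes gives 16,384 bytes (2^14) and bitVectorBytes gives 131,072 bytes (2^17), far below the 10,485,760 bytes available.<br />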
What’s the smallest footprint? It is reached when the array of block counters occupies the same memory<br />
as the bit vector. Let N = 2^32.<br />
counters (bytes): blocks * 4<br />
bit vector (bytes): (N / blocks) / 8<br />
blocks * 4 = (N / blocks) / 8<br />
blocks^2 = N / 32<br />
blocks = sqrt(N/2)/4<br />
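With this split, each structure occupies sqrt(2)*2^15 bytes, or about 46,341 bytes (roughly 45 KB). It’s therefore possible<br />
to find a missing integer with about 91 KB in total if both structures are held in memory at once.<br />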
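As a numeric check of the derivation, the sketch below evaluates blocks = sqrt(N/32) and the resulting size of each structure. OptimalBlocks and its methods are hypothetical names, and the result is a real-valued optimum; an actual implementation would have to round to a whole number of blocks.<br />

```java
// Sketch: evaluate the balance point where counters and bit vector use equal memory.
final class OptimalBlocks {

    /** Real-valued block count solving blocks * 4 = (n / blocks) / 8. */
    static double optimalBlockCount(double n) {
        return Math.sqrt(n / 32.0);
    }

    /** Bytes used by each structure at the balance point (4 bytes per counter). */
    static double bytesPerStructure(double n) {
        return 4.0 * optimalBlockCount(n);
    }
}
```

For N = 2^32 this gives sqrt(2)*2^13 blocks and sqrt(2)*2^15 bytes per structure, the same quantity that appears in the derivation.<br />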
int bitsize = 1048576; // 2^20 bits (2^17 bytes)<br />
int blockNum = 4096; // 2^12<br />
byte[] bitfield = new byte[bitsize/8];<br />
int[] blocks = new int[blockNum];<br />
<br />
void findOpenNumber() throws FileNotFoundException {<br />
    int starting = -1;<br />
    Scanner in = new Scanner(new FileReader("input_file_q11_4.txt"));<br />
Cracking the Coding Interview | Concepts and Algorithms