# Binary Search

There is an interesting article at Google Research discussing the fact that many implementations of the binary search algorithm are incorrect. The example given is the one from Programming Pearls, translated into Java. Reproducing the original code from the book, together with some explanatory comments:

``````
int binarysearch(datatype t, datatype *x, size_t n)
{
	/* Lower and upper limits and middle test value */
	int l, u, m;

	/* Initialize bounds */
	l = 0;
	u = n - 1;

	/* Are we done yet? */
	while (l <= u)
	{
		/* Halve the range */
		m = (l + u) / 2;

		if (x[m] < t)
		{
			/* Move lower limit */
			l = m + 1;
		}
		else if (x[m] == t)
		{
			/* A match - return its location */
			return m;
		}
		else /* x[m] > t */
		{
			/* Move upper limit */
			u = m - 1;
		}
	}

	/* Failure */
	return -1;
}
``````

The article notes that this is how a binary search is typically implemented, and that it has a subtle bug. The problematic line is `m = (l + u) / 2;`. The question is: what happens if the addition overflows? In two's complement arithmetic, which is what all modern machines use, the sum wraps around to a negative number, and the subsequent division by two yields a negative midpoint. The algorithm then fails, either by reading out of bounds or by degenerating into an infinite loop. The rest of the article (including the comments) is a discussion of what the correct fix for this problem is.

Obviously we need to fix the overflow. One given solution is to replace the line with `m = l + (u - l) / 2;`. Unfortunately, this doesn't always work, and the reason is quite subtle. In C, the result of signed integer overflow is undefined, which means the compiler is allowed to assume it never happens. This in turn allows optimizations which wouldn't otherwise be possible. In most code, this is unnoticeable. In this case, however, the compiler is legally allowed to make the transformation back into `m = (l + u) / 2;`.

Whether it does this or not depends on how smart your compiler is. The smarter the compiler, the better it is at finding optimizations like this that undo your hard work of trying to obtain numerically correct answers. By rearranging further, it may be possible to confuse the compiler enough that it gives the correct answer; for example, moving the `(u - l) / 2` fragment into another function may help. However, compilers are getting better over time, and inter-function optimization isn't all that hard to implement. Trying to trick the compiler isn't the answer. The correct solution is to tell it exactly what we want, so that it can give us exactly that.

Another solution from the article is to use the substitution `m = ((unsigned)l + (unsigned)u) / 2;`. Since overflow of unsigned arithmetic is well-defined by the C standard, perhaps this helps? Unfortunately, it doesn't: the statement can still give the wrong answer. The only difference is that a logical shift is used instead of an arithmetic shift; i.e. `(UINT_MAX + 1) / 2` will give zero, which is not what we want.

A solution given in the comments below the main article does finally give the right answer. It uses the fact that the addition operation can be split up bitwise. The carry between bits can be obtained by the bitwise AND of the two summands. The result bit can be obtained by the exclusive-or. Thus we have `x + y ≡ ((x & y) * 2) + (x ^ y)`, and finally `(x + y) / 2 ≡ (x & y) + ((x ^ y) / 2)`. (Note the parentheses: in C, `&` and `^` bind less tightly than `+` and `/`.) If we use this identity, then overflow cannot occur.

Are we done? Unfortunately, the above misses the forest for the trees. We are implementing a binary search algorithm. What does this imply? Firstly, the array we are searching through must fit in memory. Secondly, it must make sense for us to be searching with this algorithm. The first implication means that the size of the array is constrained. If we have a 64-bit machine, then the maximal address space for user-space (or kernel-space) is 63 bits, as the address space is divided between them. Since a program must be running, and it takes up room itself, less than 63 bits of address space can be devoted to the array. (This ignores the fact that current implementations of the 64-bit architecture only use 48 address bits - we might as well be a little future-proof.) This means that, no matter what happens, the overflow error doesn't exist in the 64-bit case, provided we use a 64-bit integer.

The 32-bit case is harder. It is possible to have a 3GiB/1GiB or even 4GiB/4GiB user to kernel-space split there, so we need to think a little more. Looking at the code, we know that `datatype` must be something comparable with the `<` and `==` operators. Assume we are comparing 4-byte integers or pointers. How many of these will fit in our address space? A maximum of 2^30 of them. This means that, no matter what, the addition won't overflow. We are left with 2-byte short integers and byte-sized integers (characters).

The argument for short integers is a little more precarious. We can have 2^31 of those in our address space, and this may cause an overflow if something close to 2^31 + 2^31 is calculated. However, holding that many short integers would require the program implementing the binary search to take up zero space. If our program takes up a grand total of two bytes (or more, obviously), then the overflow cannot happen. So, pragmatically, we can ignore the problem here as well.

This leaves the final case: a binary search on individual bytes. The address-space arguments above fail here; there are indeed realistic hardware setups which could allow an overflow condition. However, we still have the second implication above, which we haven't used until now. Why are we using a binary search? We are trying to find the location of an element with a given value in a sorted array. Can you imagine doing this on 4GiB of sorted bytes? The whole operation doesn't make sense. The search can easily be replaced by a 256-entry table holding the cumulative counts of bytes with each value. The algorithm then becomes a trivial table lookup.

So what is the correct solution? The easiest is simply to replace the type of `u`, `l`, and `m` with something that won't overflow. If you care about speed and are using a 32-bit platform, just use `size_t`, and then document that anyone using this on more than 2GiB of individual characters is an idiot. If you are creating a library routine, the addition identity above will work; the compiler may even be smart enough to replace it with an addition followed by a rotate-with-carry. The pragmatist may use 64-bit integers in this case, since the code is then easier to read and understand. Finally, if you use a 64-bit machine, just use the 64-bit `size_t`.

The article has been used by proponents of higher-level languages to claim that C is fundamentally unsound, and that their pet language could never suffer this problem. Unfortunately for them, I think their argument fails. As has been shown, the only case where the bug actually appears is one where the algorithm is so obviously inefficient that it would be replaced with something better. If you are using a low-level language like C, then speed tends to be one of the things you care about - otherwise you wouldn't be using it. Thus the C programmer tends to be on the lookout for silliness like this. Taken from the high-level-language point of view, however, the lack of fixed-size integers does prevent some classes of bugs. Unfortunately, it does so at a cost of an order of magnitude or more in performance.

The extra speed of languages like C comes at a price. The programmer simply needs to think about the underlying architecture much more often.

Stickypens said...
why can't we just do (l/2 + r/2). won't this solve the overflow problem?
Thank you for such a pellucid explanation. I had to reread this article a few times before it sank in, but the effort was worth it. I really learnt something new. Thank you again.
dextrous said...
@Stickypens What if l and r are both odd ... l/2 + r/2 will give wrong result ...
Ashish said...
@dextrous We can check if both l and r are odds then perform (l/2+r/2)+1 to the final result.