There are a couple of subtle issues that one must deal with when implementing the
More than a few memory allocators implement it this way. Can you spot the problems? The first is fairly obvious. The memory returned from malloc may already be filled with zeroes. This happens when malloc obtains the memory straight from the operating system. Due to security reasons, the memory is pre-cleared by the operating system first. This means that the memset may not be required. This doesn't seem like that big of a deal. However, the most likely case where this happens is when a large chunk of memory is asked for. Clearing it again can be extremely slow as the whole block of memory will need to be paged in. The operating system trick of merging pages filled with zeroes is thwarted, and much more memory may be used. The resulting slow-down can be extreme if the program was going to use the memory for a sparse array.
The second problem is more subtle. The multiplication may overflow. This causes the memory allocator to allocate less than the program asked for. The resulting malloc may succeed, and a pointer may be returned. The program can then overflow the buffer, and a security problem will exist. The solution is of course to securely do the multiplication, and return NULL if the result is too large.
The problem for today is to work out the fastest way to implement the multiplication and overflow detection process. If this were an addition, the code would be rather simple. Add the two numbers, and check to see if the result is less than either of them. If so, then we have overflowed.
This algorithm doesn't work with multiplication. An example that shows this is imagine your machine has a 4bit size_t. Let n=7 and size=9. The true value of the multiplication is 63. However, mod 16, this is 15. 15 is larger than both 7 and 9, however, we have still overflowed.
A slow technique to do the multiplication safely is to break the size_t values into two chunks. For the rest of this page, assume size_t is a 64bit type. We split them into two 32bit chunks, corresponding to the upper and lower 32bits.
The above works. However, it is quite slow, since it has quite a few multiplies. Can we do better? Looking in the "Hackers Delight" book gives a new technique based upon counting the number of bits used in each value.
Is there an evern faster way? Yes, but it is non-portable. We require a 128bit type to do the multiplication correctly. Fortunately, gcc has this. The resulting code is rather simple
Note how the shifts actually correspond to a choice of register, so aren't actually emitted. Also note how there is only a single multiplication this way. The above code is non-portable. However, the 32bit version is in C99, where the
Company Info |
Product Index |
Category Index |
Copyright © Lockless Inc All Rights Reserved.