## Classifying Floating Point NumbersIEEE Floating point numbers consist of three components. The first part is the "significand" which contains an integer which represents the fractional part of the number. Next, follows the "exponent" which is an offseted integer describing what power of two the significand needs to be multiplied by. Finally, the most significant bit contains the sign of the number. The C programming language on x86_64 systems exposes four different floating point types. Three of these are standard IEEE types of 32, 64 and 128 bits in size. The remaining type is an Intel specific 80 bit floating point number. In addition to the difference in size, the Intel numbers include the leading one in the significand, whereas the IEEE floating point numbers only include the fractional part leaving the leading one implicit. The different floating point types contain different precisions for the exponents and significands. They are also passed in different registers.
Most of the above floating point types are well known. However, the IEEE floating point numbers can represent more than just normal numbers. Special bit patterns are used for ±∞, ±0, subnormal numbers, and signaling and non-signaling NaN's. If the exponent has all bits set, or all bits clear, then the number isn't "normal". The two infinities have all exponent bits set, together with all significand bits clear. If a bit is set in the significand, then the result is a NaN. If the highest (or second highest on 80 bit floating point) significand bit is set, then the NaN is signaling, otherwise it is a "quiet" NaN. Zero is represented by all bits clear, and negative zero has just the sign bit set. If the exponent has all bits clear, and there are still bits set in the significand, then the result is a subnormal number. These numbers do not have the implicit leading one, and allow gradual underflow. Many algorithms depend on testing floating point numbers to see which type they are classified as. For example, a data stream may represent missing points as NaN's, which will then need to be tested for later on. So what is the fastest way for testing for a NaN? ## isnan()The most obvious function to use is the
As can be seen, the larger the type, the slower the operation is. The Can we do better? It turns out that
The fact that
The optimized functions are significantly faster:
This is quite a bit faster for ## isfinite()The
Where the above assembly functions correspond to the This code is about twice as fast as the functions within glibc, and again over an order of magnitude faster for the
## signbit()We can use a similar trick to extract the sign bit of a floating point type. However, it is faster to simply shift all of the non-required bits away rather than use a mask. The resulting optimized asm routines are only three instructions long:
Since the task is so simple, the glibc library code is the same speed for
Again, the ## isnormal()The
The
As can be seen by the timing results, the optimized assembly versions are about twice as fast as the glibc library, and even faster for ## isinf()The The big question is which is better: SSE code, or 64bit code? SSE instructions work on a large amount of data at a time, but tend to be relatively slow and inflexible. They are also rather non-orthogonal, and the exact instruction you might like to use may not exist. This makes programming using SSE instructions difficult. Conversely, the 64bit instructions are very flexible, but only work with a small amount of data at a time. It turns out that the fastest code uses both instruction sets simultaneously, extracting the best of both worlds.
How the above routines work is non-obvious. The basic algorithm for the The The
Again, the resulting optimized assembly routines are much faster than the glibc versions, being at least twice as fast. ## fpclassify()The above functions are defined in terms of the
The algorithm we will use is to firstly check the expoenent. If it is all-ones, branch to some code that checks the significand. If the significand is all-zeros, return 1, otherwise return zero. If the exponent isn't all-ones, check to see if it is all-zeros. If not, then we have a normal number, and return 4. Finally, we need to return 2 or 3 depending on whether or not the significand is clear. Unfortunately, this algorithm requires a fair amount of jumping, so is quite slow. However, the last steps can be done branchlessly via various
The above functions have execution times that depend on the the type of numbers tested. This makes constructing a benchmark to test them harder. We will thus test these functions in two ways. The first is to simply use a normal number over and over again. This corresponds to use where most numbers are normal. The other case will test all possibilities by enumerating the values within an array repeatedly.
As can be seen, the mixed input results aren't quite as much of an improvement. However, the normal-only results are at least twice as the glibc. This is due to the branchy nature of the algorithm. ## SummaryUsing assembly code we can obtain about a 2× speedup for the bit-manipulation algorithms used to classify floating point types, over the C standard library. By using SSE instructions in non-standard ways, it is possible construct 128bit comparisons. By using SSE and normal 64 bit code simultaneously it is possible to construct branch-free versions of some quite complex algorithms. The above contains quite a few assembly tricks, and it may be worthwhile investigating the source in more detail. For example, note how the Obviously spending quite so much effort optimizing normal code isn't worth it. However, the floating point classification algorithms are simple test cases that allow one to explore what possibilities the extremes of optimization allow. By spending similar effort on your inner loops in performance-critical code, similar speed improvements may be possible. The most important thing is to think laterally, you don't have to use instructions the way they are normally used... |

About Us | Returns Policy | Privacy Policy | Send us Feedback |

Company Info |
Product Index |
Category Index |
Help |
Terms of Use
Copyright © Lockless Inc All Rights Reserved. |

## Comments

said...