Optimize your Software
Speeding up your software takes one simple step. No source code changes are required: Lockless MPI seamlessly replaces your system MPI implementation, and you reap the performance benefits.
Lockless MPI Released
Version 1.2 of Lockless MPI has just been released. It is optimized for modern 64-bit multicore systems and supports programs running on Linux. There are bindings for C, C++ and FORTRAN. It supports version 1.3 of the MPI specification, along with a few small parts of version 2.0.
Lockless MPI uses lock-free techniques to minimize latency and memory contention. It uses meta-processes that share an address space, similar to threads. This means that fewer expensive memory-to-memory copies are required, which more than doubles performance for large messages.
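To make "lock-free" concrete, here is a minimal sketch in C11 of a message inbox that several senders can push onto without taking a lock. It is a generic illustration of the technique, not the Lockless MPI implementation; the names `msg_t`, `inbox_t`, `inbox_push` and `inbox_take_all` are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical message node; not a Lockless MPI type. */
typedef struct msg {
    struct msg *next;
    int payload;
} msg_t;

/* The inbox is a single atomic pointer to the most recently pushed message. */
typedef struct {
    _Atomic(msg_t *) head;
} inbox_t;

/* Any number of senders may call this concurrently. A failed
 * compare-and-swap reloads the current head into `old` and retries,
 * so no sender ever blocks while holding a lock. */
void inbox_push(inbox_t *q, msg_t *m)
{
    msg_t *old = atomic_load_explicit(&q->head, memory_order_relaxed);
    do {
        m->next = old;
    } while (!atomic_compare_exchange_weak_explicit(
                 &q->head, &old, m,
                 memory_order_release, memory_order_relaxed));
}

/* The receiver detaches the whole list with one atomic exchange.
 * Messages come back newest first. */
msg_t *inbox_take_all(inbox_t *q)
{
    return atomic_exchange_explicit(&q->head, (msg_t *)NULL,
                                    memory_order_acquire);
}
```

Taking the whole list at once, rather than popping one node at a time, keeps the sketch free of the ABA problem that a naive lock-free pop would have.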
Superior MPI Bandwidth
Lockless MPI has superior bandwidth compared to other MPI implementations.
Lockless MPI supports InfiniBand and Ethernet-based clusters. It uses an I/O thread to allow communication and computation to overlap.
Lower CPU Usage
Lockless MPI puts waiting ranks to sleep rather than wastefully consuming CPU resources. This allows more ranks per CPU to be used efficiently, and lets you scale a problem simply by adding extra ranks.
The Lockless memory allocator can complete the benchmarks in about half the time taken by other allocators. It does not suffer from slowdowns at larger allocation sizes.
The Lockless Memory Allocator is downloadable under the GPL 3.0 License. You can thus use the allocator in other open-source programs. However, if you wish to use it in closed-source proprietary software, contact us about other options.
Test it Yourself
Use the open-source nature of the Lockless Memory Allocator to test it against the competition. Look for low memory usage and low synchronization overhead. Or, more simply, look for higher performance.
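A quick way to start such a comparison is a small timing loop. The sketch below is a hypothetical micro-benchmark, not the benchmark suite referred to on this page: the function name `time_alloc_free` and its parameters are illustrative, and a single-threaded loop says nothing about synchronization overhead under contention.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Times `iterations` malloc/free pairs of `size` bytes using whichever
 * allocator the process is currently linked against.
 * Returns elapsed seconds, or -1.0 on invalid size or allocation failure. */
double time_alloc_free(size_t size, size_t iterations)
{
    struct timespec start, end;

    if (size == 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (size_t i = 0; i < iterations; i++) {
        volatile unsigned char *p = malloc(size);
        if (p == NULL)
            return -1.0;
        p[0] = (unsigned char)i;  /* touch the block so the pair is not optimized away */
        free((void *)p);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Call it from a small driver over a range of sizes, for example `time_alloc_free(64, 1000000)` and `time_alloc_free(1 << 20, 100000)`, and run the same binary once per allocator. A realistic test would also run the loop from several threads at once and record peak memory usage.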
The Lockless memory allocator uses lock-free techniques to minimize latency and memory contention. This provides optimal scalability as the number of threads in your application increases. Per-thread data is used to reduce bus communication overhead. As a result, thread-local allocations and frees require no synchronization in most cases.
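The idea of per-thread data can be sketched as a thread-local free list. This is a simplified illustration under stated assumptions, not the allocator's actual design: it handles one fixed block size, and `block_t`, `cache_alloc`, `cache_free` and `BLOCK_SIZE` are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_SIZE 64  /* illustrative fixed block size */

/* A free block reuses its own storage to hold the link to the next one. */
typedef union block {
    union block *next;
    unsigned char payload[BLOCK_SIZE];
} block_t;

/* Every thread gets its own list head, so the fast paths below
 * need neither locks nor atomic instructions. */
static _Thread_local block_t *local_free_list = NULL;

void *cache_alloc(void)
{
    block_t *b = local_free_list;
    if (b != NULL) {
        local_free_list = b->next;   /* fast path: reuse a cached block */
        return b;
    }
    return malloc(sizeof(block_t));  /* slow path: fall back to the system allocator */
}

void cache_free(void *p)
{
    block_t *b = p;
    if (b == NULL)
        return;
    b->next = local_free_list;       /* push onto this thread's private list */
    local_free_list = b;
}
```

In this sketch a block freed by a different thread than the one that allocated it simply joins the freeing thread's list. A production allocator has to handle that case explicitly, which is one reason synchronization is avoided only "in most cases".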
Multiple Systems Supported
The Lockless memory allocator supports 64-bit and 32-bit Linux and 64-bit Windows. On Linux, applications do not need to be recompiled to use the allocator, because it can be loaded with LD_PRELOAD. This means that you can make other proprietary software run faster without recompiling it.
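LD_PRELOAD is an environment variable read by the Linux dynamic linker, so preloading amounts to setting it before the target program starts. The sketch below does this from a small C launcher. The function name `run_with_preload` is hypothetical, and the library path passed to it is whatever path the allocator's shared library was installed to, which is not specified here.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] (looked up on PATH) with LD_PRELOAD set to lib_path.
 * Returns the child's exit status, or -1 if it could not be run
 * or did not exit normally. */
int run_with_preload(const char *lib_path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the variable affects only the program about to be executed. */
        if (setenv("LD_PRELOAD", lib_path, 1) != 0)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127);  /* reached only if exec failed */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

This is equivalent to setting LD_PRELOAD on the shell command line before the program name. The dynamic linker then resolves `malloc` and `free` from the preloaded library first, which is why the target binary does not need to be rebuilt.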