Lockless MPI Demo

MPI is the Message Passing Interface standard. It specifies a set of library functions and types that allow one to construct a cross-platform program that uses message passing for parallel communication. It is designed so that the same program may run both on SMP shared-memory machines and on clusters of machines without shared memory. This flexibility allows programs to be debugged on the desktop, and then launched on the big iron when done.

The core concept of MPI is that nothing is shared. This share-nothing ideal means that there is no global state that needs to be protected by locks or other synchronization primitives. Instead, each MPI "rank" runs with its own memory, and no obscure and slow atomic instructions are needed for computations. All communication is handled through the MPI library, which can be specially optimized so that this communication is as fast as possible.

The beauty of MPI is that only very few of its functions need to be understood in order to use it. Most of the rest can be thought of as being orthogonally composed from a few primitives. The primary ones are MPI_Send(), which sends a message, and MPI_Recv(), which receives one. The non-blocking versions of these functions are MPI_Isend() and MPI_Irecv(), which create a handle of type MPI_Request that can later be waited on or tested for completion.

The demo version of Lockless MPI is a free download which lets you test out the library in a limited configuration. The main constraint is that a maximum of only two MPI ranks is enabled. However, you may choose to have these two either on the same machine (to test local message passing), or on separate machines (to test message passing over the network). Since only two ranks are allowed, we have disabled nearly all "collective" operations (which typically only make sense with larger numbers of MPI ranks), and communicator-manipulating functions (which also only make sense if you have many ranks).
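The blocking primitives described above can be illustrated with a minimal two-rank program, which also fits within the demo's two-rank limit. This is a sketch of our own, not a listing from the distribution:

```c
/* Minimal example of the blocking MPI_Send()/MPI_Recv() primitives.
 * Rank 0 sends a single integer to rank 1, which prints it.
 * Launch under an MPI runner with two ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
    {
        value = 42;
        /* Blocking send of one int to rank 1, message tag 0 */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    }
    else if (rank == 1)
    {
        /* Blocking receive of one int from rank 0, message tag 0 */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("Rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```

The non-blocking MPI_Isend()/MPI_Irecv() variants have the same buffer arguments, but return immediately with an MPI_Request handle that must later be completed with MPI_Wait() or MPI_Test().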
For simplicity, Infiniband is also disabled (it requires the ibverbs library to be installed). However, if you would like to test Infiniband, just contact us. The demo version is nevertheless perfect for doing benchmarks. You can test all of the point-to-point communication methods, and compare latency, bandwidth, and CPU usage with other MPI libraries. We hope you'll find this useful for seeing the advantages of our design.

The key feature of Lockless MPI is that it uses a thread-based layout instead of a process-based layout on each machine. This means that ranks can share address spaces, reducing message-passing latency and increasing bandwidth by halving the number of copies. To implement this scheme, we use a pre-processor which converts all global variables in your C program into thread-local ones. It similarly changes all function-static variables into their thread-local equivalents. The pre-processor only works with C, so C++, Fortran, and other languages are currently unsupported. However, if you remove the use of global variables from your program, you may still have luck in getting this ultra-fast MPI to work.

Using MPI

The first thing you should do when testing out MPI is to create a "hostfile". This allows you to describe the shape of a cluster to the library.
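The exact hostfile syntax used by Lockless MPI is not reproduced here. As an assumption, most MPI implementations accept a plain text file with one machine name per line, which for the demo's two ranks might look like:

```
# Hypothetical hostfile: one machine name per line.
# The host names below are placeholders for your own machines.
node1.example.com
node2.example.com
```

Listing the same machine twice (or a single machine once) would instead test local message passing within one box.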
Since MPI is special, we can't use the normal commands for compiling, linking, and running MPI programs. Instead, the MPI specification lists several executables to use in their place. To compile or link an MPI program, use the mpicc wrapper; to run one, use mpiexec (or mpirun).

To create a simple benchmark, we will use a "ping-pong" algorithm. Messages will be passed back and forth between two ranks. For simplicity, we will use the MPI_Sendrecv() function for this, which simultaneously sends and receives a message. By changing the size of the messages used, we can see how the latency and bandwidth are affected. Such a function may look like:
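The original listing is not shown on this page. The following is a minimal sketch of such a ping-pong function; the function name, signature, and timing approach are our own choices:

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ping-pong benchmark: exchange a message of 'size' bytes
 * 'iters' times between ranks 0 and 1 using MPI_Sendrecv(), then report
 * the time per exchange and the aggregate bandwidth.  Ranks other than
 * 0 and 1 follow the same control-flow but simply skip the exchange. */
static void ping_pong(int rank, size_t size, int iters)
{
    char *sbuf = malloc(size);
    char *rbuf = malloc(size);
    double start, elapsed;
    int i, peer = -1;

    memset(sbuf, 0, size);

    /* Ranks 0 and 1 talk to each other; any other ranks sit out */
    if (rank == 0) peer = 1;
    if (rank == 1) peer = 0;

    start = MPI_Wtime();
    for (i = 0; i < iters; i++)
    {
        if (peer >= 0)
        {
            /* Simultaneously send to and receive from the peer */
            MPI_Sendrecv(sbuf, (int) size, MPI_BYTE, peer, 0,
                         rbuf, (int) size, MPI_BYTE, peer, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }
    elapsed = MPI_Wtime() - start;

    if (rank == 0)
    {
        /* Each iteration moves 'size' bytes in each direction */
        printf("%zu bytes: %g us per exchange, %g MB/s\n",
               size, 1e6 * elapsed / iters,
               (2.0 * iters * size) / (elapsed * 1e6));
    }

    free(sbuf);
    free(rbuf);
}
```

Compile the file with mpicc, and launch it with mpiexec and the hostfile describing your two machines.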
Note how in the above, all ranks use the same control-flow, even though they may not have been part of the ping-pong benchmark. This is considered good programming practice, and the MPI spec has many details which encourage it. In fact, the above could be linearized even more, with extra ranks sending to and receiving from "MPI_PROC_NULL", which does nothing. Other examples may use "inter-communicators" and "topologies" for this purpose. To use the function, we can call it like so:
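Again, the original call site is not reproduced on this page. Assuming a ping_pong(rank, size, iters) signature for the benchmark function, a minimal driver (with message sizes and iteration count chosen by us) might be:

```c
#include <mpi.h>
#include <stddef.h>

/* Hypothetical driver: sweep message sizes from 1 byte to 1 MiB,
 * doubling each time, so latency dominates at the small end and
 * bandwidth at the large end. */
int main(int argc, char **argv)
{
    int rank;
    size_t size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (size = 1; size <= (1 << 20); size *= 2)
        ping_pong(rank, size, 1000);

    MPI_Finalize();
    return 0;
}
```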
The results of the above program are shown on the MPI benchmarks page; try to reproduce them yourself. Note that you may need to run it a few times to get a stable result. Your computer is typically running many other tasks, and these may steal time, occasionally giving spuriously slow results.