The task today is to construct a data structure that is more memory efficient than a standard doubly-linked list, while having the same computational complexity for all operations. This seems impossible: a doubly-linked list is built from a next and a previous pointer in each node, pointing to the nodes after and before the current one. Both of these pointers are required if we want to scan the list forward or backward. Together, they also allow insertion and deletion at any location in the list in O(1) time.
If we remove one of these pointers, we are left with a singly-linked list. Such a list does not allow deletion of a node without knowing the location of the previous node. It also obviously doesn't allow scanning against the direction of the links. Making the list circular fixes these problems: we can scan around the loop and find the node before the one we have. Unfortunately, this is slow, taking time proportional to the length of the list. Thus deletion and the previous operation are O(n), where n is the number of links in the list.
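As a minimal sketch of why the circular singly-linked list pays O(n) for these operations (the names `cnode`, `cprev` and `cdelete` are illustrative, not from the article's implementation), note that finding a predecessor means walking the entire cycle:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular singly-linked list node. */
struct cnode {
    struct cnode *next;
};

/* Find the predecessor of n by walking around the whole cycle: O(n). */
struct cnode *cprev(struct cnode *n)
{
    struct cnode *p = n;
    while (p->next != n)
        p = p->next;
    return p;
}

/* Delete n from its cycle.  This is O(n), because unlinking n
 * requires its predecessor, which we can only find by scanning. */
void cdelete(struct cnode *n)
{
    cprev(n)->next = n->next;
}
```

The deletion itself is a single pointer update; all the cost is in locating the predecessor.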
The first thing to note is that if we limit the size of the list, then n is bounded. If so, the computational complexity is bounded as well - i.e. the deletion and previous operations become O(1) again. However, this doesn't help us much: we would like to store an arbitrary amount of data in our data structure. (Besides, if we were willing to limit its size, we could simply use the array index trick: the index into an array doesn't have to have the same bit-depth as a pointer. If we use arrays with a maximum of 2^16 elements, a 16-bit index can stand in for each pointer.)
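The array index trick can be sketched as follows (a hypothetical layout, assuming nodes live in one fixed pool; `inode`, `pool` and `ilist_insert_after` are my names, not part of any real API): two 16-bit indices replace two 64-bit pointers, shrinking the per-node link overhead from 16 bytes to 4.

```c
#include <assert.h>
#include <stdint.h>

#define POOL_MAX 65536  /* 2^16 slots, so a 16-bit index reaches any node */

/* Pool-based doubly-linked node: indices instead of pointers. */
struct inode {
    uint16_t next;
    uint16_t prev;
};

static struct inode pool[POOL_MAX];

/* Insert node n after node p, all arguments being pool indices. */
void ilist_insert_after(uint16_t p, uint16_t n)
{
    uint16_t q = pool[p].next;
    pool[n].prev = p;
    pool[n].next = q;
    pool[p].next = n;
    pool[q].prev = n;
}
```

All operations remain O(1); the price is the hard cap of 2^16 nodes per pool.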
Another well-known trick is to xor the next and previous pointers together. If we have pointers to two adjacent nodes in the list, we can then iterate forwards or backwards using xor operations. Unfortunately, in many cases you may only have a pointer to one node, not two, yet still want to insert or delete a node. Thus this trick is only applicable in some cases. We would like something that requires only a pointer to a single node in order to delete that node, just like the standard doubly-linked list.
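The xor trick can be sketched like this (the names `xnode` and `xstep` are illustrative): each node stores prev XOR next in a single field, and xoring that field with one known neighbour recovers the other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XOR-linked node: link holds (uintptr_t)prev ^ (uintptr_t)next. */
struct xnode {
    uintptr_t link;
};

static uintptr_t xptr(struct xnode *p) { return (uintptr_t)p; }

/* Given the current node and one of its neighbours, recover the
 * other neighbour.  Stepping with (prev, cur) walks forwards,
 * stepping with (next, cur) walks backwards. */
struct xnode *xstep(struct xnode *neighbour, struct xnode *cur)
{
    return (struct xnode *)(cur->link ^ xptr(neighbour));
}
```

Note that traversal must carry two pointers at all times; a single node pointer on its own cannot decode the link field, which is exactly the weakness described above.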
A way to asymptotically approach a single pointer per list node is to use both single and double links in the list. We can have a doubly-linked list where the previous pointers point to the previous double link. The next pointers then point through a short singly-linked section that connects to the next double link. Provided the length of the singly-linked sections is bounded, all operations are O(1), just like in the doubly-linked list: we simply follow the chain of pointers around the loop defined by adjacent double links to reach our previous node, and since this chain is bounded in length, that takes O(1) work. The trade-off here is memory versus time. The further apart the double links are on average, the longer operations take, but the less memory is used per node.
Note that we will obviously require some way of determining whether a node has a previous pointer. Since nodes are aligned in memory, the next and previous pointers have low-order bits that must be zero due to this alignment. We can set one of these "unused" bits as a flag denoting a double link. By masking off the bit, the value of the pointer is unaffected by this hack.
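A minimal sketch of this tag-bit technique (the macro and helper names are my own, not taken from the article's implementation): with nodes aligned to at least 2 bytes, bit 0 of any pointer to a node is normally zero and can carry the double-link flag.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 0 of an aligned pointer is free to use as a flag. */
#define DOUBLE_FLAG ((uintptr_t)1)

static void *tag_double(void *p) { return (void *)((uintptr_t)p | DOUBLE_FLAG); }
static int   is_double(void *p)  { return ((uintptr_t)p & DOUBLE_FLAG) != 0; }
static void *strip_tag(void *p)  { return (void *)((uintptr_t)p & ~DOUBLE_FLAG); }
```

Every dereference must go through `strip_tag` first, so the flag is invisible to the rest of the code.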
The above method is difficult to use. It requires nodes to be of different sizes, so the code that uses such a list must somehow "know" this is the case, and place the list pointers last in a structure. (Thus having more than one list in a given structure is complex.) The code needs to worry about allocating a node of the correct size, and simple operations like moving a node from one list to another become problematic, as memory may need to be allocated and/or freed. Another method, where the double links are nodes without data, is also possible. However, it has similar problems and is not quite as memory efficient.
Fortunately, it is possible to improve the above technique into something that always takes the same amount of memory: the size of a single pointer per node. In other words, it is possible to have something that behaves exactly like a doubly-linked list, but compressed into the size of a singly-linked list. To do this, we notice that we have more than just one "spare" bit per pointer available to us. These extra bits can be used to store the previous pointer of the double link. Since the previous pointer is obviously much larger than the number of spare bits in a single node, we need to distribute its bits over a significant fraction of the nodes in the singly-linked chain between the double links. I call this data structure a "Sparse List", as the previous pointers are sparsely distributed amongst the next pointers.
If we assume that the memory allocator is SSE2 compliant, and returns objects that are 16-byte aligned, then we have four spare bits per pointer. If we use one of these bits to mark where the stored pointer starts, we have three bits per pointer remaining. We thus require (64 - 4)/3 = 20 single links to store a previous pointer on a 64-bit machine. (The previous pointer itself has four always-zero alignment bits, so only 60 bits of it need storing.) On a 32-bit machine, we require 10 single links, since pointers are half the size. Thus, on 64-bit machines we can bound the length of the singly-linked sections to 20 nodes from below, and 20*2 - 1 = 39 from above. (Once we get to 40 nodes, we can construct two previous pointers instead of one.) We set the limit to 39 instead of something higher to limit the length of the singly-linked chain: the shorter the chain, the faster the delete and previous operations are.
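As a rough illustration of this bit budget (the layout and the names `scatter_prev`/`gather_prev` are my own sketch, not the article's implementation), the low three bits of 20 consecutive next pointers can together carry the 60 significant bits of a previous pointer on a 64-bit machine:

```c
#include <assert.h>
#include <stdint.h>

#define SPARE_BITS  3
#define CHAIN_NODES 20   /* (64 - 4) / 3 */

/* Scatter the 60 significant bits of prev (its low 4 bits are zero
 * by 16-byte alignment) into the low 3 bits of each next pointer. */
void scatter_prev(uintptr_t prev, uintptr_t next[CHAIN_NODES])
{
    uintptr_t bits = prev >> 4;  /* drop the 4 always-zero bits */
    for (int i = 0; i < CHAIN_NODES; i++) {
        next[i] = (next[i] & ~(uintptr_t)7) | (bits & 7);
        bits >>= SPARE_BITS;
    }
}

/* Reassemble the previous pointer from the scattered bits. */
uintptr_t gather_prev(const uintptr_t next[CHAIN_NODES])
{
    uintptr_t bits = 0;
    for (int i = CHAIN_NODES - 1; i >= 0; i--)
        bits = (bits << SPARE_BITS) | (next[i] & 7);
    return bits << 4;
}
```

This sketch ignores the pointer-start marker bit and the bookkeeping needed to keep the scattered bits valid across insertions and deletions, which is where the real complexity lies.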
The resulting algorithm to implement this data structure is quite complicated. One needs to make sure that the bits used to recreate the previous pointers are not affected by insertion or deletion operations. Also note that the encoding used for storing the bits is not optimal. It should be possible to construct a frame-recovering code that allows the pointer-start bit to store extra information as well. If this were done, it would shrink the minimum number of nodes in a 64-bit singly-linked chain to 16 instead of 20.
An implementation of the Sparse List data structure is at the downloads page. This implementation is licensed under the GPL, and also contains a simple test suite.
The big question is: is this data structure used anywhere in the Lockless Memory Allocator? The answer is no. The asymptotic version was used for a while to handle 8-byte fragments. However, the allocator now has a faster way of handling them. Unfortunately, even though the sparse list operations are O(1), the constant factors are quite large, and the resulting algorithms are quite slow.
Copyright © Lockless Inc All Rights Reserved.