Flexible Lists in C++

The Linux kernel uses an interesting data structure to implement linked lists. Instead of open-coding pointers, it uses a the list_head structure together with macros, to create a simple yet flexible interface. The list_head struct is defined as

The difference is that the data members are not contained within the list_head structure. To use it, one must include a list_head somewhere within your struct that you wish to have a list of. i.e.

How can this work? The trick is that the traversal of the list does not require knowledge of the exact data type contained within it in this method. We can just repeatedly follow the next or prev pointers to jump from list_head to list_head. However, to actually use the data contained within the list, we need to convert the list_head pointers to pointer to our data structure. This can be done with a macro similar to:

This uses the rarely used offsetof() macro within the C standard library to obtain the offset in bytes between the location pointed to by the list_head pointer, and the location of the start of the data structure.

How does this compare to the C++ method of using a list template from the standard template library? Firstly, the above method will only completely work in C. This is due to the fact that the offsetof() macro is defined to only work on plain-old-data (POD) types. Thus the above will not work on generic class types in C++. Secondly, it has quite different algorithmic properties than the template list type.

The biggest change is that of ownership. The template list type in effect "owns" the data within the list. It controls memory allocation and data layout, and this allows optimizations that cannot be done with the Linux kernel list technique. An example of this would be using an unrolled linked list algorithm, where multiple nodes are allocated adjacently in an array. This is much more cache-friendly, and allows faster iteration through the list.

The Linux kernel list technique is not without its own advantages, however. It uses a circularly linked list. Thus there is no need to have sentinel NULL pointers that need to be checked for in insertion or deletion. In addition, no memory allocation or freeing is required. Thus insertion or deletion are faster operations. However, the biggest advantage is that an object (structure) may have multiple list_head structs within itself. Since these all have different names, they may be addressed separately. This allows an object to be a member of multiple linked lists simultaneously. The flexibility of allowing multiple list membership can be a killer feature.

The Linux kernel list technique cannot be used as-is in C++, but it can be modified to work. Firstly, we will define a dlist structure as a C++ version of the list_head struct.

Where the above in good object-oriented fashion includes some of the typical operations we would like to perform on lists in addition to the raw pointers. Note how the above is not a template type. The dlist type has no knowledge of the type that contains it. This means that in order to iterate through such a list, we require more information. This can be done with a class template.

The above template requires the type of the containing type, together with the offset of the dlist struct within that type. Unfortunately, we cannot use the offsetof()macro within the template, or within the template arguments. The problem is that offsetof() uses an address-of operator, and this is not a compile-time constant. In addition, offsetof() has undefined meaning for non-POD types. To work around this, we use a function parameter to the template. The offset() function needs to return the offset we need. The nice thing is that most of the time the compiler will be smart enough to inline this function, and we obtain the optimized list iteration code wanted. If the compiler cannot determine the offset at compile time (due to virtual inheritance), this gracefully degrades to a function call.

So how might this be used in a data class? The following implements a string class that uses a dlist to store a list of strings.

The first macro replaces offsetof() with a hack that will work even for non-POD types. (Note that the dlist entry, and the offset function are within the same type, so that this works.) The second macro declares the dlist together with a static function used by the iterator template. The final template just makes declaring such an iterator easier.

The advantage of the above code is that using the dlist type does not allocate or free memory. The list information is in-line within the objects, so none extra is required. This may improve performance in some cases. In addition, the above allows multiple dlist entries in a type, allowing an object to simultaneously belong to several collections.

The above is not quite a complete implementation. Obviously, further integration with the standard template library would be useful. The one thing to note though is that the above implements a circular list. There is no begin() or end() functionality accordingly. This makes using STL algorithms somewhat annoying. A trick to avoid this problem is to add a handle dlist object. This object then can point to the beginning of the list, if one is needed. Its address also can act as the "one-beyond-the-end" functionality required by the STL.

Comments

stefan said...

Nice!!! It's exactly what I was looking for...

Ravneet said...

I don't think it is right to discuss how many botox she used or how many sergerius she did. It was her body and her choice to do that. Russian people have a saying You either speak good about deceased person or nothing . So here it is the same! Why should we discuss botox? I was accidentally forwarded to this page from the other one, but I was really shocked by the story I've read here. Taking into consideration that I am from Tallinn, I've never heard of Kart or her blog before. But, it's not so important now. All I want to say is No matter how many beauty sergerius she did, she became a hostage of the real life, and disease doesn't care if you are young, beautiful or successful, it makes its choice without taking into consideration your arguments!People, you should be more tactful, as Kart has died, so you have to allow her to rest in piece!Bruno, my condolences to you and Kart's family for your loss.God always takes the best to himself .Reply

Morten said...

You can avoid the offsetof problems by using a pointer-to-member template argument instead of the size_t argument.