This article also appeared in the ACCU Overload journal.
Motivation
By default, the C++ list container allocates memory with global operator new when elements are added to the list, and frees memory with global operator delete when elements are removed. While this provides a convenient interface for most applications, high-performance and long-lived applications could benefit from a pooled memory allocation strategy, where all memory used by the list is preallocated at list creation time.
Most default implementations of operators new and delete resolve to a direct call to the underlying system heap. This has the following drawbacks:
- Allocation for list insertions can fail unpredictably at any point during execution.
- System heap operations create contention between otherwise unrelated threads.
- System heap operations often require an expensive system call.
- Frequent allocations and deallocations of small objects can result in fragmentation and inefficient use of memory.
In a 2003 article for Dr. Dobb's Journal, Kevlin Henney says this regarding collection memory management:
If you are serious about managing the allocation of a container, then get serious and manage it: Write your own container type.
The standard list allocator interface is the logical starting point for implementing a pooled list container, but as Pablo Halpern noted in his proposal to the C++ standards committee, there are some inconsistencies in the standard regarding allocators. Even if a similar allocation model could be achieved with standard allocators, the intention of the pooled_list is to explore Henney's proposal. As shown by the provided test program, when using g++ on Linux the result is a container which performs insert and erase operations in half the time of the standard list.
The Interface
The pooled_list class provides a simple and familiar interface to C++ developers. With a few exceptions, pooled_lists behave like standard lists. When using pooled_list, the user must first create a pool as follows:
size_t pool_size = 256;
pooled_list<int>::pool list_pool(pool_size);
The pool allocates memory for both the element data and the list's internal structure. No additional allocations are made after the list_pool has been constructed.
One or more lists are attached to the pool:
pooled_list<int> my_list(list_pool);
The user can then operate on the list just as they would on a standard list.
For instance:
my_list.push_back(5);
my_list.pop_front();
When the pool is depleted and an insert operation is attempted, pooled_list throws std::bad_alloc. Since a pooled_list holds a reference to its pool, a pool must outlive all the lists created from it. If a pool is destroyed before the lists created from it, the behavior is undefined; a segmentation fault is the likely result.
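For example, a caller that cannot guarantee free capacity in the pool might guard insertions; this is purely a usage illustration:

try {
    my_list.push_back(5);
}
catch (const std::bad_alloc&) {
    // The pool is exhausted; the insert did not take place.
}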
The semantics of operations which act on multiple lists are affected by pooling. These operations, which include all the overloads of merge() and splice(), require that the pooled_lists involved be allocated from the same pool. The condition is checked with an assert, and violating it results in undefined behavior. While other semantics for these functions could be defined, they all require compromise. This requirement is simple to implement and to explain to users, which I believe makes it a reasonable choice.
To provide the strong exception guarantee, the assignment operator is implemented by creating a temporary copy of the right hand list and then swapping the temporary with the left hand list. This requires that the right hand side list's pool contain at least as many free elements as are used by the right hand side list, even though fewer elements may be required once the operation completes.
In explicit terms, the assignment operation involves three lists: the list being assigned from (right), the initial list being assigned to (left), and the final state of the list being assigned to (left'). Following the operation, both left' and right use the same pool, even if left used a different one, and all elements of left' are allocated from the pool used by right. To provide the strong exception guarantee, the assignment operator creates left' (as a copy of right) before left is destroyed. For the operation to succeed, the pool used by right must therefore contain at least as many free elements as right holds. Even when right uses the same pool as left and contains more elements than left, it is not sufficient for that pool to hold only the difference between the two, even though only that many additional elements will remain in use (by left' and right together) once the operation completes.
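As a sketch of the copy-and-swap form described above (assuming a member swap() that exchanges the underlying lists and the pool reference without throwing), the assignment operator looks roughly like this; it is not the verbatim implementation:

pooled_list& operator=(const pooled_list& right)
{
    // Copying right draws nodes from right's pool and may throw
    // std::bad_alloc; *this is left untouched if that happens.
    pooled_list temp(right);

    // A non-throwing swap then commits the result.
    swap(temp);
    return *this;
}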
Concurrency
The global heap is a resource which is shared among all threads, and access to the heap is typically serialized by the underlying operating system. This can cause contention between threads which would otherwise be unrelated. The standard list's insert and erase operations require heap access, and hence can cause contention. Consider the following example. A program contains two threads. Each thread creates a list which is only accessed by that thread. While it is safe for either thread to insert or erase elements from its respective list, access to the heap is serialized by those operations. The result is reduced performance even though there is no data shared between the threads. Pooled_lists do not access the heap directly after the pool is created, so there is no contention between lists.
Like the STL, the pooled_list class makes few guarantees about thread safety. There is no thread synchronization inside the pooled_list class, so multiple threads cannot concurrently insert or erase elements in lists which share the same pool.
Users must therefore externally synchronize list operations whenever lists which share a pool are accessed from multiple threads. If pooled_lists do not share pools, there is no need to synchronize access to them.
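For instance, if two threads insert into lists that share a pool, the calls could be serialized with an external mutex. A sketch using Boost.Thread (the mutex and function names are illustrative, not part of pooled_list):

#include <boost/thread/mutex.hpp>

boost::mutex pool_mutex;  // guards every pooled_list attached to the shared pool

void worker(pooled_list<int>& list_sharing_pool)
{
    boost::mutex::scoped_lock lock(pool_mutex);
    list_sharing_pool.push_back(1);
}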
The Implementation
Linked lists impose storage overhead beyond that of contiguous arrays due to their node structure. Typically a list node holds pointers to the next and previous nodes in the list. A list node could be defined as follows:
template<typename T>
struct node
{
    node(const T& initial_value) : element_(initial_value) {}
    node* next_;
    node* previous_;
    T element_;
};
List allocation requires providing memory not only for the element data, but for the list node structure as well. This is why the standard requires that allocators provide a rebind mechanism, which lets the container obtain an allocator for its internal node type; an allocator that can only produce objects of type T is insufficient. In classical C linked list implementations, such as the one provided by the Linux kernel, the user supplies the nodes, allocated by the user, at insertion time. With the C++ standard list, nodes are abstracted away from the user.
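For illustration, a typical list implementation rebinds the user-supplied allocator to its node type along these lines; toy_list and its node layout are hypothetical:

#include <memory>

template<typename T, typename Alloc = std::allocator<T> >
class toy_list
{
    // The internal node stores the element plus the link pointers.
    struct node
    {
        node* next;
        node* previous;
        T element;
    };

    // Rebind the element allocator so the list can allocate whole
    // nodes rather than bare objects of type T.
    typedef typename Alloc::template rebind<node>::other node_allocator;

    node_allocator node_alloc_;
};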
The goal of the pooled_list implementation was to preallocate not only the element data but also the node data, and hence all the memory used by the list. This can be achieved by implementing the list from scratch -- creating new node types and implementing all the list operations. For the sake of expediency and correctness, I chose a hybrid approach which uses the standard list itself as the underlying structure for the pooled_list.
The pooled_list implementation uses two standard lists: the free list and the active list. At creation, the free list contains n elements and the active list is empty. When elements are inserted, the required nodes are moved from the free list to the active list. When elements are erased, the released nodes are moved from the active list back to the free list. This is accomplished with the standard list's splice() operation, which allows lists to be combined in constant time without memory allocation. While this is a rudimentary implementation, it does present some challenges in correctly providing value semantics.
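The key property relied on here is that splice() transfers nodes rather than copying elements. For example, with plain standard lists:

std::list<int> free_nodes(3, 0);   // three preallocated nodes
std::list<int> active_nodes;

// Move one node from the front of the free list to the end of the
// active list in constant time, with no allocation and no copying.
active_nodes.splice(active_nodes.end(), free_nodes, free_nodes.begin());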
A Naive Implementation
The structure of my first attempted implementation was similar to the following:
template<typename T>
class pooled_list
{
    ...
private:
    std::list<T> free_list_;
    std::list<T> active_list_;
};
While this might work for basic types and pointers, it results in the constructors for objects being called when the free list is created, not when elements are inserted. Likewise, destructors for the elements are called when the free list is deleted and not when elements are erased. To solve this problem the underlying lists must contain unstructured data, which leads to the next attempt:
template<typename T>
class pooled_list
{
    ...
    iterator insert(iterator pos, const T& value)
    {
        if(!free_list_.empty()){
            // use placement new on the raw data, and splice
            // from the free list to the active list
            ...
        }
        else{
            throw std::bad_alloc();
        }
        return --pos;
    }
    ...
private:
    std::list<char[sizeof(T)]> free_list_;
    std::list<char[sizeof(T)]> active_list_;
};
By using standard lists of arrays of char, the construction and destruction of elements can be constrained to insert and erase operations, properly implementing value semantics. Unfortunately, this leads to some subtle alignment problems. Inside the standard list, the char array is a member of the node, as shown in the node example above, and C++ does not guarantee that all types T will be properly aligned within it. Stephen Cleary provides further discussion of alignment in his documentation for the Boost Pool library.
The final implementation uses lists of type std::list<char*>, based on the following observation from Cleary's discussion:
Any block of memory allocated as an array of characters through operator new[] (hereafter referred to as the block) is properly aligned for any object of that size or smaller
For example:
template<typename T>
class pooled_list
{
    ...
    pooled_list(size_t pool_size) : free_list_(pool_size, 0)
    {
        std::list<char*>::iterator cur;
        std::list<char*>::iterator end = free_list_.end();
        for(cur = free_list_.begin(); cur != end; ++cur){
            *cur = new char[sizeof(T)];
        }
    }
    ...
    iterator insert(iterator pos, const T& value)
    {
        if(!free_list_.empty()){
            // use placement new on the raw data, and splice
            // from the free list to the active list
            ...
        }
        else{
            throw std::bad_alloc();
        }
        return --pos;
    }
    ...
private:
    std::list<char*> free_list_;
    std::list<char*> active_list_;
};
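To make the elided sections more concrete, here is a sketch of how a push_back() built on this scheme could combine placement new with splice(); it reuses the member names from the listing above and is not the verbatim implementation. Erase operations are the mirror image: call the element's destructor explicitly and splice its node back onto the free list.

void push_back(const T& value)
{
    if(free_list_.empty()){
        throw std::bad_alloc();
    }

    // Construct the element in the preallocated raw block owned by
    // the first free node (placement new, declared in <new>).
    new (free_list_.front()) T(value);

    // Move that node from the free list to the end of the active
    // list; no memory is allocated.
    active_list_.splice(active_list_.end(), free_list_, free_list_.begin());
}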
The final implementation differs slightly in that the free_list_ is moved to a separate pool class, which allows it to be shared by multiple pooled_lists. The alignment workaround does impose one pointer's worth of space overhead per node in the free and active lists. This could be avoided by developing a custom list rather than using the standard list as the base implementation.
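A minimal sketch of such a pool class, under the assumption that it simply owns the raw blocks and the shared free list (the real pool's interface differs):

#include <cstddef>
#include <list>

template<typename T>
class pool
{
public:
    explicit pool(std::size_t size) : free_list_(size, 0)
    {
        // Preallocate one properly aligned raw block per element.
        // (A production version would also guard against leaks if
        // one of these allocations throws.)
        std::list<char*>::iterator cur = free_list_.begin();
        std::list<char*>::iterator end = free_list_.end();
        for(; cur != end; ++cur){
            *cur = new char[sizeof(T)];
        }
    }

    ~pool()
    {
        // All lists attached to the pool must already have been
        // destroyed, so every block is back on the free list.
        std::list<char*>::iterator cur = free_list_.begin();
        std::list<char*>::iterator end = free_list_.end();
        for(; cur != end; ++cur){
            delete[] *cur;
        }
    }

private:
    // Attached pooled_lists splice nodes out of free_list_ on insert
    // and splice them back on erase (the access machinery is omitted
    // from this sketch).
    std::list<char*> free_list_;
};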
List Iterators
Since the underlying list is of type std::list<char*> and not std::list<T>, pooled_list iterators must convert the raw character data back into references to T. This is done with boost::transform_iterator and a small conversion functor:
template<typename T>
class pooled_list
{
    ...
    typedef char* raw_data_type;
    typedef std::list<raw_data_type> raw_data_ptr_list;
    typedef typename std::list<raw_data_type>::iterator raw_data_ptr_list_it;

    //
    // Functor to convert the underlying raw data to a reference to T.
    class reinterpret_raw_data_ptr
    {
    public:
        typedef T& result_type;
        T& operator()(raw_data_type& raw_data) const
        {
            return *reinterpret_cast<T*>(raw_data);
        }
    };

    typedef boost::transform_iterator<reinterpret_raw_data_ptr,
                                      raw_data_ptr_list_it,
                                      T&, T> iterator;
    ...
};
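With these typedefs in place, begin() and end() reduce to wrapping the underlying list iterators; a sketch:

iterator begin()
{
    return iterator(active_list_.begin(), reinterpret_raw_data_ptr());
}

iterator end()
{
    return iterator(active_list_.end(), reinterpret_raw_data_ptr());
}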
Potential Enhancements
As noted in the threading section, multiple threads cannot concurrently insert or erase elements in lists which share the same pool. I chose not to impose the overhead of thread synchronization by default. I do not recommend sharing pools across threads, but this could be supported by adding a synchronization policy to the pool, with no additional overhead for the default case.
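One way to do that would be a locking policy supplied as a template parameter, where the default policy is a no-op that compiles away; the names here are purely illustrative:

// Default policy: no synchronization, no overhead.
struct no_locking
{
    void lock() {}
    void unlock() {}
};

template<typename T, typename LockingPolicy = no_locking>
class pool
{
    // The paths that acquire and release free nodes would bracket
    // their access to the free list with lock_.lock()/lock_.unlock()
    // (or a scoped guard); a mutex-backed policy would make sharing
    // the pool across threads safe.
    LockingPolicy lock_;
};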
Element data is allocated by the pool using operator new[]. This might not be sufficient for all use cases (for instance, if the user wants to allocate elements from an arena or global pool). This could also be addressed by adding an allocation strategy to the pool. It should be noted that it would be difficult to change the allocation strategy of the node structures, because the standard list is used as the underlying data structure. Providing an alternate strategy for allocating list nodes would require a reimplementation of the list structure.
Acknowledgments
Thanks to C++ guru Thomas Becker for discussion on data alignment and help with const_iterator syntax.