CompactSet and CompactMap are space-efficient associative collections that are drop-in replacements for std::set
and std::map
, for most uses. The collections are implemented using a hybrid table/tree structure described in C/C++ Users Journal (July 2003 issue). Note that there are some minor performance enhancements in the actual code that were not described in the article, which focused mainly on the basic algorithms.
Features include:
std::set
and std::map
The compact collections have been tested with the following compilers:
The compact collections use a hybrid structure that uses a tree-based collection (std::set
or std::map)
for newly inserted elements, and a vector-based collection (sorted std::vector
or std::deque)
for "old" elements. As the collection grows, elements are copied from the tree-based collection into the more space efficient vector-based collection, in a process called compaction.
The use of the tree-based collection guarantees that inserts occur in O(log(N)) time. Compaction occurs in geometric increments of the collection size, so that the cost of compaction is amortized O(1), identical to the expansion cost of std::vector. The total insert cost is therefore amortized O(log(N)).
The per-element overhead for the vector-based collection is almost 0, whereas the per-element overhead for std::set/std::map
is at least 12 bytes (plus dynamic memory overhead). For a 4-byte value type (i.e. std::set<int>
), the cooresponding compact set consumes about one third the memory of the standard version.
Note: Policies are not available with g++ 2.95 or Microsoft Visual Studio 6.0.
Policies provide an additional level of flexibility for customizing compact collections. There are two supplied compact policies supplied with the base code:
std::vector
as the table-based collection. This version is almost as fast as the standard tree-based collections. However, during compaction, the space overhead is effectively doubled, as the std::vector
is regenerated. This is a transient effect, but can be problematic for configurations where memory is very tight.std::deque
as the table-based collection. This version is significantly slower than the std::vector
-based version. But it has the benefit that compactions can occur with very little space overhead, because the original table is destroyed as the new one is created. (Note: this is only true if the std::deque
implementation releases memory as elements are erased, which may not be the case for some versions of the standard library). The MinimizeSpacePolicy may also be useful for extremely large collections, where a contiguous memory block (required by std::vector)
cannot be allocated.compactionFactor
- determines how frequently compactions occur, expressed as a percentage of the collection size. smallVectorThreshold
- for very small collections, the cost of inserting directly into the table-based collection is small. Therefore, the first smallVectorThreshold
elements are inserted directly into the table-based collection.SortedArrayBase
(defined in CompactColl.h)
The compact collections are interface compatible with their standard associative collection counterparts. However, there are some minor deviations from the standard spec. These deviations are listed below:
insert()
and erase()
operate in amortized O(log(N)) time. The standard versions are straight O(log(N)). This essentially means that individual insertions (or erasures) in a compact collection may consume more than O(log(N)) , but that the average complexity for all insertions/erasures will approach O(log(N)) . This is analogous to the amortized O(1) cost of std::vector
back-inserts.erase()
does not immediately destroy the contained value, and the erased object must remain valid until its destructor is called (during the next collection compaction). This can be problematic for objects that do not manage their own resources. For example, an object with a raw pointer where memory management is not handled by the destructor will not work with the compact collections (unless erase()
is never called). All primitive types and all standard value types (including standard collections, as long as the collection value type is itself a pure value type that manages its own resources) are appropriate as values in a compact collection.insert()/erase()
in the compact collections, unlike std::set/std::map
, whose iterators are valid for the life of the collection (as long as the underlying element is not removed).operator<
()
on the value type, once per increment/decrement (O(1) ). The standard versions do not call operator<
()
(O(0) ).
std::string)
: the maximum space savings are achieved for small types such as these.
erase()
requires objects to remain valid until destroyed by the collection (which generally does not occur until the next compaction) insert()/erase()
calls) The compact collections are designed to be thread safe, using the commonly accepted standard for thread safety (multiple readers on the same instance, multiple writers on different instances).
Note 1: Thread safety assumes an underlying standard library that is thread safe.
Note 2: No actual thread testing has been done on the compact collections.
For the MaximizeSpeedPolicy, I have made an attempt to provide the strong guarantee, but more analysis needs to be done before I can assert that it is. At the very least, I am confident that the compaction itself is strongly exception safe (provided that the underlying standard library is exception safe as well)
For the MinimizeSpacePolicy, the goal of preserving memory at all costs is at odds with the strong guarantee. Since the compaction destroys the old as it rebuilds the new, it is difficult to provide any exception safety guarantees. Therefore, it is recommended that the MinimizeSpacePolicy only be used with value types that do not throw on copy, assignment, or operator<.
The distribution contains the following source files:
CompactColl
, and both policies are defined here. The base class for the policies is SortedArrayBase
. CompactMap
as an adapter around CompactColl
CompactSet
as an adapter around CompactColl
There is currently no namespace, but I plan to add an optional namespace in the next release.
Please send bug reports to mcarrato@hotmail.com