11.4. The Unordered Containers

Advanced

The new standard defines four unordered associative containers. Rather than using a comparison operation to organize their elements, these containers use a hash function and the key type’s == operator. An unordered container is most useful when we have a key type for which there is no obvious ordering relationship among the elements. These containers are also useful for applications in which the cost of maintaining the elements in order is prohibitive.

C++11

Although hashing gives better average case performance in principle, achieving good results in practice often requires a fair bit of performance testing and tweaking. As a result, it is usually easier (and often yields better performance) to use an ordered container.

TIP

Use an unordered container if the key type is inherently unordered or if performance testing reveals problems that hashing might solve.

Using an Unordered Container

Aside from operations that manage the hashing, the unordered containers provide the same operations (find, insert, and so on) as the ordered containers. That means that the operations we’ve used on map and set apply to unordered_map and unordered_set as well. Similarly for the unordered versions of the containers that allow multiple keys.

As a result, we can usually use an unordered container in place of the corresponding ordered container, and vice versa. However, because the elements are not stored in order, the output of a program that uses an unordered container will (ordinarily) differ from the same program using an ordered container.

For example, we can rewrite our original word-counting program from § 11.1 (p. 421) to use an unordered_map:

c++

// count occurrences, but the words won't be in alphabetical order
unordered_map<string, size_t> word_count;
string word;
while (cin >> word)
    ++word_count[word]; // fetch and increment the counter for word
for (const auto &w : word_count) // for each element in the map
     // print the results
     cout <<  w.first << " occurs " << w.second
          << ((w.second > 1) ? " times" : " time") << endl;

The type of word_count is the only difference between this program and our original. If we run this version on the same input as our original program,

containers. occurs 1 time
use occurs 1 time
can occurs 1 time
examples occurs 1 time
...

we’ll obtain the same count for each word in the input. However, the output is unlikely to be in alphabetical order.

Managing the Buckets

The unordered containers are organized as a collection of buckets, each of which holds zero or more elements. These containers use a hash function to map elements to buckets. To access an element, the container first computes the element’s hash code, which tells which bucket to search. The container puts all of its elements with a given hash value into the same bucket. If the container allows multiple elements with a given key, all the elements with the same key will be in the same bucket. As a result, the performance of an unordered container depends on the quality of its hash function and on the number and size of its buckets.

The hash function must always yield the same result when called with the same argument. Ideally, the hash function also maps each particular value to a unique bucket. However, a hash function is allowed to map elements with differing keys to the same bucket. When a bucket holds several elements, those elements are searched sequentially to find the one we want. Typically, computing an element’s hash code and finding its bucket is a fast operation. However, if the bucket has many elements, many comparisons may be needed to find a particular element.

The unordered containers provide a set of functions, listed in Table 11.8, that let us manage the buckets. These members let us inquire about the state of the container and force the container to reorganize itself as needed.

Table 11.8. Unordered Container Management Operations

Bucket Interface

Code	Description
`c.bucket_count()`	Number of buckets in use.
`c.max_bucket_count()`	Largest number of buckets this container can hold.
`c.bucket_size(n)`	Number of elements in the nth bucket.
`c.bucket(k)`	Bucket in which elements with key `k` would be found.

Bucket Iteration

Code	Description
`local_iterator`	Iterator type that can access elements in a bucket.
`const_local_iterator`	`const` version of the bucket iterator.
`c.begin(n), c.end(n)`	Iterator to the first, one past the last element in bucket `n`.
`c.cbegin(n), c.cend(n)`	Returns `const_local_iterator`.

Hash Policy

Code	Description
`c.load_factor()`	Average number of elements per bucket. Returns `float`.
`c.max_load_factor()`	Average bucket size that `c` tries to maintain. `c` adds buckets to keep `load_factor <= max_load_factor`. Returns `float`.
`c.rehash(n)`	Reorganize storage so that `bucket_count >= n` and `bucket_count > size/max_load_factor`.
`c.reserve(n)`	Reorganize so that `c` can hold `n` elements without a `rehash`.

Requirements on Key Type for Unordered Containers

By default, the unordered containers use the == operator on the key type to compare elements. They also use an object of type hash<key_type> to generate the hash code for each element. The library supplies versions of the hash template for the built-in types, including pointers. It also defines hash for some of the library types, including strings and the smart pointer types that we will describe in Chapter 12. Thus, we can directly define unordered containers whose key is one of the built-in types (including pointer types), or a string, or a smart pointer.

However, we cannot directly define an unordered container that uses a our own class types for its key type. Unlike the containers, we cannot use the hash template directly. Instead, we must supply our own version of the hash template. We’ll see how to do so in § 16.5 (p. 709).

Instead of using the default hash, we can use a strategy similar to the one we used to override the default comparison operation on keys for the ordered containers (§ 11.2.2, p. 425). To use Sales_data as the key, we’ll need to supply functions to replace both the == operator and to calculate a hash code. We’ll start by defining these functions:

c++

size_t hasher(const Sales_data &sd)
{
    return hash<string>()(sd.isbn());
}
bool eqOp(const Sales_data &lhs, const Sales_data &rhs)
{
    return lhs.isbn() == rhs.isbn();
}

Our hasher function uses an object of the library hash of string type to generate a hash code from the ISBN member. Similarly, the eqOp funciton compares two Sales_data objects by comparing their ISBNs.

We can use these functions to define an unordered_multiset as follows

c++

using SD_multiset = unordered_multiset<Sales_data,
                    decltype(hasher)*, decltype(eqOp)*>;
// arguments are the bucket size and pointers to the hash function and equality operator
SD_multiset bookstore(42, hasher, eqOp);

To simplify the declaration of bookstore we first define a type alias (§ 2.5.1, p. 67) for an unordered_multiset whose hash and equality operations have the same types as our hasher and eqOp functions. Using that type, we define bookstore passing pointers to the functions we want bookstore to use.

If our class has its own == operator we can override just the hash function:

c++

// use FooHash to generate the hash code; Foo must have an == operator
unordered_set<Foo, decltype(FooHash)*> fooSet(10, FooHash);

INFO

Exercises Section 11.4

Exercise 11.37: What are the advantages of an unordered container as compared to the ordered version of that container? What are the advantages of the ordered version?

Exercise 11.38: Rewrite the word-counting (§ 11.1, p. 421) and word-transformation (§ 11.3.6, p. 440) programs to use an unordered_map.

Chapter 2. Variables and Basic Types

Chapter 3. Strings, Vectors, and Arrays

Chapter 4. Expressions

Chapter 5. Statements

Chapter 6. Functions

Chapter 7. Classes

Chapter 8. The IO Library

Chapter 9. Sequential Containers

Chapter 10. Generic Algorithms

Chapter 11. Associative Containers

Chapter 12. Dynamic Memory

Chapter 13. Copy Control

Chapter 14. Overloaded Operations and Conversions

Chapter 15. Object-Oriented Programming

Chapter 16. Templates and Generic Programming

Chapter 17. Specialized Library Facilities

Chapter 18. Tools for Large Programs

Chapter 19. Specialized Tools and Techniques

A.2. A Brief Tour of the Algorithms

A.3. Random Numbers