Adjacent circular iterators

Note: Part way through writing this post I discovered that the boost graph library had explored many of these concepts already. I decided to finish the post anyway because I think my design process might be interesting.

One of my personal projects is a radiosity renderer. The world doesn’t need another radiosity renderer, but I wanted to learn about them, and, as usual with my personal projects, there is a secondary reason for writing it – to play around with C++ and see what I can learn.

In order to have a renderer you need to have a scene to render, which means you also need to have some data structures describing that scene. For the time being my data structure is simple – a list of polygons. Each polygon object includes various useful bits of data – things like colour and reflectivity. A polygon also has a vector of vertices. The vertices define the bounds of the polygon.

In addition to the polygons and vertices I need to process edges. An edge is defined by the two vertices it connects. My canonical data structure is the polygons and vertices, so when I need edges I need to construct them from the vertices.

There is nothing too difficult about that – iterate through the vertices and create an edge using each vertex and the following vertex. The twist comes when we get to the end of the vertices. Once we are at the last vertex, the next vertex we want is actually at the beginning of the vector. The polygon is closed so there has to be an edge between the last vertex and the first vertex. I am going to refer to this concept as “adjacent circular” – we already have algorithms like std::adjacent_find and “circular” indicates that we are linking the end to the beginning.

For the sake of simplicity I’ll get rid of the extraneous details and slim things down so that our input is std::vector< char >, and our output is std::vector< std::pair< char, char > >. In other words, we are trying to write this function:

std::vector< std::pair< char, char > >
getEdges( std::vector< char > const& vertices );

(As usual, some using statements and typedefs would make this less verbose, and, as usual I want to be explicit about where everything is coming from so I am going to spell out every component in full. Also, when I get to the point of writing classes I am going to inline the function definitions – in reality I’d split them out from the declaration).

If the input is:

a, b, c, d, e

we expect the output of getEdges to be:

(a b), (b c), (c d), (d e), (e a)

It’s always good to start off with the straightforward solution:

std::vector< std::pair< char, char > >
getEdges( std::vector< char > const& vertices)
{
    std::vector< std::pair< char, char > > edges;

    std::vector< char >::const_iterator i( std::begin( vertices ) );
    std::vector< char >::const_iterator j( std::begin( vertices ) );

    while( i != std::end( vertices ) )
    {
        ++j;
        
        if( j == std::end( vertices ) )
        {
            j = std::begin( vertices );
        }

        edges.push_back( std::make_pair( *i, *j ) );

        ++i;
    }

    return edges;
}

The code is not particularly elegant or efficient but it does the job. At the very least it’s useful as a test baseline for the other solutions we’re going to come up with.

Let’s follow Sean Parent’s dictum of “no raw loops” and see what else we can come up with. The point of “no raw loops” is not that there are no loops at all, just that the loops themselves are encapsulated in a reusable algorithm.


Let’s write ourselves an algorithm:

template<
    typename ForwardIterator,
    typename BinaryOperation >
BinaryOperation for_adjacent_circular(
    ForwardIterator first,
    ForwardIterator last,
    BinaryOperation op)
{
    ForwardIterator i( first );
    ForwardIterator j( first );
    
    while( i != last )
    {
        ++j;
        
        if( j == last )
        {
            j = first;
        }
        
        op( *i, *j );

        ++i;
    }

    return op;
}

As we’d expect, for_adjacent_circular looks very much like our original function. When we use it, it looks very much like std::for_each:

std::vector< std::pair< char, char > >
getEdgesAlgorithm( std::vector< char > const& vertices)
{
    std::vector< std::pair< char, char > > edges;

    for_adjacent_circular(
        std::begin( vertices ),
        std::end( vertices ),
        [ &edges ]( char a, char b )
        {
            edges.push_back( std::make_pair( a, b ) );
        } );

    return edges;
}

This is good as far as it goes – the problem is that it doesn’t go far enough. We wrote an algorithm that is specialized for our original problem – converting a vector of vertices into a vector of edges. Given that it’s a for_each style algorithm we can fake it to do a lot of things that we’d like to do, but it feels like we’re abusing the algorithm and falling into the trap John Potter describes of making std::for_each the “goto of algorithms”.


There are many more algorithms we might want to use – it would be nicer to have iterators that we could feed into one of the existing standard algorithms. Here’s my first pass at an iterator. I’ll display the whole thing first then go through the interesting features of it.

template< typename Iterator >
class ConstAdjacentCircularIterator
{
public:
    ConstAdjacentCircularIterator( 
        Iterator begin, 
        Iterator position, 
        Iterator end )
    : begin_( begin )
    , position_( position )
    , end_( end )
    {
    }

    typedef std::forward_iterator_tag iterator_category;
    typedef std::pair< 
        typename Iterator::value_type const&, 
        typename Iterator::value_type const& > value_type;
    typedef typename Iterator::difference_type difference_type;
    typedef value_type* pointer;
    typedef value_type& reference;

    // Prefix
    ConstAdjacentCircularIterator& operator ++()
    {
        ++position_;

        return *this;
    }

    // Postfix
    ConstAdjacentCircularIterator operator ++( int )
    {
        ConstAdjacentCircularIterator original( *this );
        ++(*this);
        return original;
    }

    value_type operator *() const
    {
        Iterator next( position_ );
        ++next;
        if( next == end_ )
        {
            next = begin_;
        }
        
        return value_type( *position_, *next );
    }

private:

    template< typename I >
    friend bool operator ==( 
        ConstAdjacentCircularIterator< I > const& a, 
        ConstAdjacentCircularIterator< I > const& b )
    {
        return 
            a.begin_ == b.begin_ &&
            a.position_ == b.position_ &&
            a.end_ == b.end_;
    }

    Iterator begin_;
    Iterator position_;
    Iterator end_;
};

template< typename Iterator >
bool operator !=( 
    ConstAdjacentCircularIterator< Iterator > const& a, 
    ConstAdjacentCircularIterator< Iterator > const& b )
{
    return !( a == b );
}

(As usual I needed to break out my copy of C++ Templates by Vandervoorde and Josuttis to get the friend declarations correct. Actually I am still not certain they are correct – I got different errors out of two different compilers and spent some time getting to what might be the right solution. I like C++ and I like templates but the syntax often has me baffled).

I have written the const version of the iterator since that’s what I need for the moment. We’ll look at the non-const version later.


ConstAdjacentCircularIterator( 
    Iterator begin, 
    Iterator position, 
    Iterator end )
: begin_( begin )
, position_( position )
, end_( end )
{
}

The constructor for our adjacent circular iterator takes other iterators as input. I assume we have some kind of underlying sequence which has its own iterators. Interestingly, we have to specify three iterators to construct an adjacent circular iterator. Not only do we specify position, we also need to specify the range of the underlying sequence with begin and end. We’ll see why these are necessary when we implement the increment functions. This is a pattern I have come across before. When I’m writing iterators that do more than just directly reference the elements of an underlying sequence I often need to specify more information about the range that the iterator is iterating over.


Moving on, we have the typedefs that all iterators must have:

typedef std::forward_iterator_tag iterator_category;

I have chosen to make this a forward iterator simply because I didn’t want to implement more than operator ++. There is no reason why we can’t implement all of the other access methods and make the circular iterator as fully featured as the underlying iterator.


typedef std::pair< 
    typename Iterator::value_type const&, 
    typename Iterator::value_type const& > value_type;

value_type is interesting, and by “interesting” I mean “the first sign that the train is eventually going to come off the rails”.

Our basic problem is that the value we want (a pair of the values stored in the underlying container) is not actually what we have stored in the underlying container. To try and work around this I am storing references to the two values that make up our pair. This looks like a decent compromise – it is a decent compromise – but it’s still a compromise.

This is going to come up again later.


typedef typename Iterator::difference_type difference_type;

We take difference_type straight from our underlying iterator – nothing surprising here.


typedef value_type* pointer;
typedef value_type& reference;

pointer and reference should not be surprising. They’re just a pointer and reference to value_type. However, we are already uneasy about value_type so we need to keep an eye on these.


// Prefix
ConstAdjacentCircularIterator& operator ++()
{
    ++position_;

    return *this;
}

Our prefix increment implementation is not surprising. I am not implementing any sort of error checking – I assume that will be handled (if it’s handled at all) by the underlying iterator. Notice that the iterator itself is not wrapping around the container – we only wrap around when we defererence the iterator. Incrementing the iterator will never take us back to the beginning of the container. While it would be possible to create an iterator that wrapped around (although the question of what end would be is an interesting one), that isn’t what we’re doing here.


// Postfix
ConstAdjacentCircularIterator operator ++( int )
{
    ConstAdjacentCircularIterator original( *this );
    ++(*this);
    return original;
}

Postfix increment is just what we’d expect.


value_type operator *() const
{
    Iterator next( position_ );
    ++next;
    if( next == end_ )
    {
        next = begin_;
    }
    
    return value_type( *position_, *next );
}

The deference function is where the magic happens. Let’s get the easy stuff out of the way – we use the current iterator and next iterator, making sure that if we’re at the end of the range we wrap around to the beginning. This is why we need to have begin and end stored in the iterator itself – we can’t implement this function without them.

We dereference position_ and next which gives us references to the underlying values, then we construct a value_type from the two references and return it.

Now let’s look at the subtleties. Our problems with value_type really bite us here. operator * should return a reference to the value type – the standard says so (section 24.2.2 in the C++11 standard). We can’t return a reference to value_type because our underlying container (whatever that is) does not contain objects of that type.

value_type contains references to the underlying objects, but that isn’t the same thing and is not enough to satisfy the standard.


template< typename I >
friend bool operator ==( 
    ConstAdjacentCircularIterator< I > const& a, 
    ConstAdjacentCircularIterator< I > const& b )
{
    return 
        a.begin_ == b.begin_ &&
        a.position_ == b.position_ &&
        a.end_ == b.end_;
}

The equality operator is necessary for us to do almost anything useful with our iterator. There is one interesting point here. There are two possible ways of writing this function. We could just compare position_, however if we do that we can generate situations like this:

ConstAdjacentCircularIterator< 
    typename std::vector< char >::const_iterator > i0( 
        vc.begin(), vc.begin() + 2, vc.end() );

ConstAdjacentCircularIterator< 
    typename std::vector< char >::const_iterator > i1( 
        vc.begin(), vc.begin() + 2, vc.begin() + 3 );

i0 and i1 have the same position – vc.begin() + 2. If we were just comparing position, i0 == i1 is true. Unfortunately, *i0 == *i1 is false.

*i0 is (c, d)

*i1 is (c, a)

It is a normal restriction for operator == to be valid only if the iterators refer to the same underlying sequence (C++11 standard section 24.2.5). Even though i0 and i1 are based off the same underlying container, they are not referring to the same sequence since they have different end values.

All of that is a long-winded way of saying that we need to compare begin_, position_, and end_.


template< typename Iterator >
bool operator !=( 
    ConstAdjacentCircularIterator< Iterator > const& a, 
    ConstAdjacentCircularIterator< Iterator > const& b )
{
    return !( a == b );
}

I define inequality in terms of equality – that’s a common technique. Since this doesn’t need to be a member function it isn’t.


Our getEdges function is now simple (at least it would be if I were using typedefs):

template< typename T >
std::vector< std::pair< typename T::value_type, typename T::value_type > >
getEdgesIterator( T const& vertices)
{
    std::vector< std::pair< 
        typename T::value_type, 
        typename T::value_type > > edges;

    ConstAdjacentCircularIterator< typename T::const_iterator > b( 
        std::begin( vertices ), 
        std::begin( vertices ), 
        std::end( vertices ) );
    ConstAdjacentCircularIterator< typename T::const_iterator > e( 
        std::begin( vertices ), 
        std::end( vertices ), 
        std::end( vertices ) );

    std::copy( b, e, std::back_inserter( edges ) );

    return edges;
}

Even better, since we now have iterators it looks like we can use any standard algorithm that requires iterators – std::for_each, std::transform, std::find etc.

I will award bonus points to anyone who spotted the weasel words in that statement – “looks like”. That’s an important caveat and we’re going to come back to it later. For the time being let’s just say that ConstAdjacentCircularIterator is doing a reasonable impersonation of an iterator.


So much for the const version of the iterator, what about the non-const version? There’s one difference – the type of value_type:

typedef std::pair< 
    typename Iterator::value_type&, 
    typename Iterator::value_type& > value_type;

This time the references to the underlying data type are non-const. We can write to an AdjacentCircularIterator as well as read from it.

For example, if we run this code (I have output operators already defined for std::vector< char > and std::pair< char, char >):

std::vector< char > vc( { 'a', 'b', 'c', 'd', 'e' } );

AdjacentCircularIterator< 
    typename std::vector< char >::iterator > i( 
        vc.begin(), 
        vc.begin(), 
        vc.end() );

++i;

std::cout << vc << "\n";
std::cout << *i << "\n";

*i = std::make_pair( 'x', 'y' );

std::cout << vc << "\n";
std::cout << *i << "\n";

We get this output:

a, b, c, d, e
(b c)
a, x, y, d, e
(x y)

What about range based for? To make range-based for work we need to create a container having begin and end methods. Since we've already defined our iterator classes this is easy:

class AdjacentCircularContainer
{
public:

    typedef AdjacentCircularIterator< 
        std::vector< char >::iterator >::value_type value_type;

    typedef value_type& reference;
    typedef value_type const& const_reference;
    typedef value_type* pointer;
    typedef value_type const* const_pointer;

    typedef AdjacentCircularIterator< 
            typename std::vector< char >::iterator > iterator;
    typedef ConstAdjacentCircularIterator< 
            typename std::vector< char >::const_iterator > const_iterator;
    
    AdjacentCircularContainer( std::vector< char >* p )
    : pContainer_( p )
    {
    }

    const_iterator begin() const
    {
        return const_iterator( 
                pContainer_->begin(), 
                pContainer_->begin(), 
                pContainer_->end() );
    }

    const_iterator end() const
    {
        return const_iterator( 
            pContainer_->begin(), 
            pContainer_->end(), 
            pContainer_->end() );
    }
    
private:
    std::vector< char > *pContainer_;
};

(I am implementing the minimum necessary here - there's more we can, and should, add if we want this to be a full blown container).

We can use our container exactly as we'd expect:

template< typename T >
std::vector< std::pair< typename T::value_type, typename T::value_type > >
getEdgesContainer( T const& vertices)
{
    std::vector< std::pair< char, char > > result;

    AdjacentCircularContainer c( &v_char );

    for( std::pair< char, char > e : c )
    {
        result.push_back( e );
    }

    return result;
}

While we're at it, now we have our container we can start to use the Adobe Source Library algorithms:

template< typename T >
std::vector< std::pair< typename T::value_type, typename T::value_type > >
getEdgesAdobe( T const& vertices)
{
    std::vector< std::pair< char, char > > result;

    AdjacentCircularContainer c( &v_char );

    adobe::copy( c, std::back_inserter( result ) );

    return result;
}

Life is good. We have iterators and we have containers that implement our desired "circular adjacent" functionality.

Except that we don't.

I have dropped unsubtle hints along the way that we've been cheating. While we have things that do reasonable impersonations of iterators and containers (good enough to do the things I need for my current use case), we do not have standard compliant iterators and containers.

The fundamental problem is the one we identified way back when we started to come up with the type of value_type. Normally with a C++ sequence there are objects of type value_type stored somewhere, and they persist for long enough that returning and storing references to them is a sensible thing to do. We don't have that. The only time we ever have an actual pair of values is as a local value inside the dereferencing function, and, as we all know, passing back a reference to a local value (which goes out of scope as soon as the function ends) is a Bad Thing.

We want (and the standard requires) the declaration of operator * to be:

value_type const& operator *() const

but we have to make it:

value_type operator *() const

There are other problems (we can't implement front or back correctly), but they all stem from the same source.

If you're thinking - "this sounds like the problems with std::vector< bool >" - you're right. Because std::vector< bool > is allowed to be specialized so that the boolean values can be stored as a bitset you can't actually get a reference to an individual bool object because the container doesn't store individual bool objects. We can fake it with proxy objects, and more modern versions of C++ can fake it better than older versions can, but we're still faking it, and sooner or later the faking will catch up with us.

For more on std::vector< bool > checkout Howard Hinnant's comments on this Stack Overflow question and this blog post.


In conclusion, it's been an interesting investigation but is it really worth it? For my particular use case the answer is "no" (at least in terms of solving my original "vertices to edges" problem rather than writing a blog post and learning something). Our original simple solution is good enough for my needs. I have it wrapped up in the function getEdges so it's easy enough to swap out if I need something different.

Once I have my edges in a container I can run whatever algorithsm I need on a container that does actually contain edges rather than some fake sequence that just pretends to contain edges.

No raw loops – output

Let’s get back to raw loops, or rather the absence of raw loops. How do we write out the contents of a container so that each element appears on its own line? Assume we have some type T and a vector of T:

class T
{
   ...
};

std::vector< T > v_T;

This is the old style way to write the contents:

std::ostringstream ostr;
for( std::vector< T >::const_iterator i( std::begin( v_T ) ); 
    i != std::end( v_T ); ++i )
{
    ostr << *i << "\n";
}

The range-based for solution is straightforward:

for( T const& t : v_T )
{
    ostr << t << "\n";
}

Lambda functions also provide a solution:

std::for_each( 
    std::begin( v_T ), 
    std::end( v_T ), 
    [ &ostr ]( T const& t ){
        ostr << t << "\n";
    } );
adobe::for_each( 
    v_T, 
    [ &ostr ]( T const& t ){
        ostr << t << "\n";
    } );

Conveniently, output streams have iterators. We can use std::ostream_iterator as the destination for std::copy or adobe::copy :

std::copy( 
    std::begin( v_T ), 
    std::end( v_T ), 
    std::ostream_iterator< T >( 
        ostr, 
        "\n" ) );
adobe::copy( 
    v_T, 
    std::ostream_iterator< T >( 
        ostr, 
        "\n" ) );

Notice that we can pass a string (called the delimiter string in the standard) to the std::ostream_iterator constructor. The delimiter string is written out after every element.

Since std::ostream_iterator is an iterator with std::ostream as its underlying container, we might thank that we would be able to do this:

std::copy( 
    begin( v_T ), 
    end( v_T ), 
    std::begin( ostr ) );

We can't. The standard does not include a specialization of std::begin for std::ostream, and we are not allowed to add one ourselves because of Section 17.6.4.2.1 of the C++11 standard which states:

A program may add a template specialization for any standard library template to namespace std only if the declaration depends on a user-defined type and the specialization meets the standard library requirements for the original template and is not explicitly prohibited

In addition to that, std::begin does not allow for the additional delimiter string argument. Also, std::ostream_iterator is instantiated for the type of the source value which is not present in the single argument we pass to std::begin so cannot be deduced.

There is nothing stopping us writing our own function to construct a std::ostream_iterator from std::ostream. Since it's our function it can take an additional argument for the delimiter string, and so long as we don't put it into the standard namespace we can even call it begin. We do have to specify the type of our output when we call the function (the function cannot deduce it from the arguments because that type never appears in the arguments):

template< typename U >
std::ostream_iterator< U >
begin( std::ostream& ostr, const char* delimiterString )
{
    return std::ostream_iterator< U >( ostr, delimiterString );
}
std::copy( 
    begin( v_T ), 
    end( v_T ), 
    begin< T >( ostr, "\n" ) );

And an example that almost works. Since operator << is just a function, we can pass it to std::bind :

std::for_each(
    std::begin( v_T ), 
    std::end( v_T ),
    std::bind( operator <<, std::ref( ostr ), _1 ) );

This writes out the elements of the container, but it does not put each element on its own line - there is nowhere to plug the "\n" into. We can't achieve the each element appears on its own line requirement from our original problem statement.

We can write a function that takes an object of type T and a delimiter string and writes out both of them:

void
writePostText( std::ostream& o, T const& t, const char* text )
{
    o << t << text;
}
std::for_each(
    std::begin( v_T ), 
    std::end( v_T ),
    std::bind( writePostText, std::ref( ostr ), _1, "\n" ) );

Which of these solutions is the best? As usual there isn't a definitive answer to that question, but it looks like the range-based for solution looks has the lowest overhead (overhead being bits of boilerplate syntax that are not directly used to solve the problem), with adobe::for_each close behind.


There is one more thing to throw into the mix. The range-based for looks good, however it can only iterate over the entire container. Most of the time this isn't a problem, I estimate that 95% of my algorithm usage is over the entire container. In the case of writing things out though, there is a common use case where you want to iterate over a portion of the container. If you are writing out JSON, or some similar format where you need to put text between each pair of elements (i.e. the text cannot appear before the first element or after the last element), you have to be able to iterate over a sub range of the container.

A solution to the "separator text" problem might look like this:

void
writePreText( std::ostream& o, const char* text, T const& t )
{
    o << text << t;
}
if( !v_T.empty() )
{
    std::vector< T >::const_iterator i( std::begin( v_T ) ); 
    
    ostr << *i;

    ++i;
    std::for_each(
        i,
        std::cend( v_T ),
        std::bind( writePreText, std::ref( ostr ), ",\n", _1 ) );
}

We can't do this with range-based for. If we need to do both things within our source code - writing with text after each element and writing text between each element - do we want to do those two things differently - one of them using range-based for, one using a standard algorithm? There is a cost to being inconsistent, there is also a cost to being consistent but using a slightly larger solution for one of the cases.

I know that I am overanalyzing this. In practice I wrote a templated writeContainer function which lets you specify text that is written before the first element, text that is written after the last element and text that is written between elements. Problem solved and, with the exception of this blog post, I don't have to consider the subject any more.

Algorithms in action – sorting

I like to have a personal project that I use to explore different programming ideas – my current personal project is an editor. A couple of features I am adding require the use of a list – for example, the editor is working on a certain set of files and I want to be able to see the files, perhaps sorted by file name:

sort filename

Perhaps sorted by directory:

sort directory

This is completely typical list control behaviour – click on a column header to sort on that column and scroll through the list to see every element in its correct place. Of course I want this to be highly responsive, and I also want to be able to have many millions of elements in the list – I probably won’t ever have a project with a million files in it but I might well have a million symbols.

I am using a virtual list control – the list control itself does not have most of the data for the list. It knows how many elements are in the list and which block of elements it is displaying, then it calls back into my code to get the text for the elements it is displaying. The list control does not have to deal with millions of elements, it just has to deal with the elements it is currently displaying.

For my definition of “responsive” I am going to use Jakob Nielsen’s values from his article Response Times: The 3 Important Limits:

< 0.1 second for that instantaneous feeling
< 1 second to avoid interrupting the user’s flow of thought
< 10 seconds to avoid losing the user’s attention altogether.


Useful definitions

For this example we have a very simple struct containing file metadata, and a comparator class that gives us an ordering of FileMetaData objects. I have left out the details of the comparator class, but we can tell it to compare on filename, directory, size or last write time.

struct FileMetaData
{
    std::string fileName_;
    std::string directory_;
    
    std::size_t size_;
    std::time_t lastWriteTime_;
};

class Comparator
{
public:
    bool operator()( 
        FileMetaData const& a, 
        FileMetaData const& b )
}

There are also a few typedefs that will come in useful:

typedef std::vector< FileMetaData > FMDVector;
typedef FMDVector::iterator         Iter;
typedef FMDVector::reverse_iterator ReverseIter;
typedef FMDVector::size_type        UInt;

The problem

When we click on a column header the list should be resorted on the data in that column. We should be able to scroll through the list and see every list element in the correct place.


Solution #0

Since we need to sort the elements in the list why not just use std::sort? The code is simple and obvious:

void
sortAllElements( 
    Comparator const& comparator, 
    FMDVector& files )
{
    std::sort( 
        std::begin( files ), 
        std::end( files ), 
        comparator );
}

(I know that the sort barely deserves to be inside its own function but it’ll make it consistent with our more complicated sorting later).

Whenever we click on a column header we update the comparator and call sortAllElements. The code is simple and it works.

The catch comes when we look at performance. Here are the times to sort different numbers of elements:

# elements sort all
1,000 0.000s
10,000 0.000s
100,000 0.187s
1,000,000 2.480s
10,000,000 33.134s

The timings look great up to 10,000, even sorting 100,000 elements is acceptable. Once we get into the millions though the numbers get much worse. We really don’t want to be waiting around for 2 seconds, let alone 30+ seconds.

The C++ standard tells us that std::sort has a complexity of O( n * log n ). As we have just seen, we get good results with small numbers of elements, but much worse results as n gets larger.

So the straightforward sort works well for smaller numbers of elements, but we are going to need something smarter for millions of elements. There are several possibilities:

  1. Offload some of the work onto separate threads – even if we didn’t speed up the overall sort time we would at least avoid locking up the main UI thread.
  2. Use previous knowledge of the ordering to avoid having to do a complete re-sort.
  3. Keep multiple copies of the list, each sorted by a different column so that the right one can be instantly swapped in.

For the purposes of this blog post I am going to avoid #1 because threading opens up its own can of worms. I will declare that #2 is invalid because we don’t have any previous knowledge (or at least we have to have a solution that works responsively even if we don’t have any previous knowledge of the ordering). I am not going to consider #3 because I don’t want to make that particular speed / space tradeoff.

Instead I am going to take my cue from the virtual list control. It only needs to know about the elements it is displaying – those are the only elements that need to be in sorted order.

The problem has now moved from “sort the entire list” to “sort that portion of the list that is displayed”. If we can always guarantee that the portion of the list that is displayed is sorted correctly, the user will never know the difference. This has the potential to be much quicker – the number of displayed elements is relatively small. I can get around 70 lines on my monitor, let’s round that up to 100 – we need to get the 100 displayed elements correctly sorted and in their right place.


Solution #1

We’ll start with a straightforward case, the 100 elements we want to display are right at the beginning of the list (quite possibly the case when the list is first displayed). The standard library supplies us with many more sorting related algorithms than std::sort, and for this case it gives us std::partial_sort. As the name suggests, it partially sorts the list – we tell it how much of the beginning of the list we want it to sort:

void
sortInitialElements( 
    UInt k, 
    Comparator const& comparator, 
    FMDVector& files )
{
    std::partial_sort( 
        std::begin( files ), 
        std::begin( files ) + k, 
        std::end( files ), 
        comparator );
}

(There is no error handling in the function itself – I am assuming that the caller has ensured the values being passed in are within range.)

We have an additional parameter – k, the number of elements we want sorted correctly. We use k to create an iterator that we pass to std::partial_sort.

According to the standard, std::partial_sort has a complexity of O( n * log k ). Since k is likely to be much smaller than n (we’re assuming that k is 100) we should get better performance:

# elements sort all sort initial
1,000 0.000s 0.000s
10,000 0.000s 0.000s
100,000 0.187s 0.000s
1,000,000 2.480s 0.047s
10,000,000 33.134s 0.406s

This is looking much better. We are still in the instantaneous category at one million elements and even though 10 million is pushing us to around half a second it is still a vast improvement over sorting the entire list.

Moving on, it’s all very well being able to sort the beginning of the list correctly but we also want to be able to scroll through the list. We want to be able to display any arbitrary range of the list correctly.


Solution #2

Once again, the standard provides us with a useful algorithm. The function std::nth_element lets us put the nth element of a list into its correct place, and also partitions the list so that all of the elements before the nth element come before that element in sort order, and all of the elements after come after the nth element in sort order.

So, we can use std::nth_element to get the element at the beginning of the range into the correct place, then all we have to do is sort the k elements afterwards to get our range sorted. The code looks like this:

void
sortElementsInRange( 
    UInt firstElementIndex, 
    UInt k, 
    Comparator const& comparator, 
    FMDVector& files )
{
    std::nth_element( 
        std::begin( files ), 
        std::begin( files ) + firstElementIndex, 
        std::end( files ), 
        comparator );
    
    std::partial_sort( 
        std::begin( files ) + firstElementIndex, 
        std::begin( files ) + firstElementIndex + k, 
        std::end( files ), 
        comparator );
}

and the performance data:

# elements sort all sort initial sort range
1,000 0.000s 0.000s 0.000s
10,000 0.000s 0.000s 0.016s
100,000 0.187s 0.000s 0.016s
1,000,000 2.480s 0.047s 0.203s
10,000,000 33.134s 0.406s 2.340s

A million elements pushes us outside the “instantaneous” limit, but still acceptable. Unfortunately, 10 million elements isn’t great.

There is one more feature of sorting lists that I love. I use it in my email all the time. By default, my email is sorted by date – I want to see the most recent emails first. Sometimes though I’ll be looking at an email and want to see other emails from that sender. When that happens, I can click on the “sender” column header and have the email I currently have selected stay in the same place while the rest of the list sorts itself around the selected email. The selected element acts like a pivot and the rest of the lists moves around the pivot.

Mouse over this image to see an example of this sort on our list of files:


Solution #3

WARNING – the code I am about to present does not work under all circumstances. I want to focus on the sorting algorithms so for the time being I am not going to clutter the code up with all of the checks needed to make it general. DO NOT USE THIS CODE. I will write a follow up blog post presenting a correct version of this function with all of the checks in place.

Sorting around a pivot element is a more complicated function than the others so I’ll walk through it step by step.

void
sortAroundPivotElementBasic( 
    UInt nDisplayedElements, 
    UInt topElementIndex, 
    UInt pivotElementIndex, 
    Comparator const& comparator, 
    UInt& newTopElementIndex, 
    UInt& newPivotElementIndex,
    FMDVector& files )
  • nDisplayedElements The number of elements displayed on screen.
  • topElementIndex The index of the top displayed element of the list.
  • pivotElementIndex The index of the pivot element.
  • comparator Same as the other sort functions – the object that lets us order the list.
  • newTopElementIndex An output parameter – the index of the top element after sorting.
  • newPivotElementIndex The index of the pivot element after sorting.
  • files The list of files we are sorting.

Our first job is to find out the new position of the pivot element. Previously we had used std::nth_element to partition the range, this time, since we actually know the value of the element, we’ll use std::partition:

FileMetaData const pivotElement( 
    files[ pivotElementIndex ] );

Iter pivotElementIterator( 
    std::partition( 
        std::begin( files ), 
        std::end( files ), 
        std::bind( comparator, _1, pivotElement ) ) );

Notice that although we are using the same comparator we have used for all of our sort operations we are binding the pivot element to the comparator’s second argument. std::partition expects a unary predicate – it moves all elements for which the predicate returns true to the front, and all elements for which the predicate returns false to the back. It then returns an iterator corresponding to the point between the two partitions. By binding our pivot element to the second parameter of comparator we get a unary predicate that returns true if an element is less than the pivot element and false otherwise.

We know where the pivot element ends up in the list, and we also know where it ends up on the display – we want it to stay in the same position on the display. This means we can work out the range of elements that are displayed.

UInt const pivotOffset( 
    pivotElementIndex - topElementIndex );

Iter displayBegin( pivotElementIterator - pivotOffset );
Iter displayEnd( displayBegin + nDisplayedElements );

Now we can fill in two of our return values.

newTopElementIndex = displayBegin - std::begin( files );
newPivotElementIndex = 
    pivotElementIterator - std::begin( files );

To recap, we have two iterators that tell us the range of elements that are displayed – displayBegin and displayEnd. We have a third iterator that tells us where the pivot element has ended up – pivotElementIterator. We also know that the elements have been partitioned – we have two partitions, each with all the right elements, but in the wrong order. The boundary between those two partitions is the location of the pivot element.

We’ve used std::partial_sort a couple of times already, we can use it again to sort the bottom section of the display:

std::partial_sort( 
    pivotElementIterator, 
    displayEnd, 
    std::end( files ), 
    comparator );

Of course we also want to sort the top section of the display. All we need is a version of std::partial_sort that will sort the end of a list, not the beginning. The standard library doesn’t supply us with that algorithm, but it does give us a way of reversing the list. If we reverse the list above the pivot element we can use std::partition. We do this using reverse iterators:

std::partial_sort( 
    ReverseIter( pivotElementIterator ), 
    ReverseIter( displayBegin ), 
    ReverseIter( std::begin( files ) ), 
    std::bind( comparator, _2, _1 ) );

Reverse iterators are easily constructed from the forward iterators that we already have, and they plug right into algorithms just like regular iterators do. The beginning of our reversed list is at the pivot element, the end is at std::begin( files ) and the element we are sorting to is the top of the display. Notice that we had to reverse our comparator as well (and notice that reversing the comparator does not involve applying operator ! to its output, we actually need to reverse the order of the arguments).

Here’s the function in one block:

void
sortAroundPivotElementBasic( 
    UInt nDisplayedElements, 
    UInt topElementIndex, 
    UInt pivotElementIndex, 
    Comparator const& comparator, 
    UInt& newTopElementIndex, 
    UInt& newPivotElementIndex,
    FMDVector& files )
{
    FileMetaData const pivotElement( 
        files[ pivotElementIndex ] );

    Iter pivotElementIterator( 
        std::partition( 
            std::begin( files ), 
            std::end( files ), 
            std::bind( comparator, _1, pivotElement ) ) );

    UInt const pivotOffset( 
        pivotElementIndex - topElementIndex );

    Iter displayBegin( pivotElementIterator - pivotOffset );
    Iter displayEnd( displayBegin + nDisplayedElements );
    
    newTopElementIndex = displayBegin - std::begin( files );
    newPivotElementIndex = 
        pivotElementIterator - std::begin( files );
    
    std::partial_sort( 
        pivotElementIterator, 
        displayEnd, 
        std::end( files ), 
        comparator );
    
    std::partial_sort( 
        ReverseIter( pivotElementIterator ), 
        ReverseIter( displayBegin ), 
        ReverseIter( std::begin( files ) ), 
        std::bind( comparator, _2, _1 ) );
}

And the performance?

# elements sort all sort initial sort range sort around pivot
1,000 0.000s 0.000s 0.000s 0.000s
10,000 0.000s 0.000s 0.016s 0.000s
100,000 0.187s 0.000s 0.016s 0.016s
1,000,000 2.480s 0.047s 0.203s 0.094s
10,000,000 33.134s 0.406s 2.340s 1.061s

Pretty darn good, in fact better than sorting a given range. Again, once we get up to ten million elements we lose responsiveness, I think the lesson here is that once we get over one million elements we need to find some other techniques.

I want to repeat my warning from above. The code I have just presented for the pivot sort fails in some cases (and fails in an “illegal memory access” way, not just a “gives the wrong answer” way). I will write a follow up post that fixes these problems.


Wrap up

We have seen that with a little work, and some standard library algorithms, we can do distinctly better than sorting the entire list, we can increase the number of elements by 2 orders of magnitude and still get reasonable performance. Extremely large numbers of elements still give us problems though.

Is this the best we can do? It’s the best I’ve been able to come up with – if you have something better please post it in the comments, or send me a link to where you describe it.

One last thought. Earlier on I dismissed option #2 – “Use previous knowledge of the ordering to avoid having to do a complete re-sort.” by saying that I wanted algorithms that would work without prior knowledge. We now have those algorithms, and once we have applied any of them we will have some knowledge of the ordering. For example, once we have sorted a given range we can divide the data set into 3 – the first block of data has all the right elements, just in the wrong order, the second block has all of the right elements in the right order and the third block has all of the right elements in the wrong order. We can use this information as we’re scrolling through the list to make updates quicker.

No raw loops 4 – std::transform

Transform each element of a container by passing it through a function and putting the result into another container

So far we’ve only looked at one algorithm – std::for_each. That’s allowed me to lay the groundwork for using lambdas and std::bind, but there are many more algorithms avilable in the standard library. In fact, in this thread , John Potter refers to std::for_each as “the goto of algorithms” (and that is far from the worst thing he says about std::for_each).

You can use std::for_each to simulate pretty much every algorithm, but as usual, just because you can do something doesn’t mean that you should. There are more specific algorithms we can use to express our intent directly, and being able to express our intent directly when we’re writing code is a Good Thing.

In this installment we move on from just calling a function with each element, now we’re going to call a function, get a result back from that function, then store that result.

Our declarations have a new function – f_int_T – a function that takes an object of type T and returns an integer.

class T
{
   ...
};

std::vector< T > v_T;
std::vector< int > v_int;

int f_int_T( T const& t );

Here’s the old way of solving our problem:

Solution #0

std::vector< T >::iterator i;
std::vector< int >::iterator j;

for( 
    i = std::begin( v_T ), 
    j = std::begin( v_int );
    i != std::end( v_T ); 
    ++i, ++j )
{
    *j = f_int_T( *i );
}

I am assuming that our output container v_int has sufficient size for each element being assigned to it. It is easy to change this code to call push_back on the output container if necessary.

We now have two iterators – one for the source range and one for the output. We call the function f_int_T for each element of the source and we write the result of the function to the output. We don’t need to explicitly use the end of the output range – std::end( v_int ) doesn’t appear anywhere in the example.

We can implement the same algorithm using range-based for:

Solution #1

std::vector< int >::iterator j( std::begin( v_int ) );
for( T const& t : v_T )
{
    *j++ = f_int_T( t );
}

Using range-based for works, but we have to increment the output iterator j inside the body of the loop. This example would be simpler if push_back was appropriate.

So let’s introduce std::transform and see what we get:

Solution #2

std::transform( 
    std::begin( v_T ), 
    std::end( v_T ), 
    std::begin( v_int ), 
    []( T const& t ) 
    {
        return f_int_T( t );
    } );

The lambda function is overkill, but I wanted to reinforce the fact that lambdas are functions, which means they can have a return value. In this case because the lambda consists of a single return statement the compiler will automatically deduce the type of the return value (there is a way of explicitly specifying the type of the return value but I am not going into that here).

Here’s something interesting. The blocks of code in solutions #0 and #1 look like this:

{
    *j = f_int_T( *i );
}
{
    *j++ = f_int_T( t );
}

There’s an assignment in both of them and they both dereference the output iterator. The range-based for version has to increment the output iterator itself. Contrast that with the block of code in the solution #2, the lambda function:

{
    return f_int_T( t );
}

The code inside the lambda is doing nothing other than calling f_int_T and returning the result. All of the iteration, the dereferencing, the assignment is being handled by the std::transform algorithm.

As I said, the lambda is overkill, we can just use the function directly as an argument:

Solution #3

std::transform( 
    std::begin( v_T ), 
    std::end( v_T ), 
    std::begin( v_int ), 
    f_int_T );

This solution makes the intent of the code clear. We have the algorithm completely separated from the action to take each time through the loop. When I look at this code and see std::transform I know how the iteration will work. I know that the number of elements placed into the output will be the same as the number of elements in the source range. I know that no elements in the source range will be skipped. Short of exceptions I know that there will be no early exit from the loop. (Many of these arguments can be applied to std::transform with a lambda function, but I think the intent is even more pronounced here).

The function that does the transformation is entirely separate and can be tested in isolation (this is the advantage of the named function over a lambda). We have separated things that used to be commingled.

We can simplify things even more using adobe::transform:

Solution #4

adobe::transform( 
    v_T, 
    std::begin( v_int ), 
    f_int_T );

Solution #4 strips things down to their basics – we have our source, the place where the output will go and what we’re going to do to each element. There is almost no boilerplate.

Incidentally, the output range can be the same as the source range. If the function changes the value but not the type of its argument (or the return type of the function can be converted to the argument type), we can run std::transform to change every element of a container in place.

I have chosen a simple example to illustrate the use of std::transform. The function we are calling is a non-member function and has a single parameter of the correct type. Refer to part 2 and part 3 of this series for more information about using std::bind to call more complex functions.

Variation #1

My examples all assumed that we had enough space in the output container, but what if we don’t? Perhaps we are trying to build the output container from scratch or append elements to an existing container. We don’t want to have to resize the container first and incur the cost of constructing default constructed objects that are going to get overwritten. We won’t be able to resize it at all if the type we are constructing doesn’t have a default constructor, or if we don’t know how many elements we are going to put into it (e.g. we are using input iterators). What if we want to use push_back on every new element?

Solutions #0 and #1 are easy. Because they handle the dereferencing and assignment within their loop body it is easy to change them to use push_back. E.g.:

Solution #0.1

std::vector< T >::iterator i;

for( 
    i = std::begin( v_T );
    i != std::end( v_T ); 
    ++i )
{
    v_int.push_back( f_int_T( *i ) );
}

In fact this is simpler than our original solution #0 since we don’t need an iterator for the output at all.

If we want the effect of push_back when we use std::transform (or any other algorithm that writes to iterators) we use an insert iterator. An insert iterator looks like an iterator (i.e. it has the operations we would expect from an iterator – although many of them have no effect for an insert iterator) but when we assign to it, it will call a function on the container – in our case we want it to call push_back.

It’s easier to show than talk about (see Josuttis “The C++ Standard Library” 2nd edition for a proper description):

adobe::transform(
    v_T,
    std::back_inserter( v_int ),
    f_int_T );

The iterator created by std::back_inserter will call push_back every time it is assigned to. The standard library also contains std::front_inserter which will call push_front, and std::inserter which calls insert.

One drawback of push_back is that it might cause the output container’s memory to be reallocated and all of the container’s contents copied over – in fact depending on the reallocation strategy it might have to reallocate several times as elements are appended. Assuming that we know the ultimate size of the output container we can call reserve which will set aside the appropriate amount of memory but without default constructing any additional elements. This lets us use push_back and know that there will only be a single memory reallocation.

Variation #2

What if we don’t necessarily want to process every element? What if we want something like this:

std::vector< T >::iterator i;
std::vector< int >::iterator j;

for( 
    i = std::begin( v_T ), 
    j = std::begin( v_int );
    i != std::end( v_T ); 
    ++i, ++j )
{
    if( somePredicate( *i ) )
    {
        *j = f_int_T( *i );
    }
}

Some of the algorithms have “if” versions – we pass in a predicate and their action is only taken if that predicate returns true. Among others, std::copy, std::remove and std::count all have “if” variants that take predicates. There isn’t a transform_if provided in the standard, but it isn’t difficult to write one:

template<
    class InputIterator,
    class OutputIterator,
    class UnaryOperation,
    class UnaryPredicate >
OutputIterator transform_if(
    InputIterator first,
    InputIterator last,
    OutputIterator result,
    UnaryOperation op,
    UnaryPredicate test)
{
    while( first != last )
    {
        if( test( *first ) )
        {
            *result++ = op( *first );
        }
        ++first;
    }
    return result;
}

[ Edit: Guvante points out in the comments here that my implementation of transform_if does not match the for_each version. ]

To my mind, this is the real beauty of the STL – as well as providing a large set of containers and algorithms, it is extensible. The framework it provides makes it possible to add new algorithms or containers which interact seamlessly with existing parts of the STL. transform_if is a new algorithm, but because it follows existing practice it doesn’t take a lot of additional work to understand it.

No raw loops 3 – member functions

Loop over all the elements in a container and call a member function (with arguments) on each element

I want to introduce one more std::bind trick. In part 1 and part 2 the functions called were non-member functions and we were passing each element of the container into the function as an argument. In this installment we are going to call a member function on each element in the container. We’ll extend our set of declarations to include a member function on class T:

class T
{
public:
   void mf_int( int ) const;
   ...
};

std::vector< T > v_T;

We have a vector full of objects of type T, we want to call the member function mf_int on each object in that vector, and we have to pass an integer argument to mf_int (I am skipping the example where we don’t have to pass any additional arguments to the function).

Let’s start off with the “old style” solution:

Solution #0

int j( 7 );
for( std::vector< T >::iterator i( std::begin( v_T ) ); 
    i != std::end( v_T ); 
    ++i )
{
    i->mf_int( j );
}

The range-based for version of the solution is as expected:

Solution #1

for( T const& t : v_T )
{
    t.mf_int( j );
}

The lambda function version also holds no surprises:

Solution #2

std::for_each( 
    std::begin( v_T ), 
    std::end( v_T ), 
    [ j ]( T const& t )
{
    t.mf_int( j );
} );

The std::bind version looks a little different though:

Solution #3

using namespace std::placeholders;
std::for_each( 
    std::begin( v_T ), 
    std::end( v_T ), 
    std::bind( &T::mf_int, _1, j ) );

The std::bind statement includes something new. &T::mf_int refers to a member function of class T – the function T::mf_int. When std::bind sees a member function (as opposed to a regular function) as its first argument, it expects the second argument to be the object on which to call the member function. In this case since the second argument is _1, it will call the member function T::mf_int on every element of the container. There is an additional argument to std::bind (j) which will be passed as a parameter to T::mf_int.

Having said above that I was skipping the example where we don’t have to pass any additional arguments to the function I want to mention that example briefly in connection with std::mem_fn, another function adapter (std::bind is a function adapter).

std::mem_fn provides a shorter way to specify that you want to call a member function with no arguments. Let’s add a new declaration to class T:

class T
{
public:
   void mf_int( int ) const;
   void mf_void() const;
   ...
};

When we use std::mem_fn we don’t need to specify the placeholder _1:

std::for_each( 
    std::begin( v_T ), 
    std::end( v_T ), 
    std::mem_fn( &T::mf_void ) );

std::mem_fn only works for member functions taking no arguments.

We can also use adobe::for_each:

Solution #4

adobe::for_each( v_T, std::bind( &T::mf_int, _1, j ) );

Wrap up

When we’re using std::bind we always specify the function we are going to call as the first argument. If the function we are going to call is a member function we specify the object on which we are going to call the member function second. Any additional parameters follow.

No raw loops 2 – arguments

Loop over all the elements in a container and call a function with each element and additional values as arguments

In part 1 of this series we looked at the simplest case we could, a case that is tailor-made for std::for_each – calling a function with each element of a container as an argument. That particular case is easy, but it obviously doesn’t cover everything so we’ll move on to a (slightly) more complex example.

Our previous problem statement was: “loop over all the elements in a container and call a function with each element as an argument”: We’ll extend the problem so that now the function doesn’t just take a single argument of the same type as the element, it takes an additional argument.

As before we have a few declarations:

class T
{
   ...
};

void f_T_int( T const& t, int i )

std::vector< T > v_T;

Notice that f_T_int takes an object of type T and an int.

First of all, our raw loop version of the solution.

Solution #0

int j( functionReturningInt() );

for( std::vector< T >::iterator i( std::begin( v_T ) ); 
    i != std::end( v_T ); 
    ++i )
{
    f_T_int( *i, j );
}

Solution #0 has the same set of pros and cons as solution #0 did in part 1.

As before, there is a range-based for solution:

Solution #1

for( T const& t : v_T )
{
    f_T_int( t, j );
}

Again, similar pros and cons to the range-based for in part 1.

How about using for_each? In part 1 , turning the explicit loop into an application of std::for_each was easy. std::for_each expects to be given a function (and I’ll include “function object” in the definition of function) that takes a single argument. In part 1 that was exactly the function we wished to call. Now, however, we want to call a function that takes an additional parameter – as well as taking an object of type T it takes an integer.

Back in the bad old days before C++11 we would set up a function object like this:

Solution #2

class C_T_int
{
public:
    C_T_int( int j ) : j_( j )
    {
    }

    void operator ()( T const& t )
    {
        f_T_int( t, j_ );
    }
private:
    int j_;
};

std::for_each( 
    std::begin( v_T ), 
    std::end( v_T ), 
    C_T_int( j ) );

The function object C_T_int takes our additional argument (the integer) in its constructor, then defines operator () to accept the argument from our container. When operator () is called it has both arguments that it needs to be able to call f_T_int.

The function object allows us to use std::for_each, and the line of code that calls std::for_each is very straightforward, however solution #1 has a massive “con”: the overhead is horrible, there is lots of boilerplate.

If only we had some way of generating the function object without having to write all of the boilerplate. The C++11 standard provides us with exactly that: lamda functions. As the standard says “Lambda expressions provide a concise way to create simple function objects”. Using lambdas leads to:

Solution #3

std::for_each( 
    std::begin( v_T ), 
    std::end( v_T ), 
    [ j ]( T const& t ) 
{ 
    f_T_int( t, j ); 
} );

(This is a minimal introduction to lambda functions, there is much more to them than I am going to cover. Herb Sutter’s talk Lambdas, Lambdas Everywhere has much more information.)

The lambda function consists of three parts, contained in three different types of brackets. It starts with square brackets – [] – moves on to round brackets – () – then finishes off with curly brackets – {}.

The three parts correspond to the three essential features of the function object we created for solution #2:

  1. The value(s) passed into the constructor of the function object appear in the square brackets – []
  2. The value(s) passed into operator () appear in the round brackets – ()
  3. The code in operator () appears in the curly brackets – {}

Notice the unusual looking piece of syntax at the end:

} );

That occurs because the lambda is being passed as an argument to std::for_each.

This first four solutions have one thing in common, they each contain a block of code looking something like this:

{
    f_T_int( t, j );
}

The difference is in the surrounding boilerplate. Assuming that we ignore solution #2, the hand-coded function object (which is sufficiently horrible that I feel justified in dismissing it), all of these solutions keep the code to be executed each time during the loop in the same place as the loop – they keep everything together (although if you take that to the extreme then we end up with one enormous “main” function – it’s another entry in both the “pros”” and “cons” column). The range-based for and std::for_each solutions separate out the loop from what happens each time around the loop. Looking at the code and seeing a range-based for or std::for_each tells us that exceptions and early returns aside, we will iterate over every member of the given range.

The block of code can easily be extended to do something else. As before, this falls into both “pros” and “cons”.

C++11 gives us another option, a way of taking a function with N arguments and turning it into a function with M arguments (where M < N). For those of you who think that this sounds like currying, you’re right – it’s like currying.

Solution #4

using namespace std::placeholders;
std::for_each( 
    std::begin( v_T ), 
    std::end( v_T ), 
    std::bind( f_T_int, _1, j ) );

std::bind started life as boost::bind and was adopted into the C++11 standard. In solution #4 it takes three arguments:

  1. f_T_int – The function we ultimately want to call.
  2. _1 – The first argument to be passed to f_T_int. _1 is a special value, a placeholder. std::bind is going to produce a function object – in this case a function object that takes a single argument. That single argument (the first argument) will be passed on to f_T_int in the position that _1 is in.
  3. j – The second argument to be passed to f_T_int. In this case it’s the value of the variable j.

In this example, std::bind took a function that takes two parameters – f_T_int and turned it into a function object that takes a single parameter by binding one of the arguments to j. This single parameter function is exactly what we want for std::for_each.

(As with the lambdas, I am skipping over many, many details about std::bind.)

(Aside – I am not normally a fan of using statements, and in particular I won’t use them much in this series because I want to be very clear about where everything is coming from, however if I were to keep this rule up I would have to write std::placeholders::_1 and std::placeholders::_2 and that is too ugly even for me.)

We can also use adobe::for_each with lambda or std::bind:

Solution #5

adobe::for_each( v_T, [ j ]( T const& t ) 
{ 
    f_T_int( t, j ); 
} );

Solution #6

adobe::for_each( v_T, std::bind( f_T_int, _1, j ) );

So we have our original raw loop solution (#0) and 6 other possibilities:

  1. The raw loop – we are trying to get away from this.
  2. Range based for loop
  3. Old style C++ function object
  4. std::for_each with a lambda
  5. std::for_each with std::bind
  6. adobe::for_each with a lambda
  7. adobe::for_each with std::bind

#0 is the version we are trying to get away from, #2 has so much boilerplate that I am prepared to drop it without further consideration. std::for_each and adobe::for_each are variations on a theme, and the choice between them will depend on (a) whether you are prepared to include the Adobe Source Libraries in your project and (b) whether you want to operate over the entire container. That leaves three solutions:

  1. Range based for loop
  2. std::for_each with a lambda
  3. std::for_each with std::bind

Range-based for and lambda both have a block of code that (more or less) matches the block of code in the original raw solution:

{
    f_T_int( t, j );
}

Range-based for lambda differ in the boilerplate surrounding this block of code, although there are still similarities (for example, they both have T const& t declarations).

The main difference is that the range-based for models one type of loop – std::for_each (yes, you can achieve the same effect as other algorithms but it takes additional code). For solving this particular problem that isn’t an issue, that is exactly the type of loop we want. Later in this series we’ll be looking at other algorithms.

The fact that range-based for and lambda both have a block of code means that they can both put extra things into that block of code. In fact, they could put the definition of f_T_int right into the loop block itself. std::bind doesn’t let us do that – once we have bound a particular function the only thing that can be done each time around the loop is to call that function. Let’s look at the pros and cons of the code block vs. std::bind:

Code block pros:

  • Keeps the code together – the loop code and the action to take on each iteration is in the same place.
  • Makes it easy to change the operation that is taking place on each iteration of the loop.
  • Easy to step through in a debugger
  • Clear error messages (at least as clear as C++ error messages ever are)
  • “Normal” syntax – basically the same syntax as if we were calling the function once.

Code block cons:

  • Keeps the code together – the loop code and the action to take on each iteration is in the same place.
  • Too easy to change. Making something easy to change makes it more likely to be changed. Keeping code working as it is maintained is a problem.
  • Difficult to test the body of the loop separately from the loop itself.

std::bind pros:

  • Forces you to split the code into separate functions and manage complexity. The loop is separate from the action taken on each iteration.
  • Easy to test the body of the loop separately from the loop itself.

std::bind cons:

  • Forces you to split the code into separate functions.
  • Tricky to step through in the debugger.
  • Error messages are verbose and almost incomprehensible.

Wrap up

I like std::bind. I think that currying is an elegant solution to many problems, and I like the fact that std::bind forces me to split the looping construct from the action taken on each iteration. I have to come up with a good function name which means I have to think through the problem fully and clearly. Naming is difficult but the payoff of having a well named function is worth it. I like the fact that I can test the function executed for each element independently of the loop.

Sadly (for me anyway), I think I am in a minority of one here. Bjarne Stroustrup writes about bind:

These binders … were heavily used in the past, but most uses seem to be more easily expressed using lambdas.

The C++ Programming Language – Fourth Edition, page 967.

I think that the title of Herb Sutter’s talk Lambdas, Lambdas Everywhere is intended as an expression of optimism rather than as a dire warning.

My biggest concern with lambdas is that they’re going to be used in the same way that function bodies are now – to contain vast amounts of deeply nested code.

I took the title of this series “No raw loops” from a talk by Sean Parent at Going Native 2013. In that same talk (starting at around 29:10) he makes suggestions about when and how to use range-based for loops and lambda functions. The summary is, “keep the body short”, he suggests limiting the body to the composition of two functions with an operator. If I was confident that lambda and range-based for bodies were never going to exceed this level of complexity I would be more positive about them. For myself, having some limits on what I can do often leads to a better organized, more thought through design.

No raw loops 1 – functions

Loop over all the elements in a container and call a function with each element as an argument

In this talk at Going Native 2013 Sean Parent argues that a goal for better code should be “no raw loops”. He talks about a number of problems with raw loops, and a number of ways to address those problems. In this series I am going to look at one way of avoiding raw loops – using STL algorithms.

It seems that we often read advice to “use an STL algorithm”, but there are practical difficulties in doing so. We’re going to start off with the simple cases and move on to the more complex cases in later posts.

We need a few declarations before we start. A class T, a function that we can pass a T to, and a vector of T:

class T
{
   ...
};

void f_T( T const& );

std::vector< T > v_T;

Here’s a typical piece of code to solve the problem “loop over all the elements in a container and call a function with each element as an argument”:

Solution #0

for( 
    std::vector< T >::iterator i( std::begin( v_T ) ); 
    i != std::end( v_T ); 
    ++i )
{
    f_T( *i );
}

There is nothing particularly fancy about this code, it is a straightforward translation of the problem statement into a typical C and C++ form (I know this isn’t C code but the form is familiar to a C programmer). The most esoteric things in it are the iterator and the new std::begin and std::end functions.

I am going to use a technique in this series to enable us to compare different solutions. Given a solution we look at all of the advantages (pros) to that solution and all of the disadvantages (cons). The goal is to list everything that fits in either category – how we choose to weight each pro or con is another matter. It is my contention that you can’t fully evaluate a solution without knowing everything that is wrong with it as well as everything that is right.

Pros:

  • This is idiomatic code. Any C++ programmer should be able to produce this, and it isn’t a million miles away from the idiomatic C code to solve this problem. The translation from the problem statement to the code is simple.
  • The code can easily be extended to do something else with the iterator before calling the function.
  • The loop is easy to step through in the debugger. Stepping into the code that assigns and tests the iterator takes you into the strange world of standard library implementations, but it is easy to step into the function being called.
  • The compiler gives a reasonable error message if the function being called doesn’t handle the type it is being called with. If I try and call f_int (a function that takes an int as a parameter) instead of f_T I get this error message from VS2013:
    d:\documents\projects\c++ idioms\sourceloop_idioms.cpp(328): error C2664: 'void f_int(int)' : cannot convert argument 1 from 'T' to 'int'
    

    and this message from GCC 4.8.2:

    ../../source/loop_idioms.cpp:328:23: error: cannot convert ‘T’ to ‘int’ for argument ‘1’ to ‘void f_int(int)’
             f_int( *i );
    

    both of which point directly to the errant line (328 in loop_idioms.cpp) and identify the problem.

Cons:

  • Although translating from the problem statement to the code is easy, translating from the code back to the problem statement is more difficult. We need to check that the iterator is incremented exactly once around each loop, that it isn’t changed in any other way, and that the loop does not exit early.
  • The code can easily be extended to do something else with the iterator before calling the function. This was in the pros as well – remember that we are trying to solve the original problem statement – “loop over all the elements in a container and call a function with each elementas an argument”: We don’t want anything else inside the loop.
  • There is a lot of boilerplate. We have an additional variable i that wasn’t mentioned in the problem statement and we have a whole bunch of code dedicated to its initialization, testing, type and incrementing.

Let’s look at an alternate solution using the C++11 range-based for loops:

Solution #1

for( T const& t : v_T )
{
    f_T( t );
}

We have certainly reduced the boilerplate. There is less typing. We have removed the iterator i and the need to initialize, increment and test it. We now have t – a reference to an object of type T. We don’t have to specify std::begin and std::end, the loop automatically operates over the whole container. Stepping through the code in the debugger is straightforward and we get the same error messages if we use the wrong function. On the “cons” side, we still have the option of doing more things inside the loop, and there is a still a variable t along with its type. The use of auto would simplify this a little, we would avoid explicitly declaring the type of t (I have issues with auto but I will save those for another post).

Can we do even better? The standard library gives us std::for_each – an algorithm that seems almost perfect for what we want:

Solution #2

std::for_each( std::begin( v_T ), std::end( v_T ), f_T );

This looks nice, we have removed the iterator i which means we have removed its type, initialization, testing and incrementing. There is no need for an explicit variable t and its type (auto or otherwise). There is no way for us to do anything else inside the loop. Other than an exception being thrown the loop is not going to be terminated early (any of our solutions will exit early if an exception is thrown). All of this makes it easy to go from the code back to the problem statement. On the downside, we have to know how std::for_each works. We have moved away from the C-like code of the first solution to something that is firmly C++ and not understandable unless we have knowledge about C++ algorithms (I don’t usually regard “having to know about the standard library” as a Bad Thing but I have worked at many places where the subset of C++ used is small enough that they would dislike it).

Stepping through this in the debugger is harder work. We have to step into the implementation of std::for_each and (with the version of the libraries I have on VS2013) that takes us through a couple more internal functions before we actually call f_T. Putting a breakpoint at the start of f_T is the easiest way to track what is going on each time around the loop.

The compiler error messages also get more confusing. if I replace f_T with f_int as before I get this from GCC 4.8.2:

In file included from /usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/algorithm:62:0,
                 from ../../source/loop_idioms.cpp:6:
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/bits/stl_algo.h: In instantiation of ‘_Funct std::for_each(_IIter, _IIter, _Funct) [with _IIter = __gnu_cxx::__normal_iterator<T*, std::vector >; _Funct = void (*)(int)]’:
../../source/loop_idioms.cpp:334:66:   required from here
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/bits/stl_algo.h:4417:14: error: cannot convert ‘T’ to ‘int’ in argument passing
  __f(*__first);
              ^

The basic error message cannot convert ‘T’ to ‘int’ is still present, but it is reported in a standard header (stl_algo.hpp) and we have to look at the rest of the error text to narrow down what line in our source file caused the problem.

There is one more improvement we can make. Remember that the original problem statement said “loop over all the elements in a container …”. Solution #1 (range-based for) let us work over the entire container however solution #2 (std::for) makes us specify the beginning and end of the range explicitly. Is there a way we can get the advantages of std::for_each combined with the the range-based for behavior where we can just specify the container as a whole? It turns out there is, but we have to introduce a new library to do it – the Adobe Source Libraries, available on github. Among other things, ASL introduces a set of extensions to the standard algorithms which allow a range to be expressed as a single argument. If you pass in a container you automatically operate over the full range of that container.

Solution #3

adobe::for_each( v_T, f_T );

This looks difficult to beat. Our original problem statement specified three things:

  1. loop over all the elements
  2. in a container
  3. call a function with each element as an argument

Our final solution contains exactly those three things:

  1. loop over all the elements adobe::for_each
  2. in a container v_T
  3. call a function for each element f_T

The boilerplate is minimal – whitespace and punctuation.

This looks great, but it still has a couple of entries in the “cons” column. It is even worse to step through in the debugger than the std::for_each solution. We have to step through the implementation of adobe::for_each which then calls into std::for_each, and uses std::bind to handle the call. All of these add extra layers of complication to the call stack – using VS2013 I get six stack layers between the adobe::for_each call and f_T.

adobe::for_each involves introducing an extra library into the system. I have worked on some teams who would see this as no problem at all, and others who would fight vehemently against it. Even for teams who are happy to introduce a new library there is a cost in getting legal approval.

Finally, if I substitute the wrong function (f_int rather than f_T), I get an error message that is a classic of C++ template beauty:

In file included from /usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/algorithm:62:0,
                 from ../../source/loop_idioms.cpp:6:
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/bits/stl_algo.h: In instantiation of ‘_Funct std::for_each(_IIter, _IIter, _Funct) [with _IIter = __gnu_cxx::__normal_iterator<T*, std::vector >; _Funct = std::_Bind))(int)>]’:
../../adobe/adobe/algorithm/for_each.hpp:43:67:   required from ‘void adobe::for_each(InputIterator, InputIterator, UnaryFunction) [with InputIterator = __gnu_cxx::__normal_iterator<T*, std::vector >; UnaryFunction = void (*)(int)]’
../../adobe/adobe/algorithm/for_each.hpp:53:62:   required from ‘void adobe::for_each(InputRange&, UnaryFunction) [with InputRange = std::vector; UnaryFunction = void (*)(int)]’
../../source/loop_idioms.cpp:347:37:   required from here
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/bits/stl_algo.h:4417:14: error: no match for call to ‘(std::_Bind))(int)>) (T&)’
  __f(*__first);
              ^
In file included from /usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/bits/stl_algo.h:66:0,
                 from /usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/algorithm:62,
                 from ../../source/loop_idioms.cpp:6:
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1280:11: note: candidates are:
     class _Bind<_Functor(_Bound_args...)>
           ^
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1351:2: note: template _Result std::_Bind<_Functor(_Bound_args ...)>::operator()(_Args&& ...) [with _Args = {_Args ...}; _Result = _Result; _Functor = void (*)(int); _Bound_args = {std::_Placeholder}]
  operator()(_Args&&... __args)
  ^
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1351:2: note:   template argument deduction/substitution failed:
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1347:37: error: cannot convert ‘T’ to ‘int’ in argument passing
  = decltype( std::declval<_Functor>()(
                                     ^
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1365:2: note: template _Result std::_Bind<_Functor(_Bound_args ...)>::operator()(_Args&& ...) const [with _Args = {_Args ...}; _Result = _Result; _Functor = void (*)(int); _Bound_args = {std::_Placeholder}]
  operator()(_Args&&... __args) const
  ^
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1365:2: note:   template argument deduction/substitution failed:
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1361:53: error: cannot convert ‘T’ to ‘int’ in argument passing
          typename add_const<_Functor>::type>::type>()(
                                                     ^
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1379:2: note: template _Result std::_Bind<_Functor(_Bound_args ...)>::operator()(_Args&& ...) volatile [with _Args = {_Args ...}; _Result = _Result; _Functor = void (*)(int); _Bound_args = {std::_Placeholder}]
  operator()(_Args&&... __args) volatile
  ^
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1379:2: note:   template argument deduction/substitution failed:
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1375:70: error: cannot convert ‘T’ to ‘int’ in argument passing
                        typename add_volatile<_Functor>::type>::type>()(
                                                                      ^
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1393:2: note: template _Result std::_Bind<_Functor(_Bound_args ...)>::operator()(_Args&& ...) const volatile [with _Args = {_Args ...}; _Result = _Result; _Functor = void (*)(int); _Bound_args = {std::_Placeholder}]
  operator()(_Args&&... __args) const volatile
  ^
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1393:2: note:   template argument deduction/substitution failed:
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1389:64: error: cannot convert ‘T’ to ‘int’ in argument passing
                        typename add_cv<_Functor>::type>::type>()(
                                                                ^

The original error message cannot convert ‘T’ to ‘int’ is present in that block of text – in fact it is present several times. However, tracing back to the line of our code that caused it is non-trivial, although as I get more experience with algorithms I find that I am getting better at interpreting these messages. I am not sure whether this is an achievement that I should be proud of.

Despite the debugger and error message problems I still like and use the adobe::for_each solution (in the language of pros and cons, the pros outweigh the cons for me). It is very easy to reason about (in Sean Parent’s talk he often refers to the ability to reason about code – please watch the talk, it really is very informative and challenging). If I need to step through the loop in the debugger I put a breakpoint into the function being called. As for the error messages, as I said I am getting better at interpreting them, so I just suck it up and deal with them.