No raw loops 3 – member functions

Loop over all the elements in a container and call a member function (with arguments) on each element

I want to introduce one more std::bind trick. In part 1 and part 2 the functions called were non-member functions and we were passing each element of the container into the function as an argument. In this installment we are going to call a member function on each element in the container. We’ll extend our set of declarations to include a member function on class T:

class T
{
public:
   void mf_int( int ) const;
   ...
};

std::vector< T > v_T;

We have a vector full of objects of type T. We want to call the member function mf_int on each object in that vector, and we have to pass an integer argument to mf_int (I am skipping the example where we don’t have to pass any additional arguments to the function).

Let’s start off with the “old style” solution:

Solution #0

int j( 7 );
for( std::vector< T >::iterator i( std::begin( v_T ) ); 
    i != std::end( v_T ); 
    ++i )
{
    i->mf_int( j );
}

The range-based for version of the solution is as expected:

Solution #1

for( T const& t : v_T )
{
    t.mf_int( j );
}

The lambda function version also holds no surprises:

Solution #2

std::for_each( 
    std::begin( v_T ), 
    std::end( v_T ), 
    [ j ]( T const& t )
{
    t.mf_int( j );
} );

The std::bind version looks a little different though:

Solution #3

using namespace std::placeholders;
std::for_each( 
    std::begin( v_T ), 
    std::end( v_T ), 
    std::bind( &T::mf_int, _1, j ) );

The std::bind statement includes something new. &T::mf_int refers to a member function of class T – the function T::mf_int. When std::bind sees a member function (as opposed to a regular function) as its first argument, it expects the second argument to be the object on which to call the member function. In this case since the second argument is _1, it will call the member function T::mf_int on every element of the container. There is an additional argument to std::bind (j) which will be passed as a parameter to T::mf_int.
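As a self-contained sketch of the rule just described, the following compiles on its own. The class Counter and its add member are hypothetical stand-ins for T and mf_int (add is non-const so that the effect of each call is observable), and add_to_all is an illustrative name, not part of any library.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <vector>

// Hypothetical stand-in for class T: add() plays the role of mf_int.
class Counter
{
public:
    void add( int n ) { total_ += n; }
    int total() const { return total_; }
private:
    int total_ = 0;
};

// Calls element.add( j ) on every element of v, using std::bind.
inline void add_to_all( std::vector< Counter >& v, int j )
{
    using namespace std::placeholders;
    std::for_each(
        std::begin( v ),
        std::end( v ),
        std::bind( &Counter::add, _1, j ) );   // _1 is the object, j the argument
}
```

The object argument does not have to be a reference: the same bind expression also works when the elements are pointers to Counter.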

Having said above that I was skipping the example where we don’t have to pass any additional arguments to the function, I want to mention that example briefly in connection with std::mem_fn, another function adapter (std::bind is also a function adapter).

std::mem_fn provides a shorter way to specify that you want to call a member function with no arguments. Let’s add a new declaration to class T:

class T
{
public:
   void mf_int( int ) const;
   void mf_void() const;
   ...
};

When we use std::mem_fn we don’t need to specify the placeholder _1:

std::for_each( 
    std::begin( v_T ), 
    std::end( v_T ), 
    std::mem_fn( &T::mf_void ) );

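Used directly with std::for_each, std::mem_fn is only suitable for member functions that need no additional arguments. The function object it returns takes the object as its first argument, followed by any arguments for the member function, and std::for_each supplies only the element.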
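Here is a small sketch of std::mem_fn in isolation. Light, toggle, set, toggle_all and switch_on are hypothetical names used only for illustration. The wrapper returned by std::mem_fn takes the object first and forwards any further arguments to the member function, although std::for_each only ever passes the element itself.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <vector>

// Hypothetical class used only for this illustration.
class Light
{
public:
    void toggle() { on_ = !on_; }
    void set( bool on ) { on_ = on; }
    bool is_on() const { return on_; }
private:
    bool on_ = false;
};

// Toggles every light: for_each passes each element as the object argument.
inline void toggle_all( std::vector< Light >& lights )
{
    std::for_each(
        std::begin( lights ),
        std::end( lights ),
        std::mem_fn( &Light::toggle ) );
}

// Called directly, the wrapper takes the object first, then the member
// function's own arguments.
inline void switch_on( Light& light )
{
    auto set = std::mem_fn( &Light::set );
    set( light, true );
}
```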

We can also use adobe::for_each:

Solution #4

adobe::for_each( v_T, std::bind( &T::mf_int, _1, j ) );

Wrap up

When we’re using std::bind we always specify the function we are going to call as the first argument. If the function we are going to call is a member function we specify the object on which we are going to call the member function second. Any additional parameters follow.

No raw loops 2 – arguments

Loop over all the elements in a container and call a function with each element and additional values as arguments

In part 1 of this series we looked at the simplest case we could, a case that is tailor-made for std::for_each – calling a function with each element of a container as an argument. That particular case is easy, but it obviously doesn’t cover everything so we’ll move on to a (slightly) more complex example.

Our previous problem statement was: “loop over all the elements in a container and call a function with each element as an argument”. We’ll extend the problem so that now the function doesn’t just take a single argument of the same type as the element, it takes an additional argument.

As before we have a few declarations:

class T
{
   ...
};

void f_T_int( T const& t, int i );

std::vector< T > v_T;

Notice that f_T_int takes an object of type T and an int.

First of all, our raw loop version of the solution.

Solution #0

int j( functionReturningInt() );

for( std::vector< T >::iterator i( std::begin( v_T ) ); 
    i != std::end( v_T ); 
    ++i )
{
    f_T_int( *i, j );
}

Solution #0 has the same set of pros and cons as solution #0 did in part 1.

As before, there is a range-based for solution:

Solution #1

for( T const& t : v_T )
{
    f_T_int( t, j );
}

Again, similar pros and cons to the range-based for in part 1.

How about using for_each? In part 1, turning the explicit loop into an application of std::for_each was easy. std::for_each expects to be given a function (and I’ll include “function object” in the definition of function) that takes a single argument. In part 1 that was exactly the function we wished to call. Now, however, we want to call a function that takes an additional parameter – as well as taking an object of type T it takes an integer.

Back in the bad old days before C++11 we would set up a function object like this:

Solution #2

class C_T_int
{
public:
    C_T_int( int j ) : j_( j )
    {
    }

    void operator ()( T const& t )
    {
        f_T_int( t, j_ );
    }
private:
    int j_;
};

std::for_each( 
    std::begin( v_T ), 
    std::end( v_T ), 
    C_T_int( j ) );

The function object C_T_int takes our additional argument (the integer) in its constructor, then defines operator () to accept the argument from our container. When operator () is called it has both arguments that it needs to be able to call f_T_int.

The function object allows us to use std::for_each, and the line of code that calls std::for_each is very straightforward. However, solution #2 has a massive “con”: the overhead is horrible, there is lots of boilerplate.

If only we had some way of generating the function object without having to write all of the boilerplate. The C++11 standard provides us with exactly that: lambda functions. As the standard says, “Lambda expressions provide a concise way to create simple function objects”. Using lambdas leads to:

Solution #3

std::for_each( 
    std::begin( v_T ), 
    std::end( v_T ), 
    [ j ]( T const& t ) 
{ 
    f_T_int( t, j ); 
} );

(This is a minimal introduction to lambda functions; there is much more to them than I am going to cover. Herb Sutter’s talk Lambdas, Lambdas Everywhere has much more information.)

The lambda function consists of three parts, contained in three different types of brackets. It starts with square brackets – [] – moves on to round brackets – () – then finishes off with curly brackets – {}.

The three parts correspond to the three essential features of the function object we created for solution #2:

  1. The value(s) passed into the constructor of the function object appear in the square brackets – []
  2. The value(s) passed into operator () appear in the round brackets – ()
  3. The code in operator () appears in the curly brackets – {}
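To make the correspondence concrete, here is a sketch that puts a hand-written function object next to the equivalent lambda. The names Scale, scale_with_functor and scale_with_lambda are hypothetical, and an int element stands in for T so that the result can be inspected.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hand-written function object: the constructor argument, the operator ()
// argument and the operator () body are the three parts.
class Scale
{
public:
    explicit Scale( int factor ) : factor_( factor ) {}
    void operator ()( int& value ) const { value *= factor_; }
private:
    int factor_;
};

inline void scale_with_functor( std::vector< int >& v, int factor )
{
    std::for_each( std::begin( v ), std::end( v ), Scale( factor ) );
}

inline void scale_with_lambda( std::vector< int >& v, int factor )
{
    std::for_each(
        std::begin( v ),
        std::end( v ),
        [ factor ]( int& value )    // [] constructor values, () call arguments
    {
        value *= factor;            // {} body of operator ()
    } );
}
```

Both functions should leave the vector in the same state; the lambda simply asks the compiler to generate a class like Scale.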

Notice the unusual looking piece of syntax at the end:

} );

That occurs because the lambda is being passed as an argument to std::for_each.

The first four solutions have one thing in common: they each contain a block of code looking something like this:

{
    f_T_int( t, j );
}

The difference is in the surrounding boilerplate. Assuming that we ignore solution #2, the hand-coded function object (which is sufficiently horrible that I feel justified in dismissing it), all of these solutions keep the code to be executed each time during the loop in the same place as the loop – they keep everything together (although if you take that to the extreme then we end up with one enormous “main” function – it’s another entry in both the “pros” and “cons” columns). The range-based for and std::for_each solutions separate out the loop from what happens each time around the loop. Looking at the code and seeing a range-based for or std::for_each tells us that, exceptions and early returns aside, we will iterate over every member of the given range.

The block of code can easily be extended to do something else. As before, this falls into both “pros” and “cons”.

C++11 gives us another option, a way of taking a function with N arguments and turning it into a function with M arguments (where M < N). For those of you who think that this sounds like currying, you’re right – it’s like currying.

Solution #4

using namespace std::placeholders;
std::for_each( 
    std::begin( v_T ), 
    std::end( v_T ), 
    std::bind( f_T_int, _1, j ) );

std::bind started life as boost::bind and was adopted into the C++11 standard. In solution #4 it takes three arguments:

  1. f_T_int – The function we ultimately want to call.
  2. _1 – The first argument to be passed to f_T_int. _1 is a special value, a placeholder. std::bind is going to produce a function object – in this case a function object that takes a single argument. That single argument (the first argument) will be passed on to f_T_int in the position that _1 is in.
  3. j – The second argument to be passed to f_T_int. In this case it’s the value of the variable j.

In this example, std::bind took a function that takes two parameters – f_T_int – and turned it into a function object that takes a single parameter by binding one of the arguments to j. This single parameter function is exactly what we want for std::for_each.
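The binding step can be seen in isolation in the following sketch. The function add_offset is a hypothetical stand-in for f_T_int (it returns a value so that the result is visible), and call_bound is an illustrative helper. The sketch also shows that std::bind copies j at the point where bind is called.

```cpp
#include <functional>

// Hypothetical two-parameter function standing in for f_T_int.
inline int add_offset( int value, int offset )
{
    return value + offset;
}

// Binds the second parameter to the current value of j, then calls the
// resulting one-parameter function object with `value`.
inline int call_bound( int value, int j )
{
    using namespace std::placeholders;
    auto add_j = std::bind( add_offset, _1, j );   // j is copied here
    j = 0;                                         // no effect on add_j
    return add_j( value );
}
```

If the function object should see later changes to j, std::ref( j ) can be bound instead of j.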

(As with the lambdas, I am skipping over many, many details about std::bind.)

(Aside – I am not normally a fan of using statements, and in particular I won’t use them much in this series because I want to be very clear about where everything is coming from, however if I were to keep this rule up I would have to write std::placeholders::_1 and std::placeholders::_2 and that is too ugly even for me.)

We can also use adobe::for_each with lambda or std::bind:

Solution #5

adobe::for_each( v_T, [ j ]( T const& t ) 
{ 
    f_T_int( t, j ); 
} );

Solution #6

adobe::for_each( v_T, std::bind( f_T_int, _1, j ) );

So we have our original raw loop solution (#0) and 6 other possibilities:

  0. The raw loop – we are trying to get away from this.
  1. Range based for loop
  2. Old style C++ function object
  3. std::for_each with a lambda
  4. std::for_each with std::bind
  5. adobe::for_each with a lambda
  6. adobe::for_each with std::bind

#0 is the version we are trying to get away from, #2 has so much boilerplate that I am prepared to drop it without further consideration. std::for_each and adobe::for_each are variations on a theme, and the choice between them will depend on (a) whether you are prepared to include the Adobe Source Libraries in your project and (b) whether you want to operate over the entire container. That leaves three solutions:

  1. Range based for loop
  2. std::for_each with a lambda
  3. std::for_each with std::bind

Range-based for and lambda both have a block of code that (more or less) matches the block of code in the original raw solution:

{
    f_T_int( t, j );
}

Range-based for and lambda differ in the boilerplate surrounding this block of code, although there are still similarities (for example, they both have T const& t declarations).

The main difference is that the range-based for models one type of loop – std::for_each (yes, you can achieve the same effect as other algorithms but it takes additional code). For solving this particular problem that isn’t an issue, that is exactly the type of loop we want. Later in this series we’ll be looking at other algorithms.

The fact that range-based for and lambda both have a block of code means that they can both put extra things into that block of code. In fact, they could put the body of f_T_int right into the loop block itself. std::bind doesn’t let us do that – once we have bound a particular function the only thing that can be done each time around the loop is to call that function. Let’s look at the pros and cons of the code block vs. std::bind:

Code block pros:

  • Keeps the code together – the loop code and the action to take on each iteration is in the same place.
  • Makes it easy to change the operation that is taking place on each iteration of the loop.
  • Easy to step through in a debugger
  • Clear error messages (at least as clear as C++ error messages ever are)
  • “Normal” syntax – basically the same syntax as if we were calling the function once.

Code block cons:

  • Keeps the code together – the loop code and the action to take on each iteration is in the same place.
  • Too easy to change. Making something easy to change makes it more likely to be changed. Keeping code working as it is maintained is a problem.
  • Difficult to test the body of the loop separately from the loop itself.

std::bind pros:

  • Forces you to split the code into separate functions and manage complexity. The loop is separate from the action taken on each iteration.
  • Easy to test the body of the loop separately from the loop itself.

std::bind cons:

  • Forces you to split the code into separate functions.
  • Tricky to step through in the debugger.
  • Error messages are verbose and almost incomprehensible.
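The testing point can be illustrated with a sketch. Account, apply_interest and apply_interest_to_all are hypothetical names, and the percentage arithmetic is only there to give the function something to do. Because the per-element action is a named function, it can be exercised on a single object without any loop.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <vector>

// Hypothetical element type.
struct Account
{
    int balance;
};

// The action taken on each iteration, testable on its own.
inline void apply_interest( Account& account, int percent )
{
    account.balance += account.balance * percent / 100;
}

// The loop, kept separate from the action.
inline void apply_interest_to_all( std::vector< Account >& accounts, int percent )
{
    using namespace std::placeholders;
    std::for_each(
        std::begin( accounts ),
        std::end( accounts ),
        std::bind( apply_interest, _1, percent ) );
}
```

A unit test can call apply_interest on one Account and check the balance, and a separate test can check that apply_interest_to_all reaches every element.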

Wrap up

I like std::bind. I think that currying is an elegant solution to many problems, and I like the fact that std::bind forces me to split the looping construct from the action taken on each iteration. I have to come up with a good function name which means I have to think through the problem fully and clearly. Naming is difficult but the payoff of having a well named function is worth it. I like the fact that I can test the function executed for each element independently of the loop.

Sadly (for me anyway), I think I am in a minority of one here. Bjarne Stroustrup writes about bind:

These binders … were heavily used in the past, but most uses seem to be more easily expressed using lambdas.

The C++ Programming Language – Fourth Edition, page 967.

I think that the title of Herb Sutter’s talk Lambdas, Lambdas Everywhere is intended as an expression of optimism rather than as a dire warning.

My biggest concern with lambdas is that they’re going to be used in the same way that function bodies are now – to contain vast amounts of deeply nested code.

I took the title of this series “No raw loops” from a talk by Sean Parent at Going Native 2013. In that same talk (starting at around 29:10) he makes suggestions about when and how to use range-based for loops and lambda functions. The summary is, “keep the body short”, he suggests limiting the body to the composition of two functions with an operator. If I was confident that lambda and range-based for bodies were never going to exceed this level of complexity I would be more positive about them. For myself, having some limits on what I can do often leads to a better organized, more thought through design.

No raw loops 1 – functions

Loop over all the elements in a container and call a function with each element as an argument

In this talk at Going Native 2013 Sean Parent argues that a goal for better code should be “no raw loops”. He talks about a number of problems with raw loops, and a number of ways to address those problems. In this series I am going to look at one way of avoiding raw loops – using STL algorithms.

It seems that we often read advice to “use an STL algorithm”, but there are practical difficulties in doing so. We’re going to start off with the simple cases and move on to the more complex cases in later posts.

We need a few declarations before we start. A class T, a function that we can pass a T to, and a vector of T:

class T
{
   ...
};

void f_T( T const& );

std::vector< T > v_T;

Here’s a typical piece of code to solve the problem “loop over all the elements in a container and call a function with each element as an argument”:

Solution #0

for( 
    std::vector< T >::iterator i( std::begin( v_T ) ); 
    i != std::end( v_T ); 
    ++i )
{
    f_T( *i );
}

There is nothing particularly fancy about this code, it is a straightforward translation of the problem statement into a typical C and C++ form (I know this isn’t C code but the form is familiar to a C programmer). The most esoteric things in it are the iterator and the new std::begin and std::end functions.

I am going to use a technique in this series to enable us to compare different solutions. Given a solution we look at all of the advantages (pros) to that solution and all of the disadvantages (cons). The goal is to list everything that fits in either category – how we choose to weight each pro or con is another matter. It is my contention that you can’t fully evaluate a solution without knowing everything that is wrong with it as well as everything that is right.

Pros:

  • This is idiomatic code. Any C++ programmer should be able to produce this, and it isn’t a million miles away from the idiomatic C code to solve this problem. The translation from the problem statement to the code is simple.
  • The code can easily be extended to do something else with the iterator before calling the function.
  • The loop is easy to step through in the debugger. Stepping into the code that assigns and tests the iterator takes you into the strange world of standard library implementations, but it is easy to step into the function being called.
  • The compiler gives a reasonable error message if the function being called doesn’t handle the type it is being called with. If I try and call f_int (a function that takes an int as a parameter) instead of f_T I get this error message from VS2013:
    d:\documents\projects\c++ idioms\source\loop_idioms.cpp(328): error C2664: 'void f_int(int)' : cannot convert argument 1 from 'T' to 'int'
    

    and this message from GCC 4.8.2:

    ../../source/loop_idioms.cpp:328:23: error: cannot convert ‘T’ to ‘int’ for argument ‘1’ to ‘void f_int(int)’
             f_int( *i );
    

    both of which point directly to the errant line (328 in loop_idioms.cpp) and identify the problem.

Cons:

  • Although translating from the problem statement to the code is easy, translating from the code back to the problem statement is more difficult. We need to check that the iterator is incremented exactly once around each loop, that it isn’t changed in any other way, and that the loop does not exit early.
  • The code can easily be extended to do something else with the iterator before calling the function. This was in the pros as well – remember that we are trying to solve the original problem statement – “loop over all the elements in a container and call a function with each element as an argument”. We don’t want anything else inside the loop.
  • There is a lot of boilerplate. We have an additional variable i that wasn’t mentioned in the problem statement and we have a whole bunch of code dedicated to its initialization, testing, type and incrementing.

Let’s look at an alternate solution using the C++11 range-based for loops:

Solution #1

for( T const& t : v_T )
{
    f_T( t );
}

We have certainly reduced the boilerplate. There is less typing. We have removed the iterator i and the need to initialize, increment and test it. We now have t – a reference to an object of type T. We don’t have to specify std::begin and std::end, the loop automatically operates over the whole container. Stepping through the code in the debugger is straightforward and we get the same error messages if we use the wrong function. On the “cons” side, we still have the option of doing more things inside the loop, and there is still a variable t along with its type. The use of auto would simplify this a little, we would avoid explicitly declaring the type of t (I have issues with auto but I will save those for another post).
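For reference, a sketch of what the auto variant looks like. The function total_length is hypothetical, and std::string stands in for T so that the example compiles on its own.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sums the lengths of all strings. auto const& deduces std::string const&,
// so the element type does not have to be spelled out in the loop.
inline std::size_t total_length( std::vector< std::string > const& v )
{
    std::size_t total = 0;
    for( auto const& s : v )
    {
        total += s.size();
    }
    return total;
}
```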

Can we do even better? The standard library gives us std::for_each – an algorithm that seems almost perfect for what we want:

Solution #2

std::for_each( std::begin( v_T ), std::end( v_T ), f_T );

This looks nice, we have removed the iterator i which means we have removed its type, initialization, testing and incrementing. There is no need for an explicit variable t and its type (auto or otherwise). There is no way for us to do anything else inside the loop. Other than an exception being thrown the loop is not going to be terminated early (any of our solutions will exit early if an exception is thrown). All of this makes it easy to go from the code back to the problem statement. On the downside, we have to know how std::for_each works. We have moved away from the C-like code of the first solution to something that is firmly C++ and not understandable unless we have knowledge about C++ algorithms (I don’t usually regard “having to know about the standard library” as a Bad Thing but I have worked at many places where the subset of C++ used is small enough that they would dislike it).
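A compilable sketch of the same call shape follows, with an int element standing in for T and a hypothetical function record standing in for f_T. The global vector seen exists only so that the effect of the call can be observed.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical observer state: collects every value record() is called with.
std::vector< int > seen;

void record( int const& value )
{
    seen.push_back( value );
}

// Calls record() once for each element, in order.
void record_all( std::vector< int > const& v )
{
    std::for_each( std::begin( v ), std::end( v ), record );
}
```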

Stepping through this in the debugger is harder work. We have to step into the implementation of std::for_each and (with the version of the libraries I have on VS2013) that takes us through a couple more internal functions before we actually call f_T. Putting a breakpoint at the start of f_T is the easiest way to track what is going on each time around the loop.

The compiler error messages also get more confusing. If I replace f_T with f_int as before I get this from GCC 4.8.2:

In file included from /usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/algorithm:62:0,
                 from ../../source/loop_idioms.cpp:6:
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/bits/stl_algo.h: In instantiation of ‘_Funct std::for_each(_IIter, _IIter, _Funct) [with _IIter = __gnu_cxx::__normal_iterator<T*, std::vector >; _Funct = void (*)(int)]’:
../../source/loop_idioms.cpp:334:66:   required from here
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/bits/stl_algo.h:4417:14: error: cannot convert ‘T’ to ‘int’ in argument passing
  __f(*__first);
              ^

The basic error message cannot convert ‘T’ to ‘int’ is still present, but it is reported in a standard header (stl_algo.h) and we have to look at the rest of the error text to narrow down what line in our source file caused the problem.

There is one more improvement we can make. Remember that the original problem statement said “loop over all the elements in a container …”. Solution #1 (range-based for) let us work over the entire container, however solution #2 (std::for_each) makes us specify the beginning and end of the range explicitly. Is there a way we can get the advantages of std::for_each combined with the range-based for behavior where we can just specify the container as a whole? It turns out there is, but we have to introduce a new library to do it – the Adobe Source Libraries, available on GitHub. Among other things, ASL introduces a set of extensions to the standard algorithms which allow a range to be expressed as a single argument. If you pass in a container you automatically operate over the full range of that container.
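To illustrate the idea of passing a range as a single argument, here is a minimal sketch of such a wrapper. It is not ASL's implementation: the name for_each_in is hypothetical, and the sketch assumes only that the argument is something std::begin and std::end accept.

```cpp
#include <algorithm>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical wrapper: applies f to every element of the range.
template< typename Range, typename Function >
Function for_each_in( Range& range, Function f )
{
    return std::for_each( std::begin( range ), std::end( range ), std::move( f ) );
}

// Example use: doubles every element in place.
inline void double_all( std::vector< int >& v )
{
    for_each_in( v, []( int& x ) { x *= 2; } );
}
```

A real library has more to consider than this sketch does, which is part of the argument for using an existing one.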

Solution #3

adobe::for_each( v_T, f_T );

This looks difficult to beat. Our original problem statement specified three things:

  1. loop over all the elements
  2. in a container
  3. call a function with each element as an argument

Our final solution contains exactly those three things:

  1. loop over all the elements – adobe::for_each
  2. in a container – v_T
  3. call a function for each element – f_T

The boilerplate is minimal – whitespace and punctuation.

This looks great, but it still has a couple of entries in the “cons” column. It is even worse to step through in the debugger than the std::for_each solution. We have to step through the implementation of adobe::for_each which then calls into std::for_each, and uses std::bind to handle the call. All of these add extra layers of complication to the call stack – using VS2013 I get six stack layers between the adobe::for_each call and f_T.

adobe::for_each involves introducing an extra library into the system. I have worked on some teams who would see this as no problem at all, and others who would fight vehemently against it. Even for teams who are happy to introduce a new library there is a cost in getting legal approval.

Finally, if I substitute the wrong function (f_int rather than f_T), I get an error message that is a classic of C++ template beauty:

In file included from /usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/algorithm:62:0,
                 from ../../source/loop_idioms.cpp:6:
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/bits/stl_algo.h: In instantiation of ‘_Funct std::for_each(_IIter, _IIter, _Funct) [with _IIter = __gnu_cxx::__normal_iterator<T*, std::vector >; _Funct = std::_Bind))(int)>]’:
../../adobe/adobe/algorithm/for_each.hpp:43:67:   required from ‘void adobe::for_each(InputIterator, InputIterator, UnaryFunction) [with InputIterator = __gnu_cxx::__normal_iterator<T*, std::vector >; UnaryFunction = void (*)(int)]’
../../adobe/adobe/algorithm/for_each.hpp:53:62:   required from ‘void adobe::for_each(InputRange&, UnaryFunction) [with InputRange = std::vector; UnaryFunction = void (*)(int)]’
../../source/loop_idioms.cpp:347:37:   required from here
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/bits/stl_algo.h:4417:14: error: no match for call to ‘(std::_Bind))(int)>) (T&)’
  __f(*__first);
              ^
In file included from /usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/bits/stl_algo.h:66:0,
                 from /usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/algorithm:62,
                 from ../../source/loop_idioms.cpp:6:
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1280:11: note: candidates are:
     class _Bind<_Functor(_Bound_args...)>
           ^
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1351:2: note: template _Result std::_Bind<_Functor(_Bound_args ...)>::operator()(_Args&& ...) [with _Args = {_Args ...}; _Result = _Result; _Functor = void (*)(int); _Bound_args = {std::_Placeholder}]
  operator()(_Args&&... __args)
  ^
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1351:2: note:   template argument deduction/substitution failed:
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1347:37: error: cannot convert ‘T’ to ‘int’ in argument passing
  = decltype( std::declval<_Functor>()(
                                     ^
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1365:2: note: template _Result std::_Bind<_Functor(_Bound_args ...)>::operator()(_Args&& ...) const [with _Args = {_Args ...}; _Result = _Result; _Functor = void (*)(int); _Bound_args = {std::_Placeholder}]
  operator()(_Args&&... __args) const
  ^
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1365:2: note:   template argument deduction/substitution failed:
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1361:53: error: cannot convert ‘T’ to ‘int’ in argument passing
          typename add_const<_Functor>::type>::type>()(
                                                     ^
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1379:2: note: template _Result std::_Bind<_Functor(_Bound_args ...)>::operator()(_Args&& ...) volatile [with _Args = {_Args ...}; _Result = _Result; _Functor = void (*)(int); _Bound_args = {std::_Placeholder}]
  operator()(_Args&&... __args) volatile
  ^
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1379:2: note:   template argument deduction/substitution failed:
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1375:70: error: cannot convert ‘T’ to ‘int’ in argument passing
                        typename add_volatile<_Functor>::type>::type>()(
                                                                      ^
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1393:2: note: template _Result std::_Bind<_Functor(_Bound_args ...)>::operator()(_Args&& ...) const volatile [with _Args = {_Args ...}; _Result = _Result; _Functor = void (*)(int); _Bound_args = {std::_Placeholder}]
  operator()(_Args&&... __args) const volatile
  ^
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1393:2: note:   template argument deduction/substitution failed:
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1389:64: error: cannot convert ‘T’ to ‘int’ in argument passing
                        typename add_cv<_Functor>::type>::type>()(
                                                                ^

The original error message cannot convert ‘T’ to ‘int’ is present in that block of text – in fact it is present several times. However, tracing back to the line of our code that caused it is non-trivial, although as I get more experience with algorithms I find that I am getting better at interpreting these messages. I am not sure whether this is an achievement that I should be proud of.

Despite the debugger and error message problems I still like and use the adobe::for_each solution (in the language of pros and cons, the pros outweigh the cons for me). It is very easy to reason about (in Sean Parent’s talk he often refers to the ability to reason about code – please watch the talk, it really is very informative and challenging). If I need to step through the loop in the debugger I put a breakpoint into the function being called. As for the error messages, as I said I am getting better at interpreting them, so I just suck it up and deal with them.