No raw loops 2 – arguments | Random programming

Loop over all the elements in a container and call a function with each element and additional values as arguments

In part 1 of this series we looked at the simplest case we could, a case that is tailor-made for std::for_each – calling a function with each element of a container as an argument. That particular case is easy, but it obviously doesn’t cover everything so we’ll move on to a (slightly) more complex example.

Our previous problem statement was: “loop over all the elements in a container and call a function with each element as an argument”: We’ll extend the problem so that now the function doesn’t just take a single argument of the same type as the element, it takes an additional argument.

As before we have a few declarations:

class T
{
   ...
};

void f_T_int( T const& t, int i )

std::vector< T > v_T;

Notice that f_T_int takes an object of type T and an int.

First of all, our raw loop version of the solution.

Solution #0

int j( functionReturningInt() );

for( std::vector< T >::iterator i( std::begin( v_T ) ); 
    i != std::end( v_T ); 
    ++i )
{
    f_T_int( *i, j );
}

Solution #0 has the same set of pros and cons as solution #0 did in part 1.

As before, there is a range-based for solution:

Solution #1

for( T const& t : v_T )
{
    f_T_int( t, j );
}

Again, similar pros and cons to the range-based for in part 1.

How about using for_each? In part 1 , turning the explicit loop into an application of std::for_each was easy. std::for_each expects to be given a function (and I’ll include “function object” in the definition of function) that takes a single argument. In part 1 that was exactly the function we wished to call. Now, however, we want to call a function that takes an additional parameter – as well as taking an object of type T it takes an integer.

Back in the bad old days before C++11 we would set up a function object like this:

Solution #2

class C_T_int
{
public:
    C_T_int( int j ) : j_( j )
    {
    }

    void operator ()( T const& t )
    {
        f_T_int( t, j_ );
    }
private:
    int j_;
};

std::for_each( 
    std::begin( v_T ), 
    std::end( v_T ), 
    C_T_int( j ) );

The function object C_T_int takes our additional argument (the integer) in its constructor, then defines operator () to accept the argument from our container. When operator () is called it has both arguments that it needs to be able to call f_T_int.

The function object allows us to use std::for_each, and the line of code that calls std::for_each is very straightforward, however solution #1 has a massive “con”: the overhead is horrible, there is lots of boilerplate.

If only we had some way of generating the function object without having to write all of the boilerplate. The C++11 standard provides us with exactly that: lamda functions. As the standard says “Lambda expressions provide a concise way to create simple function objects”. Using lambdas leads to:

Solution #3

std::for_each( 
    std::begin( v_T ), 
    std::end( v_T ), 
    [ j ]( T const& t ) 
{ 
    f_T_int( t, j ); 
} );

(This is a minimal introduction to lambda functions, there is much more to them than I am going to cover. Herb Sutter’s talk Lambdas, Lambdas Everywhere has much more information.)

The lambda function consists of three parts, contained in three different types of brackets. It starts with square brackets – [] – moves on to round brackets – () – then finishes off with curly brackets – {}.

The three parts correspond to the three essential features of the function object we created for solution #2:

The value(s) passed into the constructor of the function object appear in the square brackets – []
The value(s) passed into operator () appear in the round brackets – ()
The code in operator () appears in the curly brackets – {}

Notice the unusual looking piece of syntax at the end:

} );

That occurs because the lambda is being passed as an argument to std::for_each.

This first four solutions have one thing in common, they each contain a block of code looking something like this:

{
    f_T_int( t, j );
}

The difference is in the surrounding boilerplate. Assuming that we ignore solution #2, the hand-coded function object (which is sufficiently horrible that I feel justified in dismissing it), all of these solutions keep the code to be executed each time during the loop in the same place as the loop – they keep everything together (although if you take that to the extreme then we end up with one enormous “main” function – it’s another entry in both the “pros”” and “cons” column). The range-based for and std::for_each solutions separate out the loop from what happens each time around the loop. Looking at the code and seeing a range-based for or std::for_each tells us that exceptions and early returns aside, we will iterate over every member of the given range.

The block of code can easily be extended to do something else. As before, this falls into both “pros” and “cons”.

C++11 gives us another option, a way of taking a function with N arguments and turning it into a function with M arguments (where M < N). For those of you who think that this sounds like currying, you’re right – it’s like currying.

Solution #4

using namespace std::placeholders;
std::for_each( 
    std::begin( v_T ), 
    std::end( v_T ), 
    std::bind( f_T_int, _1, j ) );

std::bind started life as boost::bind and was adopted into the C++11 standard. In solution #4 it takes three arguments:

f_T_int – The function we ultimately want to call.
_1 – The first argument to be passed to f_T_int. _1 is a special value, a placeholder. std::bind is going to produce a function object – in this case a function object that takes a single argument. That single argument (the first argument) will be passed on to f_T_int in the position that _1 is in.
j – The second argument to be passed to f_T_int. In this case it’s the value of the variable j.

In this example, std::bind took a function that takes two parameters – f_T_int and turned it into a function object that takes a single parameter by binding one of the arguments to j. This single parameter function is exactly what we want for std::for_each.

(As with the lambdas, I am skipping over many, many details about std::bind.)

(Aside – I am not normally a fan of using statements, and in particular I won’t use them much in this series because I want to be very clear about where everything is coming from, however if I were to keep this rule up I would have to write std::placeholders::_1 and std::placeholders::_2 and that is too ugly even for me.)

We can also use adobe::for_each with lambda or std::bind:

Solution #5

adobe::for_each( v_T, [ j ]( T const& t ) 
{ 
    f_T_int( t, j ); 
} );

Solution #6

adobe::for_each( v_T, std::bind( f_T_int, _1, j ) );

So we have our original raw loop solution (#0) and 6 other possibilities:

The raw loop – we are trying to get away from this.
Range based for loop
Old style C++ function object
std::for_each with a lambda
std::for_each with std::bind
adobe::for_each with a lambda
adobe::for_each with std::bind

#0 is the version we are trying to get away from, #2 has so much boilerplate that I am prepared to drop it without further consideration. std::for_each and adobe::for_each are variations on a theme, and the choice between them will depend on (a) whether you are prepared to include the Adobe Source Libraries in your project and (b) whether you want to operate over the entire container. That leaves three solutions:

Range based for loop
std::for_each with a lambda
std::for_each with std::bind

Range-based for and lambda both have a block of code that (more or less) matches the block of code in the original raw solution:

{
    f_T_int( t, j );
}

Range-based for lambda differ in the boilerplate surrounding this block of code, although there are still similarities (for example, they both have T const& t declarations).

The main difference is that the range-based for models one type of loop – std::for_each (yes, you can achieve the same effect as other algorithms but it takes additional code). For solving this particular problem that isn’t an issue, that is exactly the type of loop we want. Later in this series we’ll be looking at other algorithms.

The fact that range-based for and lambda both have a block of code means that they can both put extra things into that block of code. In fact, they could put the definition of f_T_int right into the loop block itself. std::bind doesn’t let us do that – once we have bound a particular function the only thing that can be done each time around the loop is to call that function. Let’s look at the pros and cons of the code block vs. std::bind:

Code block pros:

Keeps the code together – the loop code and the action to take on each iteration is in the same place.
Makes it easy to change the operation that is taking place on each iteration of the loop.
Easy to step through in a debugger
Clear error messages (at least as clear as C++ error messages ever are)
“Normal” syntax – basically the same syntax as if we were calling the function once.

Code block cons:

Keeps the code together – the loop code and the action to take on each iteration is in the same place.
Too easy to change. Making something easy to change makes it more likely to be changed. Keeping code working as it is maintained is a problem.
Difficult to test the body of the loop separately from the loop itself.

std::bind pros:

Forces you to split the code into separate functions and manage complexity. The loop is separate from the action taken on each iteration.
Easy to test the body of the loop separately from the loop itself.

std::bind cons:

Forces you to split the code into separate functions.
Tricky to step through in the debugger.
Error messages are verbose and almost incomprehensible.

Wrap up

I like std::bind. I think that currying is an elegant solution to many problems, and I like the fact that std::bind forces me to split the looping construct from the action taken on each iteration. I have to come up with a good function name which means I have to think through the problem fully and clearly. Naming is difficult but the payoff of having a well named function is worth it. I like the fact that I can test the function executed for each element independently of the loop.

Sadly (for me anyway), I think I am in a minority of one here. Bjarne Stroustrup writes about bind:

These binders … were heavily used in the past, but most uses seem to be more easily expressed using lambdas.

The C++ Programming Language – Fourth Edition, page 967.

I think that the title of Herb Sutter’s talk Lambdas, Lambdas Everywhere is intended as an expression of optimism rather than as a dire warning.

My biggest concern with lambdas is that they’re going to be used in the same way that function bodies are now – to contain vast amounts of deeply nested code.

I took the title of this series “No raw loops” from a talk by Sean Parent at Going Native 2013. In that same talk (starting at around 29:10) he makes suggestions about when and how to use range-based for loops and lambda functions. The summary is, “keep the body short”, he suggests limiting the body to the composition of two functions with an operator. If I was confident that lambda and range-based for bodies were never going to exceed this level of complexity I would be more positive about them. For myself, having some limits on what I can do often leads to a better organized, more thought through design.

4 thoughts on “No raw loops 2 – arguments”

Michael says:

I really like your logical breakdown of pros and cons.

Your statement of “Too easy to change” is interesting. To me if there’s a good reason to change something, then having made it hard to change is just to have made things more difficult than they need to be; while on the other hand if you don’t have a good reason to change something then whether or not it’s easy to change is irrelevant because you should choose not to change it. I’d be interested to hear more of your reasoning. Is it because other people might change it, or because you might change it accidentally “without good reason”? Could you give any examples of where such a change would be harmful but still chosen by whoever is making the change?


April 28, 2014 at 7:14 pm
- Bob says:
  
  In hindsight, “too easy to change” does seem like an odd thing to say about software, and I agree with your point in this post that we don’t think about software change enough. Let me try and articulate my point more clearly.
  
  My comment about “too easy to change” was made about the “code block” method of writing a loop. We might use a range-based for, or an algorithm with a lambda but we still have a block of code right there as part of the looping structure. That block of code is anonymous (that’s one of the big selling points of lambdas) and can access anything in the surrounding function (you have to use the capture list for a lambda but it is easy to tell it to capture everything). If we use a function with std::bind the function has a name and has an explicit set of arguments it can act upon (I am assuming that we don’t run riot with globals).
  
  It is easy to change anonymous code that performs task A into anonymous code that performs task B – nothing needs to change other than the code. Changing a function that performs task A into a function that performs task B involves (or at least should involve) changing the name of the function to reflect the different task, and might involve changing the parameters of the function. It feels to me like a more deliberate change. I think that the named function vs. anonymous code block is the core of my argument. I know that naming is hard (“there are two hard problems in computer science – caching, naming and off by one errors”), but on the occasions I have been on a team who have really thought through the naming issues it has been thoroughly worth it.
  
  A lot of this comes from my observation that code tends to grow. We might start off with a nice, small, easy to understand block of code inside a loop, but it has a tendency to grow over time, and while I know in theory that code maintenance and refactoring should solve that problem, too often in my career it hasn’t. Writing a piece of code that works right now is one thing, writing a piece of code that is likely to be robust under modification is another.
  
  
  April 29, 2014 at 6:28 pm
Pingback: No raw loops 3 – member functions | Random programming
Pingback: std::bind and lambda functions 5 | Random programming

Random programming

Random programming

Bob Archer's thoughts about programming

No raw loops 2 – arguments

4 thoughts on “No raw loops 2 – arguments”

Leave a Reply to Michael Cancel reply