Quote of the week – judgement

A superior pilot uses his superior judgment to avoid situations which require the use of his superior skill.

Frank Borman – Apollo 8

I once had a programmer tell me that he had to use a goto because he needed to jump out of 10 levels of nesting (in C++ code). In the most literal sense he was correct – a return or an exception would not have worked in this situation. Of course he should never have got into 10 levels of nesting in the first place.

Sometimes the best solution to a problem is to back out and change the system so that the problem just doesn’t exist.

(For the avoidance of doubt, I am not claiming that using a goto qualifies as “superior skill”, but it is an extreme way of getting out of a problem. Also, I am not (entirely) anti-goto. If I ever come across a situation where a goto is the best way to achieve something in C++ I will use it. I have just never encountered such a situation since I switched from C to C++ 19 years ago.)

Quote of the week – array indices

Should array indices start at 0 or 1? My compromise of 0.5 was rejected without, I thought, proper consideration.

Stan Kelly-Bootle

There is a fascinating piece by Edsger W. Dijkstra “Why numbering should start at zero” that discuss the various options available for describing ranges and explains why we should prefer one option over the others. Dijkstra is one of a number of programmers who have the ability to look at choices that seem obvious or arbitrary and come up with reasons for preferring one of the choices.

No raw loops 1 – functions

Loop over all the elements in a container and call a function with each element as an argument

In this talk at Going Native 2013 Sean Parent argues that a goal for better code should be “no raw loops”. He talks about a number of problems with raw loops, and a number of ways to address those problems. In this series I am going to look at one way of avoiding raw loops – using STL algorithms.

It seems that we often read advice to “use an STL algorithm”, but there are practical difficulties in doing so. We’re going to start off with the simple cases and move on to the more complex cases in later posts.

We need a few declarations before we start. A class T, a function that we can pass a T to, and a vector of T:

class T
{
   ...
};

void f_T( T const& );

std::vector< T > v_T;

Here’s a typical piece of code to solve the problem “loop over all the elements in a container and call a function with each element as an argument”:

Solution #0

for( 
    std::vector< T >::iterator i( std::begin( v_T ) ); 
    i != std::end( v_T ); 
    ++i )
{
    f_T( *i );
}

There is nothing particularly fancy about this code, it is a straightforward translation of the problem statement into a typical C and C++ form (I know this isn’t C code but the form is familiar to a C programmer). The most esoteric things in it are the iterator and the new std::begin and std::end functions.

I am going to use a technique in this series to enable us to compare different solutions. Given a solution we look at all of the advantages (pros) to that solution and all of the disadvantages (cons). The goal is to list everything that fits in either category – how we choose to weight each pro or con is another matter. It is my contention that you can’t fully evaluate a solution without knowing everything that is wrong with it as well as everything that is right.

Pros:

  • This is idiomatic code. Any C++ programmer should be able to produce this, and it isn’t a million miles away from the idiomatic C code to solve this problem. The translation from the problem statement to the code is simple.
  • The code can easily be extended to do something else with the iterator before calling the function.
  • The loop is easy to step through in the debugger. Stepping into the code that assigns and tests the iterator takes you into the strange world of standard library implementations, but it is easy to step into the function being called.
  • The compiler gives a reasonable error message if the function being called doesn’t handle the type it is being called with. If I try and call f_int (a function that takes an int as a parameter) instead of f_T I get this error message from VS2013:
    d:\documents\projects\c++ idioms\sourceloop_idioms.cpp(328): error C2664: 'void f_int(int)' : cannot convert argument 1 from 'T' to 'int'
    

    and this message from GCC 4.8.2:

    ../../source/loop_idioms.cpp:328:23: error: cannot convert ‘T’ to ‘int’ for argument ‘1’ to ‘void f_int(int)’
             f_int( *i );
    

    both of which point directly to the errant line (328 in loop_idioms.cpp) and identify the problem.

Cons:

  • Although translating from the problem statement to the code is easy, translating from the code back to the problem statement is more difficult. We need to check that the iterator is incremented exactly once around each loop, that it isn’t changed in any other way, and that the loop does not exit early.
  • The code can easily be extended to do something else with the iterator before calling the function. This was in the pros as well – remember that we are trying to solve the original problem statement – “loop over all the elements in a container and call a function with each elementas an argument”: We don’t want anything else inside the loop.
  • There is a lot of boilerplate. We have an additional variable i that wasn’t mentioned in the problem statement and we have a whole bunch of code dedicated to its initialization, testing, type and incrementing.

Let’s look at an alternate solution using the C++11 range-based for loops:

Solution #1

for( T const& t : v_T )
{
    f_T( t );
}

We have certainly reduced the boilerplate. There is less typing. We have removed the iterator i and the need to initialize, increment and test it. We now have t – a reference to an object of type T. We don’t have to specify std::begin and std::end, the loop automatically operates over the whole container. Stepping through the code in the debugger is straightforward and we get the same error messages if we use the wrong function. On the “cons” side, we still have the option of doing more things inside the loop, and there is a still a variable t along with its type. The use of auto would simplify this a little, we would avoid explicitly declaring the type of t (I have issues with auto but I will save those for another post).

Can we do even better? The standard library gives us std::for_each – an algorithm that seems almost perfect for what we want:

Solution #2

std::for_each( std::begin( v_T ), std::end( v_T ), f_T );

This looks nice, we have removed the iterator i which means we have removed its type, initialization, testing and incrementing. There is no need for an explicit variable t and its type (auto or otherwise). There is no way for us to do anything else inside the loop. Other than an exception being thrown the loop is not going to be terminated early (any of our solutions will exit early if an exception is thrown). All of this makes it easy to go from the code back to the problem statement. On the downside, we have to know how std::for_each works. We have moved away from the C-like code of the first solution to something that is firmly C++ and not understandable unless we have knowledge about C++ algorithms (I don’t usually regard “having to know about the standard library” as a Bad Thing but I have worked at many places where the subset of C++ used is small enough that they would dislike it).

Stepping through this in the debugger is harder work. We have to step into the implementation of std::for_each and (with the version of the libraries I have on VS2013) that takes us through a couple more internal functions before we actually call f_T. Putting a breakpoint at the start of f_T is the easiest way to track what is going on each time around the loop.

The compiler error messages also get more confusing. if I replace f_T with f_int as before I get this from GCC 4.8.2:

In file included from /usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/algorithm:62:0,
                 from ../../source/loop_idioms.cpp:6:
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/bits/stl_algo.h: In instantiation of ‘_Funct std::for_each(_IIter, _IIter, _Funct) [with _IIter = __gnu_cxx::__normal_iterator<T*, std::vector >; _Funct = void (*)(int)]’:
../../source/loop_idioms.cpp:334:66:   required from here
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/bits/stl_algo.h:4417:14: error: cannot convert ‘T’ to ‘int’ in argument passing
  __f(*__first);
              ^

The basic error message cannot convert ‘T’ to ‘int’ is still present, but it is reported in a standard header (stl_algo.hpp) and we have to look at the rest of the error text to narrow down what line in our source file caused the problem.

There is one more improvement we can make. Remember that the original problem statement said “loop over all the elements in a container …”. Solution #1 (range-based for) let us work over the entire container however solution #2 (std::for) makes us specify the beginning and end of the range explicitly. Is there a way we can get the advantages of std::for_each combined with the the range-based for behavior where we can just specify the container as a whole? It turns out there is, but we have to introduce a new library to do it – the Adobe Source Libraries, available on github. Among other things, ASL introduces a set of extensions to the standard algorithms which allow a range to be expressed as a single argument. If you pass in a container you automatically operate over the full range of that container.

Solution #3

adobe::for_each( v_T, f_T );

This looks difficult to beat. Our original problem statement specified three things:

  1. loop over all the elements
  2. in a container
  3. call a function with each element as an argument

Our final solution contains exactly those three things:

  1. loop over all the elements adobe::for_each
  2. in a container v_T
  3. call a function for each element f_T

The boilerplate is minimal – whitespace and punctuation.

This looks great, but it still has a couple of entries in the “cons” column. It is even worse to step through in the debugger than the std::for_each solution. We have to step through the implementation of adobe::for_each which then calls into std::for_each, and uses std::bind to handle the call. All of these add extra layers of complication to the call stack – using VS2013 I get six stack layers between the adobe::for_each call and f_T.

adobe::for_each involves introducing an extra library into the system. I have worked on some teams who would see this as no problem at all, and others who would fight vehemently against it. Even for teams who are happy to introduce a new library there is a cost in getting legal approval.

Finally, if I substitute the wrong function (f_int rather than f_T), I get an error message that is a classic of C++ template beauty:

In file included from /usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/algorithm:62:0,
                 from ../../source/loop_idioms.cpp:6:
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/bits/stl_algo.h: In instantiation of ‘_Funct std::for_each(_IIter, _IIter, _Funct) [with _IIter = __gnu_cxx::__normal_iterator<T*, std::vector >; _Funct = std::_Bind))(int)>]’:
../../adobe/adobe/algorithm/for_each.hpp:43:67:   required from ‘void adobe::for_each(InputIterator, InputIterator, UnaryFunction) [with InputIterator = __gnu_cxx::__normal_iterator<T*, std::vector >; UnaryFunction = void (*)(int)]’
../../adobe/adobe/algorithm/for_each.hpp:53:62:   required from ‘void adobe::for_each(InputRange&, UnaryFunction) [with InputRange = std::vector; UnaryFunction = void (*)(int)]’
../../source/loop_idioms.cpp:347:37:   required from here
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/bits/stl_algo.h:4417:14: error: no match for call to ‘(std::_Bind))(int)>) (T&)’
  __f(*__first);
              ^
In file included from /usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/bits/stl_algo.h:66:0,
                 from /usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/algorithm:62,
                 from ../../source/loop_idioms.cpp:6:
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1280:11: note: candidates are:
     class _Bind<_Functor(_Bound_args...)>
           ^
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1351:2: note: template _Result std::_Bind<_Functor(_Bound_args ...)>::operator()(_Args&& ...) [with _Args = {_Args ...}; _Result = _Result; _Functor = void (*)(int); _Bound_args = {std::_Placeholder}]
  operator()(_Args&&... __args)
  ^
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1351:2: note:   template argument deduction/substitution failed:
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1347:37: error: cannot convert ‘T’ to ‘int’ in argument passing
  = decltype( std::declval<_Functor>()(
                                     ^
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1365:2: note: template _Result std::_Bind<_Functor(_Bound_args ...)>::operator()(_Args&& ...) const [with _Args = {_Args ...}; _Result = _Result; _Functor = void (*)(int); _Bound_args = {std::_Placeholder}]
  operator()(_Args&&... __args) const
  ^
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1365:2: note:   template argument deduction/substitution failed:
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1361:53: error: cannot convert ‘T’ to ‘int’ in argument passing
          typename add_const<_Functor>::type>::type>()(
                                                     ^
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1379:2: note: template _Result std::_Bind<_Functor(_Bound_args ...)>::operator()(_Args&& ...) volatile [with _Args = {_Args ...}; _Result = _Result; _Functor = void (*)(int); _Bound_args = {std::_Placeholder}]
  operator()(_Args&&... __args) volatile
  ^
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1379:2: note:   template argument deduction/substitution failed:
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1375:70: error: cannot convert ‘T’ to ‘int’ in argument passing
                        typename add_volatile<_Functor>::type>::type>()(
                                                                      ^
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1393:2: note: template _Result std::_Bind<_Functor(_Bound_args ...)>::operator()(_Args&& ...) const volatile [with _Args = {_Args ...}; _Result = _Result; _Functor = void (*)(int); _Bound_args = {std::_Placeholder}]
  operator()(_Args&&... __args) const volatile
  ^
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1393:2: note:   template argument deduction/substitution failed:
/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/include/c++/functional:1389:64: error: cannot convert ‘T’ to ‘int’ in argument passing
                        typename add_cv<_Functor>::type>::type>()(
                                                                ^

The original error message cannot convert ‘T’ to ‘int’ is present in that block of text – in fact it is present several times. However, tracing back to the line of our code that caused it is non-trivial, although as I get more experience with algorithms I find that I am getting better at interpreting these messages. I am not sure whether this is an achievement that I should be proud of.

Despite the debugger and error message problems I still like and use the adobe::for_each solution (in the language of pros and cons, the pros outweigh the cons for me). It is very easy to reason about (in Sean Parent’s talk he often refers to the ability to reason about code – please watch the talk, it really is very informative and challenging). If I need to step through the loop in the debugger I put a breakpoint into the function being called. As for the error messages, as I said I am getting better at interpreting them, so I just suck it up and deal with them.

Quote of the week – simplicity

There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.

C.A.R. Hoare

The problem with simplicity is that it can be awfully complicated to get there. I have often found that when thinking about a problem I start with a solution that is simple, but naïve (and wrong). As I continue working through the problem I get cleverer and cleverer but more and more complicated. If I stick with it for long enough I continue to get cleverer and cleverer but get less and less complicated until (if the programming gods allow) I end up with a clever, but simple, solution that is also correct. I need to remember to have the confidence and patience to get over the hill to the simple solution.