Auto

I am deeply suspicious of auto and decltype. I acknowledge that I am on the opposite side of the argument to people such as Bjarne Stroustrup, Herb Sutter and Scott Meyers. This is not a comfortable place to be. I picked these three luminaries because it is easy to find quotes from them – I am sure there are plenty of other people who agree with them.

Unless there is a good reason not to, use auto in small scopes

Bjarne Stroustrup The C++ Programming Language 4th edition, 6.3.6.1

Almost Always Auto

Herb Sutter Almost Always Auto

Prefer auto to explicit type declarations

Scott Meyers Effective Modern C++ (Early release) Item 5

This is not an attempt to trash any of these authors – I have quoted their conclusions however, they have well thought through reasons backing up those conclusions. I highly recommend reading the entire pieces that I took these quotes from – as always with Stroustrup, Sutter and Meyers, even when I disagree with their conclusions (which isn’t that often) I learn something from their arguments. All three authors talk about the problems that can arise from auto – note that even in the quotes I have picked there are phrases like “unless there is a good reason not to”, “almost always” and “prefer”.

I think my main disagreement is with the weighting given to some of the disadvantages. My objection to auto can be summarized as “makes code easier to write, makes it harder to read”, and I weight this very highly, as someone who has spent a fair amount of his career trying to get up to speed with large codebases written by someone else, and trying to get new team members up to speed with large codebases written partially by me.

For example, I joined a team with a large, pre-existing, pre-C++11 codebase that used compiler extensions to fake auto and decltype. Learning the codebase was made far more difficult by the absence of explicitly named types in many functions. The code would not have compiled unless all the types were compatible, that’s good, however I want to know more than that. I want to know what the types are so that I can go and look them up and find out what else they can do for me. I want to look at a function and know what types are the same and what aren’t. Sooner or later I am going to make changes to that function – the changes I can make will be constrained by what the types let me do (and if the types constrain me too much I am going to have to either extend the types or use different types). I am saying “types” over and over again – they’re a big deal.

I want to know what type everything is, and I weight that desire very highly.

There will be IDEs which will make it easy to discover the types being deduced, however as Scott Meyers acknowledges in Item 4 of “Effective Modern C++” they don’t do a perfect job (although I assume that will improve), and not all of us are in a position to use IDEs anyway.

Type deduction is not new in C++ – we’ve been using templates for years, and I have fewer objections to using templates. I am trying to work out why I find templates more acceptable than auto.

Reason #1 has to be that I am more used to templates than I am to auto. It takes me time to get used to a new feature. However, there is more to it than that.

Template types still have a name. Imagine this introduction to a template class:

template< typename ITERATOR >

Within the class we don’t know exactly what the type of ITERATOR is, however the name gives us a strong hint that it should be some sort of iterator.

The fact that the template types have a name also lets us see what relation (if any) there is between types in a piece of code. Contrast this:

auto i( someFunction() );
auto j( someOtherFunction() );

with

ITERATOR i( someFunction() );
ITERATOR j( someOtherFunction() );

The template version makes it clear that i and j are the same type.

On the one hand I dislike the argument “this feature can be misused so we shouldn’t ever use it”, on the other hand I have already seen this feature be massively misused, and I suspect that it will be misused more often than not. Herb Sutter writes:

Remember that preferring auto variables is motivated primarily by correctness, performance, maintainability, and robustness – and only lastly about typing convenience.

Herb Sutter Almost Always Auto

I think this is fine advice, I think that everyone in the world should stick to it. I do not believe that they will. In my experience (and I really hope that other people have different experiences), saving some typing often outweighs everything else. Delayed gratification is an unknown concept in many programming shops.

With more experience I might come around to auto. There are certainly people who have thought about this more deeply than I have who are in favour of auto. There are also definitely places where auto will help in template code. I know I dismissed the “reduces typing” argument earlier, but it would be really nice to avoid have to type typename quite so often.


A little more follow up on the team using the compiler extension versions of auto and decltype. Their use of auto was problematic, it was also exacerbated by a number of other factors. The team loved macros and loved token pasting (##). There were many classes whose declaration and definition were created using macros, and the name of the class was generated using token pasting. Combine that with their use of auto and there were classes whose full name never appeared in the code. They were impossible to grep for and I doubt that there were many IDEs (which that team wasn’t using anyway) that would have helped. I did not last very long on that team.

Quote of the week

I have always found that plans are useless, but planning is indispensable.

Dwight Eisenhower

Eisenhower was making this comment in a military context, and it fits in well with Helmuth von Moltke’s much earlier quote:

No plan of battle ever survives contact with the enemy.

If we remove the quotes from their military context (although I have encountered plenty of people in my career who can legitimately lay claim to being “the enemy”), they talk about the usefulness of the planning process itself. A good project plan is rarely “useless”, but the planning that went into it should have brought many different aspects of the problem into play, and should have led to a far greater understanding of the problem space. This understanding helps when the plan makes contact with “the enemy” and has to be rethought.

I refer back to this post where the answer to the question “what was the most useful information you recorded during those design meetings?” was “the reasons why we didn’t do something”.

The two quotes also show the differences between “traditional” design processes and Agile design processes. Up front planning absolutely has its uses, but the plan inevitably has to change when you start implementing it.

Google’s C++ style guide – follow up

My recent post about Google’s C++ style guide attracted some great comments. Billy O’Neal corrected my ignorance about C++ formatting tools by alerting me to the existence of clang-format; John Payson came up with a way of getting piecemeal from the “no exceptions allowed” state to the “exceptions allowed and all code able to handle them” state, and Titus Winters (a maintainer of the Google style guide) mentioned a talk he gave at CppCon on the subject.

The talk is on YouTube here, and is worth watching in its entirety, but here are a few highlights I took away from it:

1:42 “How should we format our code? BORING QUESTION!”. Titus introduces clang-format.
3:14 What is the purpose of a style guide?
5:15 Google’s publicly available style guide is “probably optimizing for a different set of values than your organization needs”
6:48 Google has an indexer – Kythe – which should be available as open source soon.
9:39 Google plan to be around for at least 10 years. Almost everywhere I have worked has assumed that they are going to be around for 10 years (and several of those places went bust a long time short of 10 years), but very few of my employers have made long term decisions about their code – they might have assumed that they will be around for 10 years but on the code side they certainly didn’t plan for it.
10:55 The philosophies of the style guide.
14:45 A future version of the style guide will allow streams.
28:10 “No non-const references as function arguments” is one of the most controversial rules in the style guide. I am not surprised by this, but I am disappointed. The arguments in favour of no non-const references always seemed good to me.
38:25 “I don’t care about throwing out of memory exceptions”. There are certainly products where an out of memory exception just translates to “die instantly”, however that isn’t universally true. I have never worked on a project that was able to recover and continue after running out of memory but I have worked on products that have wanted to know about the out of memory case so they can save the data the user is working on. It’s just another reminder that guidelines depend on the products they are being used for.
57:20 Probably the best answer to any formatting question that I have heard.
Q: What about tabs vs. spaces? What’s Google’s take on that?
A: I don’t care … I just use clang-format

The theme of the day is clang-format. I spent some time over the weekend playing around with clang-format and it is excellent. I haven’t worked out how to get it to do quite everything I want, however the consistency benefits far outweigh any minor deviations from my preferred style. Also, if it bothers me that much I have the source code and can add more options.

On a related note, this is the first time I have been able to get Clang and LLVM (mostly) compiling on Windows without spending a ridiculous amount of time on it. Windows has long been the afterthought platform for LLVM and Clang – it looks like it’s getting better.

Google’s C++ Style Guide

I read through Google’s C++ style guide recently and found some interesting recommendations. I am trying to avoid saying “right” or “wrong” – there are plenty of things in a style guide that are matters of opinion or are tailored to the needs of a specific company – so I am focusing on things I find interesting.

The direct link to the guide is http://google-styleguide.googlecode.com/svn/trunk/cppguide.html.

The guide is stored in a subversion repository which can be accessed at
http://google-styleguide.googlecode.com/svn

I am looking at revision 138 in the repository, checked in on 8 September 2014. At the very bottom of the document there’s a version number – 4.45.

I have no insider knowledge apart from one conversation with a Google engineer who told me that the “no exceptions” rule was the most controversial part of the guide.

It is remarkable that Google have a company-wide C++ style guide. I have worked in smaller companies (16 people, roughly half of whom were engineers) where we could not agree on anything code related. In my career, style guides (aka coding guidelines or coding standards) have created more divisive arguments than almost anything else, and enforcing them is the quickest way to upset everyone.

I do like the fact that many of the entries include pros, cons and a discussion of why a particular solution was chosen. As I have said before on this blog, even if you believe that you have made the right choice you should know what the drawbacks of that choice are.

The guide also talks a lot about consistency. I like consistency. There are times where the choices are almost arbitrary, however, it is much nicer not to have the code randomly select different choices.

Let’s start with the most controversial guideline.


Exceptions

We do not use C++ exceptions.

At least the guideline is clear. The writers of the guide also admit:

Things would probably be different if we had to do it all over again from scratch.

Taking away exceptions removes a very powerful and useful way of handling errors. It reduces the need to have error returns everywhere (which are rarely checked). Using exceptions does require your code to be written in an exception-safe manner, however that exception safe manner often has a bunch of other benefits – RAII helps with early returns and means you can’t forget the “clean up” on one code path. Smart pointers are a great way to manage lifetime (and smart pointers are encouraged by the style guide). I have worked at a number of places that banned exceptions for a variety of reasons, some good, some bad. It is possible to write decent C++ code without exceptions, but their absence does make life harder.

I have two major problems though:

  1. According to the given reasoning, there will never be a time when exceptions are allowed in Google’s code. The guideline states:

     
    Because most existing C++ code at Google is not prepared to deal with exceptions, it is comparatively difficult to adopt new code that generates exceptions.

    If the guideline is followed there will be even more code at Google that does not deal with exceptions making them even less likely to be allowed in the future. There is no way to get from here (no exceptions) to there (exceptions allowed). It is one thing to ban a feature because its current use causes problems, but for there to be no path to unbanning the feature is short-sighted in the extreme.

    You can’t even state that all new code must be exception safe. If you want code to be exception safe it has to be fully tested in the presence of exceptions. That would involve an additional build target with different settings, different versions of libraries and some additional specific tests for exception handling. That build is never going to exist.

  2. The knock on effects are horrible, in particular the effect on constructors.

Doing Work in Constructors

Avoid doing complex initialization in constructors.

First of all, the prohibition on calling virtual functions in constructors is perfectly reasonable. It’s one of those things that fall into the “don’t do that” category. The few times I have thought that I needed to do it were caused by errors in my design.

Global variables with complex constructors also fall into the “don’t do that” category, or at least “don’t do that unless you really know what you’re doing”. C++ is a harsh language if you don’t know what you’re doing. It’s a harsh language even if you do know what you’re doing.

Those are the points where I agree with Google, let’s look at the disagreements.

One of the joys of good constructors is that once the constructor has finished running you have a fully initialized object in a well-defined state that is ready to be used. That’s a simple rule and it works in many different creation scenarios. We can create the object on the stack:

SomeClass someObject( arg0, arg1, arg2 );

We can create it on the heap:

SomeClass* pSomeObject( new SomeClass( arg0, arg1, arg2 ) );

We can create the object as a member variable of another class:

SomeOtherClass::SomeOtherClass( some arguments )
: someObject_( arg0, arg1, arg2 )
{
}

We can create a temporary object for passing to a function:

someFunction( SomeClass( arg0, arg1, arg2 ) );

Two stage construction makes all of these examples longer, more fragile and less maintainable. Creating a temporary object is basically not possible – you have to create a named object then call init before passing it to a function (yes we can wrap construction and init inside a create function but why should we have to?)

The style guide states:

If the work fails, we now have an object whose initialization code failed, so it may be an [sic] indeterminate state.

With Google’s suggested “solution” (a separate init function) once the constructor has finished you have an object in an indeterminate (or at least not-fully-initialized) state. You then have to call the init function and you have to check any error return value from the init function (Note – have to. The call and the error check are not optional and you’d better not forget them or allow any future modifications to the code to accidentally delete them, or even move them too far from the original construction). Initializing const members is far more problematic with two stage construction.

In my career I have rarely had a problem with errors in constructors, although Google are certainly dealing with problems of scale that I have never encountered. Even so, I believe that this rule is hurting Google, not helping them.


Names and Order of Includes

Having praised Google for their use of pros and cons they let me down here. Google’s guideline is to start with the related header (most specific), then to jump to the least specific headers (C and C++ library includes), then gradually move to more specific headers.

The only rationale I have seen for a particular include order chooses the opposite order to Google – start with most specific, then move to less specific so that you finish with the C and C++ library headers. The rationale is to make sure that your own headers include all of the standard system files that they require – if you order the includes the other way it is easier to hide incorrect includes (or rather, lack of includes). Also, if for some reason you decided to #define linear_congruential( i ) ( i * 2 ) (or define any other symbol in the standard libraries), having the standard libraries last will almost certainly produce an error. Of course you shouldn’t be using macros in the first place, but for some inexplicable reason, people do.

In practice, I haven’t had many problems that are related to the order of includes (and when I have the problem has nearly always been that an include file wasn’t including all of the headers it required).

I do like their suggestion for alphabetical order within each section. I once worked with a developer who was diligent enough to maintain all the functions in his .cpp files in alphabetical order. I know that broke a bunch of guidelines about keeping related functions together, but damn, it was easy to find a function.


Function Names

Regular functions have mixed case; accessors and mutators match the name of the variable: MyExcitingFunction(), MyExcitingMethod(), my_exciting_member_variable(), set_my_exciting_member_variable().

I said at the beginning that I was going to try to avoid using words like “wrong”, but my tolerance has run out. This guideline is wrong.

How is it wrong? Let me count the ways:

  1. It breaks encapsulation. The name of the member function tells you something about the internal implementation.
  2. If the internal implementation changes so that the value is no longer stored in a member variable are you going to change the accessor / mutation functions and everywhere in the code that calls them? (The answer to that question is almost certainly “no”)
  3. Assume two classes exist, both of which have a concept of some number of files that they know about. One class stores the number of files in a member variable (so will have a function called num_files()), the other class calculates the number of files when necessary (so has a function called NumFiles()). You now have a template class that needs to be instantiated with a class that can tell you how many files it knows about. You cannot write a single template class that will work with both num_files() and NumFiles().
  4. In practice I am sure that if any of the first three points cause problems the name of the accessor will be changed one way or the other. There is a deeper problem though. If you’re thinking about the class in terms of accessors and mutators (or getters and setters) you are fundamentally thinking about the design incorrectly.

    Allen Holub’s provocatively named piece “Why getter and setter methods are evil” (and if anyone is supposed to not be evil it’s Google) is well worth reading. He makes some points that are beyond the scope of this post (“Don’t ask for the information you need to do the work; ask the object that has the information to do the work for you” has some interesting consequences that I am not wild about) , but he makes a killer point about “adding unnecessary capabilities to the classes”. He talks in terms of the capabilities of a class – he does that in other places as well (http://www.holub.com/goodies/rules.html), and in his book “Holub on Patterns: Learning Design Patterns by Looking at Code” where he states:

    An object is a bundle of capabilities … an object is defined by what it can do, not how it does it.

    To go back to our example, is it reasonable for our file handling class to be able to tell you how many files it knows about? If the answer to that question is “yes”, then it is quite reasonable to add a NumFiles() method. One of the capabilities of the class is that it can answer the question “how many files do you know about”. If, for some reason, (and this is a made up example which means I don’t have a good reason) it is reasonable for the class to be told that the number of files has changed, it is reasonable to have a NumFiles( int ) method. The class has the capability to accept a notification that the number of files has changed. As a result of having this information this task can perform its functions correctly. None of this says anything about the implementation of those capabilities.

In practice, I am sure that if #2 or #3 turned out to be problems the guideline will be ignored and the same name for the accessor function would be used in both places. Even if the guideline were changed to preserve encapsulation, #4 will probably still be ignored. Most development shops I have worked in do not think about class design in terms of capabilities. Google are no worse than everyone else, on the other hand, it would be nice if they were better.

For more on the subject of being evil, see the C++ FAQ.


Casting

Use C++ casts like static_cast<>(). Do not use other cast formats like int y = (int)x; or int y = int(x);

I am in complete agreement. I just want to add that in the “cons” section they state:

The syntax is nasty

Yes. The syntax is nasty. That was by design. To quote Bjarne Stroustrup from his FAQ:

An ugly operation should have an ugly syntactic form. That observation was part of the reason for choosing the syntax for the new-style casts. A further reason was for the new-style casts to match the template notation, so that programmers can write their own casts, especially run-time checked casts.

Maybe, because static_cast is so ugly and so relatively hard to type, you’re more likely to think twice before using one? That would be good, because casts really are mostly avoidable in modern C++.


Inline Functions

I am disappointed that Google don’t suggest separating inline declarations from inline definitions. C++ requires separate declarations because it was required to work with the C compilation model (C# for example does not require separate declarations). The C++ model involves repeating information – in general that’s a Bad Thing, however having declarations in a separate header file allows a class declaration to serve as effective documentation for the class and its capabilities (yes, I am ignoring the fact that the protected and private parts of the class are also visible in the declaration – it is far from perfect). Putting an inline (or template) function directly in the declaration clutters up that list of public member functions. You can place the definition later in the header file to get inlining (or template definitions) but still have an uncluttered declaration.


Operator Overloading

Do not overload operators except in rare, special circumstances. Do not create user-defined literals.

I don’t have enough experience yet with user defined literals to comment on whether they are a Good Thing or a Bad Thing, but I am prepared to put operator overloading firmly into the Good Thing category. Yes, operator overloading can be misused, but so can everything in C++. If we’re going to start banning things because of the potential for misuse we’re not going to end up with much of a language.

Overloaded operators are more playful names for functions that are less-colorfully named, such as Equals() or Add().

Oww. That’s the way to be condescending towards overloaded operators. For “playful” I would substitute “clear, obvious, concise and well defined”. When you overload an operator you cannot change the associativity, precedence or number of arguments that it takes. Who knows how many additional (but defaulted) parameters there are in that Equals function?

In particular, do not overload operator== or operator< just so that your class can be used as a key in an STL container; instead, you should create equality and comparison functor types when declaring the container.

As far as I am concerned, being used as a key or being searched for is a great reason to overload operator == and operator <. My usual practice is that operator == compares every independent value in the class and operator < gives an ordering that also depends on every independent value. If I need something different (for example an equality test that just compares the names of two objects) I’ll write a separate function.


Auto

Google likes auto. I still have significant reservations about it. At some point I will write up my concerns, but they can be summarized as “auto makes code easier to write but harder to read”.


Formatting

Coding style and formatting are pretty arbitrary

In many ways formatting is arbitrary, but I have always liked the formatting to be consistent. Even when the formatting seems arbitrary, there are still smart people who can find reasons for preferring one format to another. For example, in Object-Oriented Software Construction, Bertrand Meyer has the following to say about spaces:

The general rule, for simplicity and ease of remembering, is to follow as closely as possible the practice of standard written language. By default we will assume this language to be English, although it may be appropriate to adapt the conventions to the slightly different rules of other languages.

Here are some of the consequences. You will use a space:

  • Before an opening parenthesis, but not after:
    f (x) (not f(x), the C style, or f( x))

The argument isn’t a slam dunk “this is obviously the only sensible way to lay out parenthesis and if it isn’t followed the world will fall apart”, on the other hand it is an argument with an actual reason attached (as opposed to “I prefer it this way”).

Steve McConnell’s Code Complete 2nd edition has lots to say about formatting, backed up by actual reasons. For example, sections 31.3 and 31.4 come up with an interesting way of looking at block and brace placement and indentation. Unfortunately the brace placement that I favour for my personal projects fares badly under McConnell’s arguments.

Google’s preferred function declaration and definition syntax looks like this:

ReturnType ClassName::ReallyLongFunctionName(Type par_name1, Type par_name2,
                                             Type par_name3) {
  DoSomething();
  ...
}

In section 31.7 McConnell points out that this approach does not hold up well under modification – if the length of ReturnType, ClassName or ReallyLongFunctionName change then you have to re-indent the following lines. As McConnell says:

styles that are hard to maintain aren’t maintained

Ever since I started programming I have always heard of the idea of having a tool that automatically formats code to the house standard when it is checked in, and formats it to the individual developer’s standard when it is checked out. I think this is a fabulous idea but I have never come across any development shop that is actually using it – I would love somebody to point me to a working example. Of course the situation is exacerbated in C++ because formatting the language is extremely difficult. I tried a number of formatting tools a few years ago and they all had significant problems so I haven’t investigated them since. I hear good things about astyle – I will take a look at it some time.

I am jealous of Go. It has a standard format and a tool (gofmt) to lay code out in that format.


In conclusion, there are many parts of the guide I agree with (I haven’t mentioned most of them because that doesn’t fit my definition of “interesting”), but there are some major places where it looks like Google are hampering themselves. I said that just having a company-wide style guide was an achievement, but that style guide really needs to be working for them, not against them.

To finish on a positive note, I particularly like this statement at the end of the style guide:

The point of having style guidelines is to have a common vocabulary of coding so people can concentrate on what you are saying, rather than on how you are saying it.

Hear hear.

Quote of the week – outsmarting the compiler

Trying to outsmart a compiler defeats much of the purpose of using one.

Kernighan and Plauger The Elements of Programming Style.

I have seen many people try to outsmart compilers (sometimes that person was me) and with rare exceptions, the compiler has won every time.

I have seen a programmer who had just read Mike Abrash’s “Zen of Assembly Language” code up their own memcpy routine using all the the clever optimizations in the book, only to discover that the compiler writers know all of these optimizations too. I have seen people try and get clever with switch statements and implement them as jump tables. Steve Johnson’s portable C compiler optimized switch statements automatically in the 80s. I have seen a programmer who was an expert on the 68000 instruction set (and he genuinely was an expert) come unstuck because we were now compiling for the 68040 and the instruction timings had changed.

In the late 90s there was a discussion on one of the games programming newsgroups about using left shift instead of multiply. Rather than this:

j = i * 2;

you’d write:

j = i << 1;

and get a wonderful speed increase. I decided to run some tests on this. I coded up a little test program with two functions:

int multiplyByTwo( int i )
{
    return i * 2;
}
int shiftLeftByOne( int i )
{
    return i << 1;
}

I ran the program through Watcom with our normal release build settings, then looked at the assembler output. Watcom had replaced i * 2 with i << 1, it had realized that the two functions were the same so it had merged them, then it inlined the function. Furthermore, since I was using constants to keep the program simple it calculated the results at compile time. I think that the only thing left by the end was a call to printf with a couple of constants. Compilers are good at this stuff.

There's another problem - if we are going to try doing these replacements ourselves we have to make sure that we get them right. Replacing multiply with a left shift is easy and safe to do, however, dividing by two is not necessarily the same as shifting right by one - under some circumstances you can safely use the shift, but under others you can't. I have always found that a compiler is far more likely to get these things right than I am.

The moral is for us to write code that expresses exactly what we want (if we want a multiplication then we should write a multiplication) and let the compiler do whatever it feels is appropriate. Compilers are really good at peephole optimizations. As programmers we are much better off using our time to replace that O(n^2) algorithm with an O(n log n) algorithm.

Include guards

In 1996 John Lakos published “Large-Scale C++ Software Design”, a book containing the lessons learned from a large scale C++ project developed at Mentor Graphics.

I want to make it clear that this post is not meant as a criticism of John Lakos or his book. First, I don’t have a copy of the book in front of me so I am working from memory – I might have things wrong. Second, John Lakos was describing his experience at a particular time and place. We are at a different time and place. Among other things, compilers, hard disks, processors, memory and networks have changed enormously over the last 20 years.

One of the lessons Lakos documents involves include guards. A common idiom is for the contents of every header file to be surrounded by include guards:

myheader.hpp

#ifndef MYHEADER_HPP
#define MYHEADER_HPP

// Contents of include file go here

#endif

Even if the file is #included multiple times it will only be processed once. Since the include guard is entirely internal to the header file this is known as an “internal include guard”. The problem is that a naive compiler will still have to open and read the file every time it is included in a translation unit. If opening a file is slow (perhaps because it is being opened over a network) there is a performance hit even though the contents of the file are not being processed. As a solution to this problem, Lakos suggested using external include guards as well as internal include guards:

myheader2.hpp

#ifndef MYHEADER_HPP
#include "myheader.hpp"
#endif

The external guards ensure that once myheader.hpp has been processed it doesn’t even have to be opened again. Keep the internal include guards to ensure accuracy but use the external include guards to improve compile performance.

It turns out that some compilers are way ahead of us – they implement the “include guard optimization”. When the compiler parses a header file it looks to see if the include guard idiom is in place. If it is, the compiler notes this fact and then does the equivalent of applying its own external include guards if the file is included a second time. In 1999 a discussion arose on comp.lang.c++.moderated about external include guards vs. internal include guards vs. the include guard optimization (I haven’t been able to track down the original discussion I’m afraid. Google’s Usenet search has me baffled). I decided to run some tests and put up a webpage with my results. Some other C++ programmers ran tests on their systems that allowed me to get results from compilers and systems that I don’t have access to.

The main page is here, if you want to skip straight to the results go here.

There are a few patterns. GCC has had the optimization for a long time, VC++ has never had the optimization (although the results for the latest VC++ compiler are strange – something odd is going on). Watcom had the optimization in the 1990s (incidentally, Watcom lives on as Open Watcom, although it looks like there hasn’t been a release for 4 years).

There are a couple of important points to note. All these tests are doing is testing the include times of a simple header – the header itself is very small so there is almost no time spent processing it. The tests are being run with tens if not hundreds of thousands of includes of the same file – a situation I have never encountered in practice

A few years ago I was on a team that was upset about its compilation times and wanted to look for speedups. Tests showed that the single greatest contributor to our build time was <windows.h>. The solution to that was in 2 parts – include <windows.h> in as few places as possible (and in particular try to keep it out of header files), and use precompiled headers. The include guards didn’t come close to being a problem.

As always, nothing takes the place of running tests on your own codebase, although I would start by measuring the processing time for several commonly used headers – that is far more likely to be the problem than include guards.

Unfortunately making sure we only include the necessary headers is time consuming to do, and even more time consuming to maintain. I have never come up with the perfect solution to this problem – at times I have used imperfect solutions but they have all been inconvenient and required serious amounts of run time.

Finally, John Lakos has a new book coming out in 2015 – Large-Scale C++ Volume I: Process and Architecture. It’ll be interesting to see what he finds to focus on in our current computing environment.