I read through Google’s C++ style guide recently and found some interesting recommendations. I am trying to avoid saying “right” or “wrong” – there are plenty of things in a style guide that are matters of opinion or are tailored to the needs of a specific company – so I am focusing on things I find interesting.
The direct link to the guide is http://google-styleguide.googlecode.com/svn/trunk/cppguide.html.
The guide is stored in a subversion repository which can be accessed at
http://google-styleguide.googlecode.com/svn
I am looking at revision 138 in the repository, checked in on 8 September 2014. At the very bottom of the document there’s a version number – 4.45.
I have no insider knowledge apart from one conversation with a Google engineer who told me that the “no exceptions” rule was the most controversial part of the guide.
It is remarkable that Google have a company-wide C++ style guide. I have worked in smaller companies (16 people, roughly half of whom were engineers) where we could not agree on anything code related. In my career, style guides (aka coding guidelines or coding standards) have created more divisive arguments than almost anything else, and enforcing them is the quickest way to upset everyone.
I do like the fact that many of the entries include pros, cons and a discussion of why a particular solution was chosen. As I have said before on this blog, even if you believe that you have made the right choice you should know what the drawbacks of that choice are.
The guide also talks a lot about consistency. I like consistency. There are times when the choices are almost arbitrary; even so, it is much nicer when the code does not pick a different option each time.
Let’s start with the most controversial guideline.
Exceptions
We do not use C++ exceptions.
At least the guideline is clear. The writers of the guide also admit:
Things would probably be different if we had to do it all over again from scratch.
Taking away exceptions removes a very powerful and useful way of handling errors. It reduces the need to have error returns everywhere (which are rarely checked). Using exceptions does require your code to be written in an exception-safe manner, however that exception safe manner often has a bunch of other benefits – RAII helps with early returns and means you can’t forget the “clean up” on one code path. Smart pointers are a great way to manage lifetime (and smart pointers are encouraged by the style guide). I have worked at a number of places that banned exceptions for a variety of reasons, some good, some bad. It is possible to write decent C++ code without exceptions, but their absence does make life harder.
I have two major problems though:
- According to the given reasoning, there will never be a time when exceptions are allowed in Google’s code. The guideline states:
Because most existing C++ code at Google is not prepared to deal with exceptions, it is comparatively difficult to adopt new code that generates exceptions.
If the guideline is followed there will be even more code at Google that does not deal with exceptions, making them even less likely to be allowed in the future. There is no way to get from here (no exceptions) to there (exceptions allowed). It is one thing to ban a feature because its current use causes problems, but for there to be no path to unbanning the feature is short-sighted in the extreme.
You can’t even state that all new code must be exception safe. If you want code to be exception safe it has to be fully tested in the presence of exceptions. That would involve an additional build target with different settings, different versions of libraries and some additional specific tests for exception handling. That build is never going to exist.
- The knock on effects are horrible, in particular the effect on constructors.
Doing Work in Constructors
Avoid doing complex initialization in constructors.
First of all, the prohibition on calling virtual functions in constructors is perfectly reasonable. It’s one of those things that fall into the “don’t do that” category. The few times I have thought that I needed to do it were caused by errors in my design.
Global variables with complex constructors also fall into the “don’t do that” category, or at least “don’t do that unless you really know what you’re doing”. C++ is a harsh language if you don’t know what you’re doing. It’s a harsh language even if you do know what you’re doing.
Those are the points where I agree with Google. Let’s look at the disagreements.
One of the joys of good constructors is that once the constructor has finished running you have a fully initialized object in a well-defined state that is ready to be used. That’s a simple rule and it works in many different creation scenarios. We can create the object on the stack:
SomeClass someObject( arg0, arg1, arg2 );
We can create it on the heap:
SomeClass* pSomeObject( new SomeClass( arg0, arg1, arg2 ) );
We can create the object as a member variable of another class:
SomeOtherClass::SomeOtherClass( some arguments ) : someObject_( arg0, arg1, arg2 ) { }
We can create a temporary object for passing to a function:
someFunction( SomeClass( arg0, arg1, arg2 ) );
Two stage construction makes all of these examples longer, more fragile and less maintainable. Creating a temporary object is basically not possible – you have to create a named object then call init before passing it to a function (yes, we can wrap construction and init inside a create function, but why should we have to?).
The style guide states:
If the work fails, we now have an object whose initialization code failed, so it may be an [sic] indeterminate state.
With Google’s suggested “solution” (a separate init function) once the constructor has finished you have an object in an indeterminate (or at least not-fully-initialized) state. You then have to call the init function and you have to check any error return value from the init function (Note – have to. The call and the error check are not optional and you’d better not forget them or allow any future modifications to the code to accidentally delete them, or even move them too far from the original construction). Initializing const members is far more problematic with two stage construction.
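As a minimal sketch of the difference, here are two hypothetical classes (Connection and TwoStageConnection are made up for illustration and do not come from the style guide). The first does all of its work in the constructor and can keep its members const; the second follows the separate init pattern and has to track whether init was ever called.

```cpp
#include <cassert>
#include <string>
#include <utility>

// One-stage construction: usable as soon as the constructor returns,
// and the members can be const.
class Connection {
public:
    Connection(std::string host, int port)
        : host_(std::move(host)), port_(port) {}

    std::string Describe() const { return host_ + ":" + std::to_string(port_); }

private:
    const std::string host_;
    const int port_;
};

// Two-stage construction: the constructor leaves the object half-built.
class TwoStageConnection {
public:
    TwoStageConnection() = default;

    // Must be called, and its result checked, before any other member.
    bool Init(std::string host, int port) {
        if (host.empty() || port <= 0) return false;
        host_ = std::move(host);
        port_ = port;
        initialized_ = true;
        return true;
    }

    std::string Describe() const {
        assert(initialized_);  // nothing but discipline guarantees this
        return host_ + ":" + std::to_string(port_);
    }

private:
    std::string host_;  // cannot be const: assigned after construction
    int port_ = 0;
    bool initialized_ = false;
};

// Any function that takes the object by const reference.
inline std::string Report(const Connection& connection) {
    return connection.Describe();
}
```

With the first class a temporary works directly, as in Report( Connection( "example.org", 80 ) ). The second needs a named object, a call to Init and a check of its result before Describe is safe to call.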
In my career I have rarely had a problem with errors in constructors, although Google are certainly dealing with problems of scale that I have never encountered. Even so, I believe that this rule is hurting Google, not helping them.
Names and Order of Includes
Having praised Google for their use of pros and cons they let me down here. Google’s guideline is to start with the related header (most specific), then to jump to the least specific headers (C and C++ library includes), then gradually move to more specific headers.
The only rationale I have seen for a particular include order chooses the opposite order to Google – start with most specific, then move to less specific so that you finish with the C and C++ library headers. The rationale is to make sure that your own headers include all of the standard system files that they require – if you order the includes the other way it is easier to hide incorrect includes (or rather, lack of includes). Also, if for some reason you decided to #define linear_congruential( i ) ( i * 2 ) (or define any other symbol in the standard libraries), having the standard libraries last will almost certainly produce an error. Of course you shouldn’t be using macros in the first place, but for some inexplicable reason, people do.
In practice, I haven’t had many problems that are related to the order of includes (and when I have the problem has nearly always been that an include file wasn’t including all of the headers it required).
I do like their suggestion for alphabetical order within each section. I once worked with a developer who was diligent enough to maintain all the functions in his .cpp files in alphabetical order. I know that broke a bunch of guidelines about keeping related functions together, but damn, it was easy to find a function.
Function Names
Regular functions have mixed case; accessors and mutators match the name of the variable: MyExcitingFunction(), MyExcitingMethod(), my_exciting_member_variable(), set_my_exciting_member_variable().
I said at the beginning that I was going to try to avoid using words like “wrong”, but my tolerance has run out. This guideline is wrong.
How is it wrong? Let me count the ways:
- It breaks encapsulation. The name of the member function tells you something about the internal implementation.
- If the internal implementation changes so that the value is no longer stored in a member variable are you going to change the accessor / mutation functions and everywhere in the code that calls them? (The answer to that question is almost certainly “no”)
- Assume two classes exist, both of which have a concept of some number of files that they know about. One class stores the number of files in a member variable (so will have a function called num_files()), the other class calculates the number of files when necessary (so has a function called NumFiles()). You now have a template class that needs to be instantiated with a class that can tell you how many files it knows about. You cannot write a single template class that will work with both num_files() and NumFiles().
- In practice I am sure that if any of the first three points cause problems the name of the accessor will be changed one way or the other. There is a deeper problem though. If you’re thinking about the class in terms of accessors and mutators (or getters and setters) you are fundamentally thinking about the design incorrectly.
Allen Holub’s provocatively named piece “Why getter and setter methods are evil” (and if anyone is supposed to not be evil it’s Google) is well worth reading. He makes some points that are beyond the scope of this post (“Don’t ask for the information you need to do the work; ask the object that has the information to do the work for you” has some interesting consequences that I am not wild about) , but he makes a killer point about “adding unnecessary capabilities to the classes”. He talks in terms of the capabilities of a class – he does that in other places as well (http://www.holub.com/goodies/rules.html), and in his book “Holub on Patterns: Learning Design Patterns by Looking at Code” where he states:
An object is a bundle of capabilities … an object is defined by what it can do, not how it does it.
To go back to our example, is it reasonable for our file handling class to be able to tell you how many files it knows about? If the answer to that question is “yes”, then it is quite reasonable to add a NumFiles() method. One of the capabilities of the class is that it can answer the question “how many files do you know about”. If, for some reason, (and this is a made up example which means I don’t have a good reason) it is reasonable for the class to be told that the number of files has changed, it is reasonable to have a NumFiles( int ) method. The class has the capability to accept a notification that the number of files has changed. As a result of having this information the class can perform its functions correctly. None of this says anything about the implementation of those capabilities.
In practice, I am sure that if #2 or #3 turned out to be problems the guideline would be ignored and the same name for the accessor function would be used in both places. Even if the guideline were changed to preserve encapsulation, #4 would probably still be ignored. Most development shops I have worked in do not think about class design in terms of capabilities. Google are no worse than everyone else; on the other hand, it would be nice if they were better.
For more on the subject of being evil, see the C++ FAQ.
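The template problem from the third point in the list above can be sketched as follows. The class names (CachedCatalog, ScanningCatalog) and the IsEmpty template are hypothetical and exist only to show how the two naming styles collide; the naming follows my reading of the guideline as quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stores the count in a member variable, so under the guideline the
// accessor is named after that variable.
class CachedCatalog {
public:
    explicit CachedCatalog(std::size_t num_files) : num_files_(num_files) {}
    std::size_t num_files() const { return num_files_; }

private:
    std::size_t num_files_;
};

// Computes the count on demand, so it gets a regular mixed-case name.
class ScanningCatalog {
public:
    explicit ScanningCatalog(std::vector<std::string> files)
        : files_(std::move(files)) {}
    std::size_t NumFiles() const { return files_.size(); }

private:
    std::vector<std::string> files_;
};

// Generic code has to commit to one spelling of the same capability.
template <typename Catalog>
bool IsEmpty(const Catalog& catalog) {
    return catalog.NumFiles() == 0;
}

inline void CatalogDemo() {
    const std::vector<std::string> names{"a.txt", "b.txt"};
    assert(!IsEmpty(ScanningCatalog(names)));
    // IsEmpty(CachedCatalog(2));  // would not compile: no member NumFiles
}
```

If both classes exposed the capability under one name, the template would work with either of them regardless of how the count is stored.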
Casting
Use C++ casts like static_cast<>(). Do not use other cast formats like int y = (int)x; or int y = int(x);
I am in complete agreement. I just want to add that in the “cons” section they state:
The syntax is nasty
Yes. The syntax is nasty. That was by design. To quote Bjarne Stroustrup from his FAQ:
An ugly operation should have an ugly syntactic form. That observation was part of the reason for choosing the syntax for the new-style casts. A further reason was for the new-style casts to match the template notation, so that programmers can write their own casts, especially run-time checked casts.
Maybe, because static_cast is so ugly and so relatively hard to type, you’re more likely to think twice before using one? That would be good, because casts really are mostly avoidable in modern C++.
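Stroustrup’s remark about programmers writing their own run-time checked casts can be illustrated with a small sketch. The name checked_cast is an assumption (it is not a standard library function), and this version only handles integer types; it throws if the value does not survive the conversion.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

// Hypothetical run-time checked cast, written in the same template
// notation as static_cast so that it reads like the built-in casts.
template <typename To, typename From>
To checked_cast(From value) {
    static_assert(std::is_integral<To>::value && std::is_integral<From>::value,
                  "this sketch only handles integer types");
    const To result = static_cast<To>(value);
    // The value must round-trip and must not flip between negative and
    // non-negative (which a signed/unsigned conversion can do silently).
    const bool value_changed = static_cast<From>(result) != value;
    const bool sign_changed = (result < To{}) != (value < From{});
    if (value_changed || sign_changed) {
        throw std::range_error("checked_cast: value does not fit in target type");
    }
    return result;
}

inline void CastDemo() {
    assert(checked_cast<unsigned char>(65) == 65);

    bool threw = false;
    try {
        checked_cast<unsigned char>(300);  // 300 does not fit in 8 bits
    } catch (const std::range_error&) {
        threw = true;
    }
    assert(threw);
}
```

Under Google’s no-exceptions rule the failure would have to be reported some other way, for example through a return value or an assertion.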
Inline Functions
I am disappointed that Google don’t suggest separating inline declarations from inline definitions. C++ requires separate declarations because it was required to work with the C compilation model (C# for example does not require separate declarations). The C++ model involves repeating information – in general that’s a Bad Thing, however having declarations in a separate header file allows a class declaration to serve as effective documentation for the class and its capabilities (yes, I am ignoring the fact that the protected and private parts of the class are also visible in the declaration – it is far from perfect). Putting an inline (or template) function directly in the declaration clutters up that list of public member functions. You can place the definition later in the header file to get inlining (or template definitions) but still have an uncluttered declaration.
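A minimal sketch of that layout, using a hypothetical RingCounter class: the class declaration lists only the capabilities, and the inline definitions follow later in the same header.

```cpp
#include <cassert>
#include <cstddef>

// Declaration: reads as a short list of what the class can do.
class RingCounter {
public:
    explicit RingCounter(std::size_t size);
    void Advance();
    std::size_t Position() const;

private:
    std::size_t size_;
    std::size_t position_;
};

// Definitions: still in the header and still inline, but out of the way.
inline RingCounter::RingCounter(std::size_t size) : size_(size), position_(0) {
    assert(size > 0);
}

inline void RingCounter::Advance() {
    position_ = (position_ + 1) % size_;
}

inline std::size_t RingCounter::Position() const {
    return position_;
}
```

The same arrangement works for member functions of class templates, whose definitions also have to be visible in the header.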
Operator Overloading
Do not overload operators except in rare, special circumstances. Do not create user-defined literals.
I don’t have enough experience yet with user defined literals to comment on whether they are a Good Thing or a Bad Thing, but I am prepared to put operator overloading firmly into the Good Thing category. Yes, operator overloading can be misused, but so can everything in C++. If we’re going to start banning things because of the potential for misuse we’re not going to end up with much of a language.
Overloaded operators are more playful names for functions that are less-colorfully named, such as Equals() or Add().
Oww. That’s the way to be condescending towards overloaded operators. For “playful” I would substitute “clear, obvious, concise and well defined”. When you overload an operator you cannot change the associativity, precedence or number of arguments that it takes. Who knows how many additional (but defaulted) parameters there are in that Equals function?
In particular, do not overload operator== or operator< just so that your class can be used as a key in an STL container; instead, you should create equality and comparison functor types when declaring the container.
As far as I am concerned, being used as a key or being searched for is a great reason to overload operator == and operator <. My usual practice is that operator == compares every independent value in the class and operator < gives an ordering that also depends on every independent value. If I need something different (for example an equality test that just compares the names of two objects) I’ll write a separate function.
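One way to write operators in that style is with std::tie, which compares the listed members lexicographically. This is only a sketch; FileId and SameName are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

struct FileId {
    std::string directory;
    std::string name;
    int revision;
};

// Equality looks at every independent value in the class.
inline bool operator==(const FileId& a, const FileId& b) {
    return std::tie(a.directory, a.name, a.revision) ==
           std::tie(b.directory, b.name, b.revision);
}

// The ordering depends on the same set of values.
inline bool operator<(const FileId& a, const FileId& b) {
    return std::tie(a.directory, a.name, a.revision) <
           std::tie(b.directory, b.name, b.revision);
}

// A narrower comparison gets its own named function.
inline bool SameName(const FileId& a, const FileId& b) {
    return a.name == b.name;
}

inline void KeyDemo() {
    const FileId first{"src", "main.cpp", 1};
    const FileId second{"src", "main.cpp", 2};

    // With operator< defined, the type works as a map key with no
    // extra comparison functor.
    std::map<FileId, int> line_counts;
    line_counts[first] = 120;
    line_counts[second] = 130;

    assert(line_counts.size() == 2);
    assert(!(first == second) && SameName(first, second));
}
```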
Auto
Google likes auto. I still have significant reservations about it. At some point I will write up my concerns, but they can be summarized as “auto makes code easier to write but harder to read”.
Formatting
Coding style and formatting are pretty arbitrary
In many ways formatting is arbitrary, but I have always liked the formatting to be consistent. Even when the formatting seems arbitrary, there are still smart people who can find reasons for preferring one format to another. For example, in Object-Oriented Software Construction, Bertrand Meyer has the following to say about spaces:
The general rule, for simplicity and ease of remembering, is to follow as closely as possible the practice of standard written language. By default we will assume this language to be English, although it may be appropriate to adapt the conventions to the slightly different rules of other languages.
Here are some of the consequences. You will use a space:
- Before an opening parenthesis, but not after: f (x) (not f(x), the C style, or f( x))
The argument isn’t a slam dunk “this is obviously the only sensible way to lay out parentheses and if it isn’t followed the world will fall apart”; on the other hand, it is an argument with an actual reason attached (as opposed to “I prefer it this way”).
Steve McConnell’s Code Complete 2nd edition has lots to say about formatting, backed up by actual reasons. For example, sections 31.3 and 31.4 come up with an interesting way of looking at block and brace placement and indentation. Unfortunately the brace placement that I favour for my personal projects fares badly under McConnell’s arguments.
Google’s preferred function declaration and definition syntax looks like this:
ReturnType ClassName::ReallyLongFunctionName(Type par_name1, Type par_name2,
                                             Type par_name3) {
  DoSomething();
  ...
}
In section 31.7 McConnell points out that this approach does not hold up well under modification – if the length of ReturnType, ClassName or ReallyLongFunctionName changes then you have to re-indent the following lines. As McConnell says:
styles that are hard to maintain aren’t maintained
Ever since I started programming I have always heard of the idea of having a tool that automatically formats code to the house standard when it is checked in, and formats it to the individual developer’s standard when it is checked out. I think this is a fabulous idea but I have never come across any development shop that is actually using it – I would love somebody to point me to a working example. Of course the situation is exacerbated in C++ because formatting the language is extremely difficult. I tried a number of formatting tools a few years ago and they all had significant problems so I haven’t investigated them since. I hear good things about astyle – I will take a look at it some time.
I am jealous of Go. It has a standard format and a tool (gofmt) to lay code out in that format.
In conclusion, there are many parts of the guide I agree with (I haven’t mentioned most of them because that doesn’t fit my definition of “interesting”), but there are some major places where it looks like Google are hampering themselves. I said that just having a company-wide style guide was an achievement, but that style guide really needs to be working for them, not against them.
To finish on a positive note, I particularly like this statement at the end of the style guide:
The point of having style guidelines is to have a common vocabulary of coding so people can concentrate on what you are saying, rather than on how you are saying it.
Hear hear.
It should be noted that Google built a tool to maintain the style they use, clang-format, so the “styles that are hard to maintain aren’t maintained” objection is less of an issue when maintaining the style is a simple keystroke.
I would think that the proper way to introduce exceptions would be to divide methods into three categories: those which neither throw exceptions nor are prepared to accept exceptions thrown by methods they call, methods which throw exceptions and can handle exceptions from nested methods, and methods which can handle exceptions from inner methods but do not propagate them. It should be possible to write a tool which would then statically verify that no method which might throw exceptions could be called by code which would be unprepared to handle them. Such a design would allow for an evolution toward wider usage of exceptions as methods could be moved from the first category to the third, and methods whose callers were all in the third category could be moved to the second.
With regard to constructors, I wish languages and frameworks would allow a base class to specify a virtual method which would be invoked after the most derived constructor finished. Constructors would be expected to refrain from invoking any virtual methods, exposing the object under construction to the outside world, or doing anything which might be expected to fail, leaving such actions for the aforementioned virtual method. Absent such a language feature, I would think it might be good to specify that objects which would need to do any of those things in a constructor should refrain from defining public constructors but instead define static factory methods. Not as nice syntactically on the client side, but cleaner semantically than requiring customers to perform two-step initialization.
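The static factory approach described in this comment might look something like the following sketch. Widget, Create and Setup are hypothetical names; the constructor does nothing that can fail, and the factory runs the virtual setup step only after construction has finished.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class Widget {
public:
    // The only public way to obtain a Widget. Returns nullptr if setup
    // fails, so callers never see a half-initialized object.
    static std::unique_ptr<Widget> Create(std::string name) {
        std::unique_ptr<Widget> widget(new Widget(std::move(name)));
        if (!widget->Setup()) return nullptr;
        return widget;
    }

    virtual ~Widget() = default;

    std::string Describe() const { return "widget:" + name_; }

protected:
    // Cannot fail and calls no virtual functions.
    explicit Widget(std::string name) : name_(std::move(name)) {}

    // Runs after construction is complete, so virtual dispatch is safe.
    virtual bool Setup() { return !name_.empty(); }

private:
    std::string name_;
};

inline void FactoryDemo() {
    assert(Widget::Create("") == nullptr);

    const std::unique_ptr<Widget> widget = Widget::Create("dial");
    assert(widget != nullptr && widget->Describe() == "widget:dial");
}
```

The caller still has to check the result of Create, but there is no window in which an unchecked, partly built object exists.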
Operator overloading I have mixed feelings about. I’m not terribly familiar with how overloading is handled in C++, but at least in C# operator overloading can often cause trouble because overloads are statically evaluated in ways that can be surprising, especially in the presence of implicit type conversions. Since even built-in types can have trouble with that, extending it to user-defined types would likely be a recipe for more confusion. For example, in C#, 16777216.0f==16777217 will return true because the second operand gets rounded to `float` before the comparison.
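The comment’s example is C#, but C++ applies a similar implicit int-to-float conversion to built-in types. The sketch below is a C++ analogue, not the C# code itself, and it assumes float is an IEEE 754 single-precision type with the default round-to-nearest mode, which holds on typical platforms. EqualAfterConversion is a hypothetical helper that makes the implicit conversion explicit.

```cpp
#include <cassert>

// Compares a float with an int the way the built-in == would: the int
// is converted to float first, which can lose its low-order bits.
inline bool EqualAfterConversion(float lhs, int rhs) {
    const float converted = static_cast<float>(rhs);
    return lhs == converted;
}

inline void ConversionDemo() {
    // 16777217 is 2^24 + 1, which a 24-bit significand cannot hold,
    // so it rounds to 16777216.0f and the comparison succeeds.
    assert(EqualAfterConversion(16777216.0f, 16777217));

    // 16777218 is representable exactly, so it stays distinct.
    assert(!EqualAfterConversion(16777216.0f, 16777218));
}
```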
I don’t like the idea of having code reformatted on check-in; it would seem better to have the check-in process require that the code be formatted to match the convention, and have an easy means by which a programmer can make code conform to the standard *and view everything that was changed in the process* without checking it in. If code conformed to the standard before the programmer started working with it, it shouldn’t take long for the programmer to confirm that reformatting didn’t have any unintended consequences, but I don’t like the idea of having code that is supposed to be human-readable get checked in without anyone having looked at it.
PS–I wish there was a usable standard for adding source-code markup in a fashion which would be invisible to the compiler. Sometimes it’s useful to have source code contain columnar data. Using spaces is icky and hard to maintain, and using tabs is even worse. What I would like is a means of markup which could say “in the following bit of code, the first tab on each line but the first should line up with the location of the first line’s first tab that follows something other than whitespace; likewise with the second, third, fourth, etc.”
I love the idea of having tools to check for things like exception safety – that would make it possible to get there from here. It seems to me that there is a big need for additional tools in C++ looking for things that can’t (or shouldn’t) be checked by the compiler.
I write Python scripts to check things about my code – my current project funnels all memory allocation through a custom allocator that uses the TBB allocation methods underneath. I have a script that goes through the code and checks that I am not using regular new and delete. It looks at every standard container I am using and confirms that it has been instantiated with the correct allocator. With any luck clang will be the base of an ecosystem that allows us to write more complex tools.
Operator overloading definitely does have its issues, and can be abused, but I have had so many benefits from it that I am loath to ban it.
I’m one of the maintainers of that doc. Bits of it are (always) in flux. If you want some deeper insight, I gave a talk on this at CppCon recently (it’s on youtube as of a few days ago).
Thanks Titus. The link to his talk is http://www.youtube.com/watch?v=NOCElcMcFik. There are also a bunch of other talks from CppCon on YouTube.
I particularly like the fact that Titus starts off talking about the purpose of a style guide, and points out that “rules have costs”.