Quote of the week – outsmarting the compiler

Trying to outsmart a compiler defeats much of the purpose of using one.

Kernighan and Plauger The Elements of Programming Style.

I have seen many people try to outsmart compilers (sometimes that person was me) and with rare exceptions, the compiler has won every time.

I have seen a programmer who had just read Mike Abrash’s “Zen of Assembly Language” code up their own memcpy routine using all the the clever optimizations in the book, only to discover that the compiler writers know all of these optimizations too. I have seen people try and get clever with switch statements and implement them as jump tables. Steve Johnson’s portable C compiler optimized switch statements automatically in the 80s. I have seen a programmer who was an expert on the 68000 instruction set (and he genuinely was an expert) come unstuck because we were now compiling for the 68040 and the instruction timings had changed.

In the late 90s there was a discussion on one of the games programming newsgroups about using left shift instead of multiply. Rather than this:

j = i * 2;

you’d write:

j = i << 1;

and get a wonderful speed increase. I decided to run some tests on this. I coded up a little test program with two functions:

int multiplyByTwo( int i )
    return i * 2;
int shiftLeftByOne( int i )
    return i << 1;

I ran the program through Watcom with our normal release build settings, then looked at the assembler output. Watcom had replaced i * 2 with i << 1, it had realized that the two functions were the same so it had merged them, then it inlined the function. Furthermore, since I was using constants to keep the program simple it calculated the results at compile time. I think that the only thing left by the end was a call to printf with a couple of constants. Compilers are good at this stuff.

There's another problem - if we are going to try doing these replacements ourselves we have to make sure that we get them right. Replacing multiply with a left shift is easy and safe to do, however, dividing by two is not necessarily the same as shifting right by one - under some circumstances you can safely use the shift, but under others you can't. I have always found that a compiler is far more likely to get these things right than I am.

The moral is for us to write code that expresses exactly what we want (if we want a multiplication then we should write a multiplication) and let the compiler do whatever it feels is appropriate. Compilers are really good at peephole optimizations. As programmers we are much better off using our time to replace that O(n^2) algorithm with an O(n log n) algorithm.

8 thoughts on “Quote of the week – outsmarting the compiler

  1. John Payson says:

    The expression `x >> 4` is the simplest and most readable way of expression the Euclidian quotient of x divided by 16, and likewise `x & 15` is the most readable way of expressing the Euclidian modulus. I would prefer to have operators that would yield the Euclidian quotient and modulus for any given divisor, rather than having to use operators that only work with power-of-two divsors, but unfortunately few language designers provide such a thing (Python is the only one I can think of). Ironically, the one time I can remember in 20+ years of programming where truncated division would have been more useful than Euclidian division was in a piece of time-sensitive code where using the compiler’s division operator invoked a library function and would thus have been too slow.

    Personally, I like the way that Pascal uses a separate keyword for integer division versus floating-point, though it might have been better still if it had used separate keywords for “div”, “mod”, “quot”, and “rem” [with “div” and “mod” using floored division, and “quot” and “rem using truncated]. That would have allowed code wanting each form to request what it actually wants. Otherwise, I know of no way to request Euclidian division by a power-of-two constant which is not considerably clunkier than an arithmetic right shift.

    • Bob says:

      If right shift is the clearest way to express something then I have no beef with it being used. Your example seems like a reasonable use case except for the fact that right shifting a negative number has implementation defined results. Section 5.8.3 of the standard states that for an expression E1 >> E2:

      If E1 has a signed type and a negative value, the resulting value is implementation-defined.

      I remember converting some code from FORTRAN to C and having to deal with the fact that an implementation was free to choose either logical or arithmetic right shift depending on the whims of the system designers. I got the job of seeing what our various compilers and systems did – roughly half one way and half the other. We wrote our own well-documented and well-named functions to make it absolutely clear what was going on.

      If you need to do Euclidian division in a portable manner (or need non power of two divisors) then you’re going to have to roll your own functions I’m afraid. It’s a shame that C++ doesn’t do a better job of handling the different division options.

  2. John Payson says:

    It’s certainly true that code which right-shifts negative numbers will not be portable to any system where doing so yields anything other than an arithmetic right shift. In practice, I doubt that such code is meaningfully less portable than code which assumes that “unsigned char” is 8 bits, assumes it’s safe to perform on literals which are below 32768, computations whose values exceed the range +/-32767 [I don’t think the standard requires -32768], assumes that any two uint32_t values may safely be multiplied without risking Undefined Behavior [multiplication could promote `uint32_t` to a signed type like `int64_t` which is larger than `uint32_t`, but may not be large enough to hold the product], etc. Can you suggest any other clear and efficient way of writing code to compute a Euclidian quotient for a signed dividend and a known power-of-two divisor?

    Personally, I think the C language needs to be split into at least two distinct dialects, one of which would make fewer promises to the programmer, but could be supported by most platforms which are compliant with the present standard, and the other of which would offer more concrete promises to the programmer, some of which certain platforms may be unable to effectively deliver. Allowing for the possibility that a `char` might be 13 bits, an `int` might take the space of four `char` but have only 29 usable bits and 21 padding bits, etc. would make C a practical language for platforms which could not efficiently handle a programming model that relied upon 8-bit bytes, but makes it much harder to write efficient code that will run correctly on all standards-compliant implementations. A good standard should include “wrapping” types (wrap16_t, wrap32_t) which would behave much like unsigned integers do now *but would not be implicitly promoted*. Adding two wrap16_t values or adding an int64_t (or any other size of integer) to a wrap16_t should yield a wrap16_t. Adding a wrap16_t to a wrap32_t should require a cast (since neither type should implicitly promote to the other). Today’s definition of uint32_t and its associated behaviors, which requires some compilers to interpret “1+(uint32_t)-1” as 0u, while requiring others to interpret it as 4294967296, is in many ways the worst of all possible worlds.

    • Bob says:

      Can you suggest any other clear and efficient way of writing code to compute a Euclidian quotient for a signed dividend and a known power-of-two divisor?

      I have nothing I’m afraid. If I had to perform euclidian division I’d first check whether the right shift operator on my platform performed an arithmetic shift. Assuming it did I’d use the right shift and also put in a static assert to ensure that the code wouldn’t compile on any platform that did not use an arithmetic shift. That’s the same thing I would do for any of your other examples. I’d check out what happens on the platform I need and use a static assert to ensure the code will not compile on any platform that does not support the right behaviour.

      If I had to support platforms that did it two different ways (and as I said I have had to do that in the past) I’d wrap the operation inside a function – less convenient to be sure, but getting the wrong answer is also inconvenient.

      • Anon says:

        Can such a check be done at runtime? So if the platform behaves nicely, use the (faster) built-in function, and if not, use a custom routine?

Leave a Reply

Your email address will not be published. Required fields are marked *