There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.
I have seen a number of variations on this quote, and it looks as though the original quote only included two of the three hard things. Martin Fowler quotes Phil Karlton here and there is more discussion of the quote here.
I don’t know whether the original quote concerned hardware caching or software caching. Since I’m not a hardware engineer, I’ll talk about software caching. When implementing software caches I have found it useful to apply the Single Responsibility Principle and divide things up along these lines:
- The thing that calculates (fetches, obtains, looks up etc.) the value we want.
- The thing that actually stores the cached value.
- The thing that decides whether the value should be cached or not.
- The thing that decides whether the cache (or a given cache entry) should be invalidated.
I wish I were better at naming things. Naming is important, good naming can show that similar things are connected and dissimilar things are separate. Good names can show patterns in the code that are otherwise difficult to see. When we’re trying to get acquainted with a new codebase, names can make make our lives heaven or hell.
It isn’t always easy to work out what “similar” means. In this comment on my tbb::concurrent_vector post, Adrian points out that std::vector guarantees contiguous storage whereas tbb::concurrent_vector doesn’t. What are the essential properties of a vector that must be maintained by anything calling itself a vector? The lack of contiguous storage in tbb::concurrent_vector hadn’t occurred to me as being noteworthy – hence I wrote a blog post that was confusing. Having a name that makes sense to you is one thing, having someone else look at it helps identify any assumptions you’ve made that you didn’t know you were making.
There is a particular joy in coming up with a good name. On the odd occasions I experience this joy it is usually because I have forgotten (in the sense of rote learning) the name of something but I’ve been able to reconstruct what the name is. At the very least it shows some consistency – my brain is working the same way now as it was when I originally came up with the name. Sometimes the quality of a name doesn’t come to light until we try and name something that’s similar – if the original name is good, we can easily derive a name for the new, similar, thing.
I am sure that off by one errors will always be with us. The half open ranges used by many languages (including C and C++) make the situation better, but not perfect. My first job was programming in FORTRAN 77 where arrays start at index 1. I had to write more +1 and -1 statements in FORTRAN than I have ever had to in C and C++. Even better was when we were calling between C and FORTRAN. Yes, we knew the array indices were off by one, but it wasn’t always easy to remember whether it was +1 or -1.
I have mentioned this piece by Dijkstra before. As well as his theoretical analysis, he also mentions that the programming language Mesa allowed for all four different range conventions, and that the “half-open, start at zero” convention led to fewer errors.
8 thoughts on “Quote of the week – hard things”
I like the approach I first read in Inside Macintosh discussing graphics: coordinates do not identify pixels but rather the divisions between them. Thus, if the left side of a rectangle is at 4 and right right side at 6, that means that it encompasses the pixels between X coordinates 4 and 5, and the ones between X coordinates 5 and 6. Thus, two columns of pixels. Things that operate on a single pixel specify the coordinate of its upper-left corner, but most things operate on ranges.
The same approach can work when discussing half-open ranges of things like array slots or memory addresses. Just mentioning a location by itself refers to the slot between that location and the next, so SomeArray refers to the same element as SomeArray[4..5], and SomeArray[4..6] refers to the element SomeArray[4..5] and SomeArray[5..6].
Getting the right representation for pixels coordinates and rectangles is an interesting problem. We used half open ranges for PixelBender to describe areas. It then makes sense to make the coordinates of the centre of the pixel have an offset of one-half – working in a positive X direction we had (0.5, 0.5), (1.5, 0.5), (2.5, 0.5) …
Shades of Stan Kelly-Bootle’s quote about array indices.
About Off-By-One errors: Delphi uses 1-based strings (yes…. lots of +1 / -1 statements). But the best part is, that they now managed to add helper methods (string.IndexOf etc) that are 0-based!
So myString[myString.IndexOf(‘X’)] is either OutOfBounds (first character was an X) or the character just before the “X”.
This is unbelievable (and yet, sadly, all too believable). Some group of people sat in a room and decided that inconsistent array indexing was exactly the thing that their language needed. This industry scares the hell out of me.
It sounds like two groups of people sat in two rooms and independently decided what the best method would be — and let their conflict play out through the users