Inconsistent sorting

A few years ago I was working on a prototype for some 3D drawing code. I was using the painters algorithm – draw the objects from far to near so that the close objects will occlude the distant objects. I had my objects in a vector, I sorted them according to their distance from the camera, then drew the objects. Then I noticed something odd. When I first started the program I had two objects that would flicker. They would swap which one was in front – sometimes object A would be drawn in front of object B, sometimes B was drawn in front of A.

My start up coordinates for the camera and the objects meant that A and B were exactly the same distance from the camera, so when I did the sort, A and B would compare equal. Since I wasn’t using std::stable_sort I wasn’t expecting any particular ordering for A and B, but I was expecting the ordering to be consistent between calls to std::sort. After some debugging I discovered that giving the exact same input to std::sort would not always give the same output – objects with identical distances might be sorted differently on each call.

I ended up digging into the standard library implementation. I think I was using Metrowerks C++, but I wouldn’t bet on it. The implementation of std::sort used Quicksort – nothing unusual there. One of the problems with Quicksort though is that you have to make good choices of pivot element – the element that you use to to split the input into two partitions. For example, if you use the first element as the pivot and hand Quicksort an already sorted sequence the runtime ends up being O(n^2). Quicksort has great average case behaviour but bad worst case behaviour. There are various techniques for picking the pivot element including:

  • first element
  • last element
  • middle element
  • median of the first, last and middle elements

This particular implementation had found a different method of picking the pivot – it used rand. Effective since the chances of providing the sort with a worst case sequence were low, but it led to this odd behaviour – the same input to std::sort would not necessarily lead to the same output. I switched to std::stable_sort and all my problems went away.

As far as I can tell from the standard (section 25.4.1.1), this behaviour for std::sort is entirely legal. std::sort does not guarantee the order in which equivalent elements are placed, and I don’t see anything that forces std::sort to be consistent. I have never found this implementation in any other standard library, but I have heard of one person who was caught out like I was.

2 thoughts on “Inconsistent sorting

  1. Adrian says:

    I have heard of selecting the pivot element randomly, but I assume that most implementations choose the median of {first, middle, last} because it works well enough and, more importantly, it is presumably faster to determine the median of the three than it is to call the pseudo-random number generator.

    In your particular problem, it seems stable_sort is the wise choice anyway. You wouldn’t want a change to (or addition of) some unrelated object C in your vector to affect the ordering of objects A and B even if the random pivot selection were forbidden by the specs.

    The random implementation has advantages though. In Writing Solid Code, Steve Maguire talks about making rare things happen frequently to shake out the bugs that could otherwise remain latent for a while. For example, realloc can move memory, but in many circumstances it doesn’t. As a result, programmers often forget to account for the possibility. He suggests wrapping realloc in a function that always moves the memory, making it virtually impossible for the caller overlook the possibility. By this reasoning, an unstable sort that produces valid but different sequences on each call could be a valuable tool, at least in debug builds.

  2. John Payson says:

    Using a “random” pivot has the advantage of guarding against deliberately-constructed “evil” permutations. A median-of-three approach could me made to take orders of magnitude longer than it should if it was given a million-item list whose three largest elements were the first, middle, and last; whose next three largest elements would be the first, middle, and last among those that would remain after the first partitioning step, etc. That having been said, I’m not sure it’s legitimate to have a system sort routine call `rand`, since the sequence of values returned following an `srand` is supposed to be consistent and I don’t think routines like `sort` are supposed to affect it. A better design might be to have `sort` include its own pseudo-random generator, which could be designed to mesh well with the requirements of the sorting routine [speed is more important than “randomness”, provided only that it avoids any patterns that could lead to signifcantly-worse-than-NlgN behavior].

Leave a Reply

Your email address will not be published. Required fields are marked *