Algorithms in action – std::transform

Nice and simple this week, a short example with one of my most used algorithms – std::transform. This post talked about the way in which I am sorting a list of files. At the time I wrote that post I was getting the list of files directly from the file system, however I also want to be able to write the file metadata out to a JSON file and read it back in.

A simplified version of the JSON file looks like this:

{
    "compile.bat" : [77, 990630480],
    "keycodes.h" : [2728, 1124154608],
    "q3_ui.bat" : [5742, 1033844878],
    "q3_ui.q3asm" : [544, 1033844878],
    "q3_ui.sh" : [1124, 990630480],
    "q3_ui.vcproj" : [63532, 1124230338],
    "ui.def" : [29, 990630480],
    "ui.q3asm" : [568, 990630480]
}

(The full version includes a directory structure as well. I have removed the directory structure for simplicity.)

As before, I have my FileMetaData structure and a vector of FileMetaData objects:

struct FileMetaData
{
    std::string fileName_;
    std::size_t size_;
    std::time_t lastWriteTime_;
};

typedef std::vector< FileMetaData > FileMetaDataVector;

I want to read in the file which gives me the data as a JSON string, and transform it into a FileMetaDataVector object. For simplicity, I am using the boost property tree classes as a quick and easy way to get JSON input working (this will probably not be my final solution but it is good enough for now).

Reading the JSON data in and parsing it as JSON is easy:

FileMetaDataVector
getFileMetaData(
    std::istream& istr )
{
    boost::property_tree::ptree propertiesTree;
    boost::property_tree::read_json( istr, propertiesTree );

The nice thing about the property tree is that (to quote from the documentation) “each node is also an STL-compatible Sequence for its child nodes”. An STL-compatible sequence means that I can use its iterators as input to the standard algorithms. The JSON input has been transformed into a boost::property_tree::ptree object. Now I want to transform it into a FileMetaDataVector object. Because boost::property_tree::ptree is STL compatible I can treat it just like I would any other container:

    FileMetaDataVector result;

    std::transform( 
        std::begin( propertiesTree ), 
        std::end( propertiesTree ), 
        std::back_inserter( result ), 
        valueToFileMetaData );

    return result;
}

I need to write valueToFileMetaData – the function to convert a child of boost::property_tree::ptree into a FileMetaData object. The implementation isn’t of interest for this post, the prototype for the function looks like this:

FileMetaData
valueToFileMetaData( 
    boost::property_tree::ptree::value_type const& );

I have said it before and I will say it again – one of the best things the STL does for us is to give us a framework. Write an algorithm or a container that fits into that framework and you instantly have access to everything else in that framework.

Incidentally, the code I have that reads in the file metadata from the file system also uses standard algorithms. I am using the boost filesystem library, and in particular, directory iterator. I didn’t use that code as an example because directory_iterator returns files and directories and I need to do a little work to split them out, but the principle is the same – because Boost treats a directory as an STL-compatible container I can use STL algorithms to operate on it.

Quote of the week – teaching

People have said you don’t understand something until you’ve taught it in a class. The truth is you don’t understand something until you’ve taught it to a computer, until you’ve been able to program it.

George Forsythe

Most of the quotes I pick are on this blog because I agree with them. George Forsythe was the founder and head of Stanford University’s Computer Science Department so he certainly has standing to comment on teaching methods, but I think that this quote underestimates the usefulness of teaching a class to other people. Teaching a computer is useful, it forces me into many corners that I wouldn’t have to explore and exposes my misconceptions; however teaching a class exposes me to other people’s misconceptions, many of which end up not being misconceptions at all. At the very least, they point me towards a different way of thinking about the problem.

Teach it in a class and teach it to a computer.

Algorithms in action – sorting (follow up)

In the first part of this series we looked at a number of ways of sorting the displayed elements without having to sort every item in a (possibly) large list. The final “pivot sort” kept the position of the currently selected element fixed on the display while the rest of the list sorted around it. The code was relatively simple – all of the heavy lifting was done by the standard library with one call to std::partition and two calls to std::partial_sort. There is only one problem – the function doesn’t work for a number of cases.

We can work out what these cases are in a number of ways. One is simply to run a large number of tests – always a good idea when dealing with code like this, however, without having thought through the problem and the special cases we can’t guarantee that we are testing all of the relevant cases. So, let’s look at the original problem statement and see what special cases we can find.

We never had an “official” statement of the problem, but this will do:

Keep the position of the currently selected element fixed on the display while the rest of the list sorts around it.

Are there any circumstances under which we cannot achieve this goal, or are there any circumstances where we can achieve this goal but it will lead to odd results? Here’s one problem case, we keep the selected element in its same place on the display but we end up with several blank spaces at the top of the display. We have achieved our goal, but we are not making the most of the display area (mouse over the image to see the before and after pictures):

We can also imagine a similar problem occurring at the end of the list. These “blanks” can occur for any length list if the selected element ends up close enough to the beginning or end of the list, however if the length of the list is smaller than the number of elements we display this problem is going to occur all of the time. There is also an implicit assumption I’ve been making all along that the selected element is actually on the display when we start to sort. If the element isn’t on display we can apply the same rules (the selected element remains in its same position relative to the display and everything else sorts around it) but it isn’t obvious to me that we’ll get useful results from doing that.

These are exactly the sort of cases we’d expect to need to test – values that are large or small or near some boundary. Several of these cases indicate areas where we need to make a decision – do we allow the list to display fewer elements than it can to keep the selected element in the correct place? What is the right behaviour if the selected element is off the display before the sort starts?

For the purposes of this post I am going to state the the algorithm must always display as many elements as possible (i.e. there will be no blanks on the display unless the total number of elements is less than the number we can display, and even then there must be as few blanks as possible). I will also state that if the selected element is off the top of the display it will end up in the top position on the display, and if the selected element is off the bottom of the display it will end up in the bottom position on the display.

There is another approach we can take that is helpful for this problem. Work our way through the code and see what places in the code might give us unexpected results.

void
sortAroundPivotElementBasic( 
    UInt nDisplayedElements, 
    UInt topElementIndex, 
    UInt pivotElementIndex, 
    Comparator const& comparator, 
    UInt& newTopElementIndex, 
    UInt& newPivotElementIndex,
    FMDVector& files )
{
    FileMetaData const pivotElement( 
        files[ pivotElementIndex ] );

    Iter pivotElementIterator( 
        std::partition( 
            std::begin( files ), 
            std::end( files ), 
            std::bind( comparator, _1, pivotElement ) ) );

The first couple of statements are good (I am assuming that nobody will call the code with pivotElementIndex outside the range of the file vector – if we want to guard against that we can easily add preconditions). Things start to get hairy with the next statement though:

UInt const pivotOffset( 
    pivotElementIndex - topElementIndex );

We are subtracting two unsigned integer values. If pivotElementIndex is smaller than topElementIndex, the result of the subtraction is a very large positive value that is almost certainly unexpected and unwanted. If pivotElementIndex is smaller than topElementIndex it means that the pivot element was above the top of the display – one of the problem cases we have just identified.

Iter displayBegin( pivotElementIterator - pivotOffset );

We know we can’t trust pivotOffset – it might be some extremely large number, however even if pivotOffset has a reasonable value, we do not know for certain that pivotElementIterator - pivotOffset will produce a legal iterator (i.e. one in the range files.begin() to files.end()). Notice that this matches another of the problem cases we identified above – the top element on the display actually comes before the first element in the vector leaving blank lines on the display.

Iter displayEnd( displayBegin + nDisplayedElements );

displayBegin is already suspect, and even if displayBegin is within range we have no guarantee that displayBegin + nDisplayedElements will be in range. displayEnd might end up being past the end of the vector – the “blanks at the end of the list” problem.

newTopElementIndex = displayBegin - std::begin( files );
newPivotElementIndex = 
    pivotElementIterator - std::begin( files );

There is nothing wrong with these two lines if displayBegin and pivotElementIterator are in range. pivotElementIterator is fine, however as we have just seen, displayBegin may well be out of range.

std::partial_sort( 
    pivotElementIterator, 
    displayEnd, 
    std::end( files ), 
    comparator );

We know that pivotElementIterator is good, but displayEnd is suspect.

    std::partial_sort( 
        ReverseIter( pivotElementIterator ), 
        ReverseIter( displayBegin ), 
        ReverseIter( std::begin( files ) ), 
        std::bind( comparator, _2, _1 ) );
}

Again, pivotElementIterator is good, but displayBegin is suspect.

There are three iterators that we need to know to perform this sort:

pivotElementIterator
displayBegin
displayEnd

We have seen that pivotElementIterator is easy to get correct, the problem is with the other two iterators. This is what I came up with to get correct values for displayBegin and displayEnd.

void
sortAroundPivotElement( 
    UInt nDisplayedElements, 
    UInt topElementIndex, 
    UInt pivotElementIndex, 
    Comparator const& comparator, 
    UInt& newTopElementIndex, 
    UInt& newPivotElementIndex,
    FMDVector& files )
{
    FileMetaData const pivotElement( 
        files[ pivotElementIndex ] );

    Iter const newPivotElementIterator( 
        std::partition( 
            std::begin( files ), 
            std::end( files ), 
            std::bind( comparator, _1, pivotElement ) ) );

The first two lines stay the same – the only possible errors here can be avoided by the caller.

newPivotElementIndex = 
    newPivotElementIterator - std::begin( files );

The calculation of newPivotElementIndex hasn’t changed, I have just moved it to earlier in the function because I am going to make use of it.

pivotElementIndex = boost::algorithm::clamp( 
    pivotElementIndex, 
    topElementIndex, 
    topElementIndex + nDisplayedElements - 1 );

We fix the problem of the pivot element being off the display by clamping the pivot element’s position to the limits of the display. We do this after we get the value of the pivot element. (Notice that since boost::algorithm::clamp is effectively a combination of std::min and std::max we have to use - 1 for our “end of range” calculation).

UInt pivotOffset( pivotElementIndex - topElementIndex );

Since we have clamped the value of pivotElementIndex we know that the subtraction will not overflow – pivotOffset has a legal value.

pivotOffset = std::min( pivotOffset, newPivotElementIndex );

We clamp pivotOffset to make sure that we won’t run off the beginning of the vector.

Iter displayBegin( 
    newPivotElementIterator - pivotOffset );
newTopElementIndex = displayBegin - std::begin( files );

We know for sure that displayBegin is a valid iterator since we know that pivotOffset is in a legal range.

nDisplayedElements = std::min( 
    nDisplayedElements, 
    files.size() );

Solves any problems caused by having a list shorter than the size of our display.

Iter const displayEnd( 
    files.begin() + 
    std::min( 
        newTopElementIndex + nDisplayedElements, 
        files.size() ) );

displayEnd is now in the correct place, and a legal iterator. Now we can use displayEnd to recalculate displayBegin (yes, we have to calculate displayBegin twice).

displayBegin = std::min( 
    displayEnd - nDisplayedElements, 
    displayBegin );
newTopElementIndex = displayBegin - std::begin( files );

Finally all of our iterators are correct and we can sort the elements.

    std::partial_sort( 
        newPivotElementIterator, 
        displayEnd, 
        std::end( files ), 
        comparator );

    std::partial_sort( 
        ReverseIter( newPivotElementIterator ), 
        ReverseIter( displayBegin ), 
        ReverseIter( std::begin( files ) ), 
        std::bind( comparator, _2, _1 ) );
}

Here’s the function in a single block for clarity:

void
sortAroundPivotElement( 
    UInt nDisplayedElements, 
    UInt topElementIndex, 
    UInt pivotElementIndex, 
    Comparator const& comparator, 
    UInt& newTopElementIndex, 
    UInt& newPivotElementIndex,
    FMDVector& files )
{
    FileMetaData const pivotElement( 
        files[ pivotElementIndex ] );

    Iter const newPivotElementIterator( 
        std::partition( 
            std::begin( files ), 
            std::end( files ), 
            std::bind( comparator, _1, pivotElement ) ) );

    newPivotElementIndex = 
        newPivotElementIterator - std::begin( files );

    pivotElementIndex = boost::algorithm::clamp( 
        pivotElementIndex, 
        topElementIndex, 
        topElementIndex + nDisplayedElements - 1 );

    UInt pivotOffset( pivotElementIndex - topElementIndex );
    pivotOffset = std::min( pivotOffset, newPivotElementIndex );

    Iter displayBegin( 
        newPivotElementIterator - pivotOffset );
    newTopElementIndex = displayBegin - std::begin( files );

    nDisplayedElements = std::min( 
        nDisplayedElements, 
        files.size() );

    Iter const displayEnd( 
        files.begin() + 
        std::min( 
            newTopElementIndex + nDisplayedElements, 
            files.size() ) );

    displayBegin = std::min( 
        displayEnd - nDisplayedElements, 
        displayBegin );
    newTopElementIndex = displayBegin - std::begin( files );

    std::partial_sort( 
        newPivotElementIterator, 
        displayEnd, 
        std::end( files ), 
        comparator );

    std::partial_sort( 
        ReverseIter( newPivotElementIterator ), 
        ReverseIter( displayBegin ), 
        ReverseIter( std::begin( files ) ), 
        std::bind( comparator, _2, _1 ) );
}

I quite like my original function – sortAroundPivotElementBasic. I don’t like the new function nearly as much. Of course, sortAroundPivotElement has the advantage that it works (not something to dismiss lightly) but it lacks Tony Hoare’s quality of being “so simple that there are obviously no deficiencies”. I believe that the code is correct. I have done extensive testing on the function. I worked my way through all of the places in the code where things might be incorrect and made sure that I was doing appropriate checks. Even with all of that I am not convinced of my ability to explain the code correctly (I have done my best with the explanation above but I don’t think that my explanation is perfect).

Of course one possibility is that I am not as good at reading, writing and explaining code as I should be, but even though that’s true, it’s still worth looking at ways to improve sortAroundPivotElement.

The function now obviously splits into three sections (that was always the case, it just wasn’t as clear with the basic version):

Calculate the new position of the pivot element
Calculate displayBegin and displayEnd.
Sort the elements on screen

The calculation of displayBegin and displayEnd is an obvious step to split out into its own function. That would at least keep the main function closer to the core parts of the algorithm, and keep all of the special case handling in a separate function.

I can think of various ways to handle the special cases separately (if the number of elements in the vector is smaller than the number we can display we might just as well sort the entire vector; if the new position of the pivot element is close enough to the beginning or end of the vector to cause a problem we could just use std::partial sort on the beginning or end) but they add the overhead of splitting out the special cases.

Maybe I’ll eventually come up with a better way of calculating displayBegin and displayEnd, in the meantime, we have a function that works.

Unsigned integer variables

I made a conscious decision when I wrote this code to avoid integer conversions (implicit or explicit) as much as possible. That lead to my basic integer type being an unsigned type:

typedef std::vector< FileMetaData > FMDVector;
typedef FMDVector::size_type        UInt;

I was able to achieve my goal – there are no implicit or explicit integer conversions. I have a working function where every integer in use is the same unsigned type. Surely this is a good thing?

The basic problem with unsigned integer types is that the model they provide breaks down when you try to set them to a negative value, whether directly or as the result of an operation (such as the subtraction in sortAroundPivotElementBasic) that could overflow.

This has been discussed more than once on comp.lang.c++.moderated:

Discussion 1
Discussion 2

I am opposed to using unsigned integers for arithmetic (rather than bit array) purposes. Every time I have tried to use unsigned integers for quantities that must always be positive (for example, the width and height of a rectangle in a 2D graphics system), I inevitably end up needing to subtract them, and all of a sudden I have overflow problems.

I managed to make an “all unsigned all the time” policy work with sortAroundPivotElement. I am slightly surprised about that because during development I had several problems with overflow. It took some time to come up with a solution where overflow was not a problem.

Even though I managed to get a working solution, I would still change the unsigned ints for signed ints if I was putting this into production code. The function as it stands is fragile. I managed to come up with a solution without overflow, however if someone (and this includes me) has to change the function at a later date (e.g. because we want the special cases to be handled differently) they will have to do a bunch more work to maintain that property (and might fail under circumstances that will, of course, elude all testing and only manifest themselves when a potential customer is evaluating the software).

C++11 makes this easy for us by giving us an easy way to make a signed type out of the corresponding unsigned type:

typedef std::make_signed< UInt >::type Int;

Of course then we will have to convert between integer types since std::vector::size returns an unsigned type. We can moan about this, but it isn’t going to change.

Quote of the week – design

The open secrets of good design practice include the importance of knowing what to keep whole, what to combine, what to separate, and what to throw away.

Kevlin Henny

Algorithms in action – sorting

I like to have a personal project that I use to explore different programming ideas – my current personal project is an editor. A couple of features I am adding require the use of a list – for example, the editor is working on a certain set of files and I want to be able to see the files, perhaps sorted by file name:

sort filename

Perhaps sorted by directory:

sort directory

This is completely typical list control behaviour – click on a column header to sort on that column and scroll through the list to see every element in its correct place. Of course I want this to be highly responsive, and I also want to be able to have many millions of elements in the list – I probably won’t ever have a project with a million files in it but I might well have a million symbols.

I am using a virtual list control – the list control itself does not have most of the data for the list. It knows how many elements are in the list and which block of elements it is displaying, then it calls back into my code to get the text for the elements it is displaying. The list control does not have to deal with millions of elements, it just has to deal with the elements it is currently displaying.

For my definition of “responsive” I am going to use Jakob Nielsen’s values from his article Response Times: The 3 Important Limits:

< 0.1 second for that instantaneous feeling
< 1 second to avoid interrupting the user’s flow of thought
< 10 seconds to avoid losing the user’s attention altogether.

Useful definitions

For this example we have a very simple struct containing file metadata, and a comparator class that gives us an ordering of FileMetaData objects. I have left out the details of the comparator class, but we can tell it to compare on filename, directory, size or last write time.

struct FileMetaData
{
    std::string fileName_;
    std::string directory_;
    
    std::size_t size_;
    std::time_t lastWriteTime_;
};

class Comparator
{
public:
    bool operator()( 
        FileMetaData const& a, 
        FileMetaData const& b )
}

There are also a few typedefs that will come in useful:

typedef std::vector< FileMetaData > FMDVector;
typedef FMDVector::iterator         Iter;
typedef FMDVector::reverse_iterator ReverseIter;
typedef FMDVector::size_type        UInt;

The problem

When we click on a column header the list should be resorted on the data in that column. We should be able to scroll through the list and see every list element in the correct place.

Solution #0

Since we need to sort the elements in the list why not just use std::sort? The code is simple and obvious:

void
sortAllElements( 
    Comparator const& comparator, 
    FMDVector& files )
{
    std::sort( 
        std::begin( files ), 
        std::end( files ), 
        comparator );
}

(I know that the sort barely deserves to be inside its own function but it’ll make it consistent with our more complicated sorting later).

Whenever we click on a column header we update the comparator and call sortAllElements. The code is simple and it works.

The catch comes when we look at performance. Here are the times to sort different numbers of elements:

# elements	sort all
1,000	0.000s
10,000	0.000s
100,000	0.187s
1,000,000	2.480s
10,000,000	33.134s

The timings look great up to 10,000, even sorting 100,000 elements is acceptable. Once we get into the millions though the numbers get much worse. We really don’t want to be waiting around for 2 seconds, let alone 30+ seconds.

The C++ standard tells us that std::sort has a complexity of O( n * log n ). As we have just seen, we get good results with small numbers of elements, but much worse results as n gets larger.

So the straightforward sort works well for smaller numbers of elements, but we are going to need something smarter for millions of elements. There are several possibilities:

Offload some of the work onto separate threads – even if we didn’t speed up the overall sort time we would at least avoid locking up the main UI thread.
Use previous knowledge of the ordering to avoid having to do a complete re-sort.
Keep multiple copies of the list, each sorted by a different column so that the right one can be instantly swapped in.

For the purposes of this blog post I am going to avoid #1 because threading opens up its own can of worms. I will declare that #2 is invalid because we don’t have any previous knowledge (or at least we have to have a solution that works responsively even if we don’t have any previous knowledge of the ordering). I am not going to consider #3 because I don’t want to make that particular speed / space tradeoff.

Instead I am going to take my cue from the virtual list control. It only needs to know about the elements it is displaying – those are the only elements that need to be in sorted order.

The problem has now moved from “sort the entire list” to “sort that portion of the list that is displayed”. If we can always guarantee that the portion of the list that is displayed is sorted correctly, the user will never know the difference. This has the potential to be much quicker – the number of displayed elements is relatively small. I can get around 70 lines on my monitor, let’s round that up to 100 – we need to get the 100 displayed elements correctly sorted and in their right place.

Solution #1

We’ll start with a straightforward case, the 100 elements we want to display are right at the beginning of the list (quite possibly the case when the list is first displayed). The standard library supplies us with many more sorting related algorithms than std::sort, and for this case it gives us std::partial_sort. As the name suggests, it partially sorts the list – we tell it how much of the beginning of the list we want it to sort:

void
sortInitialElements( 
    UInt k, 
    Comparator const& comparator, 
    FMDVector& files )
{
    std::partial_sort( 
        std::begin( files ), 
        std::begin( files ) + k, 
        std::end( files ), 
        comparator );
}

(There is no error handling in the function itself – I am assuming that the caller has ensured the values being passed in are within range.)

We have an additional parameter – k, the number of elements we want sorted correctly. We use k to create an iterator that we pass to std::partial_sort.

According to the standard, std::partial_sort has a complexity of O( n * log k ). Since k is likely to be much smaller than n (we’re assuming that k is 100) we should get better performance:

# elements	sort all	sort initial
1,000	0.000s	0.000s
10,000	0.000s	0.000s
100,000	0.187s	0.000s
1,000,000	2.480s	0.047s
10,000,000	33.134s	0.406s

This is looking much better. We are still in the instantaneous category at one million elements and even though 10 million is pushing us to around half a second it is still a vast improvement over sorting the entire list.

Moving on, it’s all very well being able to sort the beginning of the list correctly but we also want to be able to scroll through the list. We want to be able to display any arbitrary range of the list correctly.

Solution #2

Once again, the standard provides us with a useful algorithm. The function std::nth_element lets us put the nth element of a list into its correct place, and also partitions the list so that all of the elements before the nth element come before that element in sort order, and all of the elements after come after the nth element in sort order.

So, we can use std::nth_element to get the element at the beginning of the range into the correct place, then all we have to do is sort the k elements afterwards to get our range sorted. The code looks like this:

void
sortElementsInRange( 
    UInt firstElementIndex, 
    UInt k, 
    Comparator const& comparator, 
    FMDVector& files )
{
    std::nth_element( 
        std::begin( files ), 
        std::begin( files ) + firstElementIndex, 
        std::end( files ), 
        comparator );
    
    std::partial_sort( 
        std::begin( files ) + firstElementIndex, 
        std::begin( files ) + firstElementIndex + k, 
        std::end( files ), 
        comparator );
}

and the performance data:

# elements	sort all	sort initial	sort range
1,000	0.000s	0.000s	0.000s
10,000	0.000s	0.000s	0.016s
100,000	0.187s	0.000s	0.016s
1,000,000	2.480s	0.047s	0.203s
10,000,000	33.134s	0.406s	2.340s

A million elements pushes us outside the “instantaneous” limit, but still acceptable. Unfortunately, 10 million elements isn’t great.

There is one more feature of sorting lists that I love. I use it in my email all the time. By default, my email is sorted by date – I want to see the most recent emails first. Sometimes though I’ll be looking at an email and want to see other emails from that sender. When that happens, I can click on the “sender” column header and have the email I currently have selected stay in the same place while the rest of the list sorts itself around the selected email. The selected element acts like a pivot and the rest of the lists moves around the pivot.

Mouse over this image to see an example of this sort on our list of files:

Solution #3

WARNING – the code I am about to present does not work under all circumstances. I want to focus on the sorting algorithms so for the time being I am not going to clutter the code up with all of the checks needed to make it general. DO NOT USE THIS CODE. I will write a follow up blog post presenting a correct version of this function with all of the checks in place.

Sorting around a pivot element is a more complicated function than the others so I’ll walk through it step by step.

void
sortAroundPivotElementBasic( 
    UInt nDisplayedElements, 
    UInt topElementIndex, 
    UInt pivotElementIndex, 
    Comparator const& comparator, 
    UInt& newTopElementIndex, 
    UInt& newPivotElementIndex,
    FMDVector& files )

nDisplayedElements The number of elements displayed on screen.
topElementIndex The index of the top displayed element of the list.
pivotElementIndex The index of the pivot element.
comparator Same as the other sort functions – the object that lets us order the list.
newTopElementIndex An output parameter – the index of the top element after sorting.
newPivotElementIndex The index of the pivot element after sorting.
files The list of files we are sorting.

Our first job is to find out the new position of the pivot element. Previously we had used std::nth_element to partition the range, this time, since we actually know the value of the element, we’ll use std::partition:

FileMetaData const pivotElement( 
    files[ pivotElementIndex ] );

Iter pivotElementIterator( 
    std::partition( 
        std::begin( files ), 
        std::end( files ), 
        std::bind( comparator, _1, pivotElement ) ) );

Notice that although we are using the same comparator we have used for all of our sort operations we are binding the pivot element to the comparator’s second argument. std::partition expects a unary predicate – it moves all elements for which the predicate returns true to the front, and all elements for which the predicate returns false to the back. It then returns an iterator corresponding to the point between the two partitions. By binding our pivot element to the second parameter of comparator we get a unary predicate that returns true if an element is less than the pivot element and false otherwise.

We know where the pivot element ends up in the list, and we also know where it ends up on the display – we want it to stay in the same position on the display. This means we can work out the range of elements that are displayed.

UInt const pivotOffset( 
    pivotElementIndex - topElementIndex );

Iter displayBegin( pivotElementIterator - pivotOffset );
Iter displayEnd( displayBegin + nDisplayedElements );

Now we can fill in two of our return values.

newTopElementIndex = displayBegin - std::begin( files );
newPivotElementIndex = 
    pivotElementIterator - std::begin( files );

To recap, we have two iterators that tell us the range of elements that are displayed – displayBegin and displayEnd. We have a third iterator that tells us where the pivot element has ended up – pivotElementIterator. We also know that the elements have been partitioned – we have two partitions, each with all the right elements, but in the wrong order. The boundary between those two partitions is the location of the pivot element.

We’ve used std::partial_sort a couple of times already, we can use it again to sort the bottom section of the display:

std::partial_sort( 
    pivotElementIterator, 
    displayEnd, 
    std::end( files ), 
    comparator );

Of course we also want to sort the top section of the display. All we need is a version of std::partial_sort that will sort the end of a list, not the beginning. The standard library doesn’t supply us with that algorithm, but it does give us a way of reversing the list. If we reverse the list above the pivot element we can use std::partition. We do this using reverse iterators:

std::partial_sort( 
    ReverseIter( pivotElementIterator ), 
    ReverseIter( displayBegin ), 
    ReverseIter( std::begin( files ) ), 
    std::bind( comparator, _2, _1 ) );

Reverse iterators are easily constructed from the forward iterators that we already have, and they plug right into algorithms just like regular iterators do. The beginning of our reversed list is at the pivot element, the end is at std::begin( files ) and the element we are sorting to is the top of the display. Notice that we had to reverse our comparator as well (and notice that reversing the comparator does not involve applying operator ! to its output, we actually need to reverse the order of the arguments).

Here’s the function in one block:

void
sortAroundPivotElementBasic( 
    UInt nDisplayedElements, 
    UInt topElementIndex, 
    UInt pivotElementIndex, 
    Comparator const& comparator, 
    UInt& newTopElementIndex, 
    UInt& newPivotElementIndex,
    FMDVector& files )
{
    FileMetaData const pivotElement( 
        files[ pivotElementIndex ] );

    Iter pivotElementIterator( 
        std::partition( 
            std::begin( files ), 
            std::end( files ), 
            std::bind( comparator, _1, pivotElement ) ) );

    UInt const pivotOffset( 
        pivotElementIndex - topElementIndex );

    Iter displayBegin( pivotElementIterator - pivotOffset );
    Iter displayEnd( displayBegin + nDisplayedElements );
    
    newTopElementIndex = displayBegin - std::begin( files );
    newPivotElementIndex = 
        pivotElementIterator - std::begin( files );
    
    std::partial_sort( 
        pivotElementIterator, 
        displayEnd, 
        std::end( files ), 
        comparator );
    
    std::partial_sort( 
        ReverseIter( pivotElementIterator ), 
        ReverseIter( displayBegin ), 
        ReverseIter( std::begin( files ) ), 
        std::bind( comparator, _2, _1 ) );
}

And the performance?

# elements	sort all	sort initial	sort range	sort around pivot
1,000	0.000s	0.000s	0.000s	0.000s
10,000	0.000s	0.000s	0.016s	0.000s
100,000	0.187s	0.000s	0.016s	0.016s
1,000,000	2.480s	0.047s	0.203s	0.094s
10,000,000	33.134s	0.406s	2.340s	1.061s

Pretty darn good, in fact better than sorting a given range. Again, once we get up to ten million elements we lose responsiveness, I think the lesson here is that once we get over one million elements we need to find some other techniques.

I want to repeat my warning from above. The code I have just presented for the pivot sort fails in some cases (and fails in an “illegal memory access” way, not just a “gives the wrong answer” way). I will write a follow up post that fixes these problems.

Wrap up

We have seen that with a little work, and some standard library algorithms, we can do distinctly better than sorting the entire list, we can increase the number of elements by 2 orders of magnitude and still get reasonable performance. Extremely large numbers of elements still give us problems though.

Is this the best we can do? It’s the best I’ve been able to come up with – if you have something better please post it in the comments, or send me a link to where you describe it.

One last thought. Earlier on I dismissed option #2 – “Use previous knowledge of the ordering to avoid having to do a complete re-sort.” by saying that I wanted algorithms that would work without prior knowledge. We now have those algorithms, and once we have applied any of them we will have some knowledge of the ordering. For example, once we have sorted a given range we can divide the data set into 3 – the first block of data has all the right elements, just in the wrong order, the second block has all of the right elements in the right order and the third block has all of the right elements in the wrong order. We can use this information as we’re scrolling through the list to make updates quicker.

Quotes of the week – efficiency

There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified. It is often a mistake to make a priori judgments about what parts of a program are really critical, since the universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail. After working with such tools for seven years, I’ve become convinced that all compilers written from now on should be designed to provide all programmers with feedback indicating what parts of their programs are costing the most; indeed, this feedback should be supplied automatically unless it has been specifically turned off.

Donald Knuth

More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason – including blind stupidity.

William A. Wulf

Interestingly, despite the fact that neither of these quotes says anything specifically about goto, and that the sentiments can be applied to programming as a whole, they both come from papers talking about the use of gotos. Knuth’s quote is from Structured Programming with go to Statements, (Computing Surveys, Vol 6, No 4, December 1974, p268). Wulf’s quote is from A Case Against the GOTO (Proceedings of the ACM National Conference, ACM, Boston, August 1972, p796)

Knuth’s quote is usually abbreviated to “premature optimization is the root of all evil”, sometimes “97% of the time” is mentioned. I wanted to put the quote in context – the surrounding text adds a layer of nuance that isn’t present in the sound bite.

There is some interesting background information on the origin of Knuth’s quote at Shreevatsa R’s blog here.

Quote of the week – correctness

Correctness is clearly the prime quality. If a system does not do what it is supposed to do, then everything else about it matters little.

Bertrand Meyer

No raw loops 4 – std::transform

Transform each element of a container by passing it through a function and putting the result into another container

So far we’ve only looked at one algorithm – std::for_each. That’s allowed me to lay the groundwork for using lambdas and std::bind, but there are many more algorithms avilable in the standard library. In fact, in this thread , John Potter refers to std::for_each as “the goto of algorithms” (and that is far from the worst thing he says about std::for_each).

You can use std::for_each to simulate pretty much every algorithm, but as usual, just because you can do something doesn’t mean that you should. There are more specific algorithms we can use to express our intent directly, and being able to express our intent directly when we’re writing code is a Good Thing.

In this installment we move on from just calling a function with each element, now we’re going to call a function, get a result back from that function, then store that result.

Our declarations have a new function – f_int_T – a function that takes an object of type T and returns an integer.

class T
{
   ...
};

std::vector< T > v_T;
std::vector< int > v_int;

int f_int_T( T const& t );

Here’s the old way of solving our problem:

Solution #0

std::vector< T >::iterator i;
std::vector< int >::iterator j;

for( 
    i = std::begin( v_T ), 
    j = std::begin( v_int );
    i != std::end( v_T ); 
    ++i, ++j )
{
    *j = f_int_T( *i );
}

I am assuming that our output container v_int has sufficient size for each element being assigned to it. It is easy to change this code to call push_back on the output container if necessary.

We now have two iterators – one for the source range and one for the output. We call the function f_int_T for each element of the source and we write the result of the function to the output. We don’t need to explicitly use the end of the output range – std::end( v_int ) doesn’t appear anywhere in the example.

We can implement the same algorithm using range-based for:

Solution #1

std::vector< int >::iterator j( std::begin( v_int ) );
for( T const& t : v_T )
{
    *j++ = f_int_T( t );
}

Using range-based for works, but we have to increment the output iterator j inside the body of the loop. This example would be simpler if push_back was appropriate.

So let’s introduce std::transform and see what we get:

Solution #2

std::transform( 
    std::begin( v_T ), 
    std::end( v_T ), 
    std::begin( v_int ), 
    []( T const& t ) 
    {
        return f_int_T( t );
    } );

The lambda function is overkill, but I wanted to reinforce the fact that lambdas are functions, which means they can have a return value. In this case because the lambda consists of a single return statement the compiler will automatically deduce the type of the return value (there is a way of explicitly specifying the type of the return value but I am not going into that here).

Here’s something interesting. The blocks of code in solutions #0 and #1 look like this:

{
    *j = f_int_T( *i );
}

{
    *j++ = f_int_T( t );
}

There’s an assignment in both of them and they both dereference the output iterator. The range-based for version has to increment the output iterator itself. Contrast that with the block of code in the solution #2, the lambda function:

{
    return f_int_T( t );
}

The code inside the lambda is doing nothing other than calling f_int_T and returning the result. All of the iteration, the dereferencing, the assignment is being handled by the std::transform algorithm.

As I said, the lambda is overkill, we can just use the function directly as an argument:

Solution #3

std::transform( 
    std::begin( v_T ), 
    std::end( v_T ), 
    std::begin( v_int ), 
    f_int_T );

This solution makes the intent of the code clear. We have the algorithm completely separated from the action to take each time through the loop. When I look at this code and see std::transform I know how the iteration will work. I know that the number of elements placed into the output will be the same as the number of elements in the source range. I know that no elements in the source range will be skipped. Short of exceptions I know that there will be no early exit from the loop. (Many of these arguments can be applied to std::transform with a lambda function, but I think the intent is even more pronounced here).

The function that does the transformation is entirely separate and can be tested in isolation (this is the advantage of the named function over a lambda). We have separated things that used to be commingled.

We can simplify things even more using adobe::transform:

Solution #4

adobe::transform( 
    v_T, 
    std::begin( v_int ), 
    f_int_T );

Solution #4 strips things down to their basics – we have our source, the place where the output will go and what we’re going to do to each element. There is almost no boilerplate.

Incidentally, the output range can be the same as the source range. If the function changes the value but not the type of its argument (or the return type of the function can be converted to the argument type), we can run std::transform to change every element of a container in place.

I have chosen a simple example to illustrate the use of std::transform. The function we are calling is a non-member function and has a single parameter of the correct type. Refer to part 2 and part 3 of this series for more information about using std::bind to call more complex functions.

Variation #1

My examples all assumed that we had enough space in the output container, but what if we don’t? Perhaps we are trying to build the output container from scratch or append elements to an existing container. We don’t want to have to resize the container first and incur the cost of constructing default constructed objects that are going to get overwritten. We won’t be able to resize it at all if the type we are constructing doesn’t have a default constructor, or if we don’t know how many elements we are going to put into it (e.g. we are using input iterators). What if we want to use push_back on every new element?

Solutions #0 and #1 are easy. Because they handle the dereferencing and assignment within their loop body it is easy to change them to use push_back. E.g.:

Solution #0.1

std::vector< T >::iterator i;

for( 
    i = std::begin( v_T );
    i != std::end( v_T ); 
    ++i )
{
    v_int.push_back( f_int_T( *i ) );
}

In fact this is simpler than our original solution #0 since we don’t need an iterator for the output at all.

If we want the effect of push_back when we use std::transform (or any other algorithm that writes to iterators) we use an insert iterator. An insert iterator looks like an iterator (i.e. it has the operations we would expect from an iterator – although many of them have no effect for an insert iterator) but when we assign to it, it will call a function on the container – in our case we want it to call push_back.

It’s easier to show than talk about (see Josuttis “The C++ Standard Library” 2nd edition for a proper description):

adobe::transform(
    v_T,
    std::back_inserter( v_int ),
    f_int_T );

The iterator created by std::back_inserter will call push_back every time it is assigned to. The standard library also contains std::front_inserter which will call push_front, and std::inserter which calls insert.

One drawback of push_back is that it might cause the output container’s memory to be reallocated and all of the container’s contents copied over – in fact depending on the reallocation strategy it might have to reallocate several times as elements are appended. Assuming that we know the ultimate size of the output container we can call reserve which will set aside the appropriate amount of memory but without default constructing any additional elements. This lets us use push_back and know that there will only be a single memory reallocation.

Variation #2

What if we don’t necessarily want to process every element? What if we want something like this:

std::vector< T >::iterator i;
std::vector< int >::iterator j;

for( 
    i = std::begin( v_T ), 
    j = std::begin( v_int );
    i != std::end( v_T ); 
    ++i, ++j )
{
    if( somePredicate( *i ) )
    {
        *j = f_int_T( *i );
    }
}

Some of the algorithms have “if” versions – we pass in a predicate and their action is only taken if that predicate returns true. Among others, std::copy, std::remove and std::count all have “if” variants that take predicates. There isn’t a transform_if provided in the standard, but it isn’t difficult to write one:

template<
    class InputIterator,
    class OutputIterator,
    class UnaryOperation,
    class UnaryPredicate >
OutputIterator transform_if(
    InputIterator first,
    InputIterator last,
    OutputIterator result,
    UnaryOperation op,
    UnaryPredicate test)
{
    while( first != last )
    {
        if( test( *first ) )
        {
            *result++ = op( *first );
        }
        ++first;
    }
    return result;
}

[ Edit: Guvante points out in the comments here that my implementation of transform_if does not match the for_each version. ]

To my mind, this is the real beauty of the STL – as well as providing a large set of containers and algorithms, it is extensible. The framework it provides makes it possible to add new algorithms or containers which interact seamlessly with existing parts of the STL. transform_if is a new algorithm, but because it follows existing practice it doesn’t take a lot of additional work to understand it.

Quote of the week – wisdom

The road to wisdom?
Well its plain and simple to express:
Err and err and err again,
but less and less and less.

Piet Hein

Piet Hein predicted the iterative model of development found in agile programming techniques, as well as inventing the board game hex.

No raw loops 3 – member functions

Loop over all the elements in a container and call a member function (with arguments) on each element

I want to introduce one more std::bind trick. In part 1 and part 2 the functions called were non-member functions and we were passing each element of the container into the function as an argument. In this installment we are going to call a member function on each element in the container. We’ll extend our set of declarations to include a member function on class T:

class T
{
public:
   void mf_int( int ) const;
   ...
};

std::vector< T > v_T;

We have a vector full of objects of type T, we want to call the member function mf_int on each object in that vector, and we have to pass an integer argument to mf_int (I am skipping the example where we don’t have to pass any additional arguments to the function).

Let’s start off with the “old style” solution:

Solution #0

int j( 7 );
for( std::vector< T >::iterator i( std::begin( v_T ) ); 
    i != std::end( v_T ); 
    ++i )
{
    i->mf_int( j );
}

The range-based for version of the solution is as expected:

Solution #1

for( T const& t : v_T )
{
    t.mf_int( j );
}

The lambda function version also holds no surprises:

Solution #2

std::for_each( 
    std::begin( v_T ), 
    std::end( v_T ), 
    [ j ]( T const& t )
{
    t.mf_int( j );
} );

The std::bind version looks a little different though:

Solution #3

using namespace std::placeholders;
std::for_each( 
    std::begin( v_T ), 
    std::end( v_T ), 
    std::bind( &T::mf_int, _1, j ) );

The std::bind statement includes something new. &T::mf_int refers to a member function of class T – the function T::mf_int. When std::bind sees a member function (as opposed to a regular function) as its first argument, it expects the second argument to be the object on which to call the member function. In this case since the second argument is _1, it will call the member function T::mf_int on every element of the container. There is an additional argument to std::bind (j) which will be passed as a parameter to T::mf_int.

Having said above that I was skipping the example where we don’t have to pass any additional arguments to the function I want to mention that example briefly in connection with std::mem_fn, another function adapter (std::bind is a function adapter).

std::mem_fn provides a shorter way to specify that you want to call a member function with no arguments. Let’s add a new declaration to class T:

class T
{
public:
   void mf_int( int ) const;
   void mf_void() const;
   ...
};

When we use std::mem_fn we don’t need to specify the placeholder _1:

std::for_each( 
    std::begin( v_T ), 
    std::end( v_T ), 
    std::mem_fn( &T::mf_void ) );

std::mem_fn only works for member functions taking no arguments.

We can also use adobe::for_each:

Solution #4

adobe::for_each( v_T, std::bind( &T::mf_int, _1, j ) );

Wrap up

When we’re using std::bind we always specify the function we are going to call as the first argument. If the function we are going to call is a member function we specify the object on which we are going to call the member function second. Any additional parameters follow.

Random programming

Random programming

Bob Archer's thoughts about programming

Algorithms in action – std::transform

Quote of the week – teaching

Algorithms in action – sorting (follow up)

Unsigned integer variables

Quote of the week – design

Algorithms in action – sorting

Useful definitions

Wrap up

Quotes of the week – efficiency

Quote of the week – correctness

No raw loops 4 – std::transform

Quote of the week – wisdom

No raw loops 3 – member functions