In the first part of this series we looked at algorithms which guaranteed their order of operations vs. algorithms that don’t. On further reflection, there are two things missing from that article.

First of all, I should have included std::iota in the list of “order guaranteed” algorithms. I overlooked this partly because std::iota doesn’t take an input range, but also because the language used in the standard isn’t entirely clear. I am reasonably confident though that operator++ must be applied starting with the first value output, and continuing in sequence. (As always, if someone has some evidence I am wrong about this please tell me).

This raises the number of algorithms that guarantee ordering to 14.

Secondly, we didn’t look at “why”. Why should some algorithms guarantee their operation order while others don’t? Let’s look at the 14 algorithms in groups, starting with std::copy and std::move:

OutputIterator | copy | (InputIterator, InputIterator, OutputIterator) |

BidirectionalIterator2 | copy_backward | (BidirectionalIterator1, BidirectionalIterator1, BidirectionalIterator2) |

OutputIterator | move | (InputIterator, InputIterator, OutputIterator) |

BidirectionalIterator2 | move_backward | (BidirectionalIterator1, BidirectionalIterator1, BidirectionalIterator2) |

Copy and move have specific forward and backward versions. This is to handle overlapping input and output ranges. In order for copying or moving between overlapping ranges to work as expected, the order must be specified. The order will be different depending on whether the destination range overlaps the beginning or the end of the source range – hence the forward and backward versions.

(There was an issue raised in 1995 about the direction of std::copy. Read it here.)

Moving on, the next algorithm is std::for_each:

Function | for_each | (InputIterator, InputIterator, Function) |

As Kreft and Langer point out here, std::for_each was unusual among the algorithms because it allows its function object to have side effects. I say ** was** because the C++98 standard includes the phrase –

*op and binary_op shall not have any side effects*– in the description of std::transform. The C++11 standard weakens that requirement.

Regardless of the changes to the wording of std::transform, once your function object can have side effects, it is possible for those side effects to vary depending on the order in which the range is processed. To get deterministic results you need a deterministic processing order, hence (at least historically) the order requirement for std::for_each.

That leaves us with 9 algorithms to explain:

T | accumulate | (InputIterator, InputIterator, T) |

T | accumulate | (InputIterator, InputIterator, T, BinaryOperation) |

OutputIterator | adjacent_difference | (InputIterator, InputIterator, OutputIterator) |

OutputIterator | adjacent_difference | (InputIterator, InputIterator, OutputIterator, BinaryOperation) |

T | inner_product | (InputIterator1, InputIterator1, InputIterator2, T) |

T | inner_product | (InputIterator1, InputIterator1, InputIterator2, T, BinaryOperation1, BinaryOperation2) |

void | iota | (ForwardIterator, ForwardIterator, T) |

OutputIterator | partial_sum | (InputIterator, InputIterator, OutputIterator) |

OutputIterator | partial_sum | (InputIterator, InputIterator, OutputIterator, BinaryOperation) |

What do these algorithms have in common?

- They are all in the header numeric. In fact, they are the entire contents of the header numeric.
- They all involve combining elements of the input range in some way (or the output range for std::iota).

So we have “in order”, “numeric”, and “combining”. “Numeric” leads us to think of floating point values, and the combination of “in order” and “floating point” leads us to the fact that floating point arithmetic is neither associative nor commutative. [Edit: See John Payson’s comment where he points out that IEEE-754 floating point *is* commutative for addition and multiplication.] Running std::accumulate from front to back does not necessarily give us the same answer as running it from back to front. Once again, if we want a deterministic answer we need a deterministic processing order, hence the “in order” requirement for these algorithms.

As usual, if the standard algorithms do not supply us with what we want, we can write our own.

The C++11 standard also makes it clear that the compiler is limited in its rearrangement of operators:

[ Note: Operators can be regrouped according to the usual mathematical rules only where the operators really are associative or commutative.

**C++11 standard** *1.9*

In case anyone thinks that floating point commutativity [Edit: I should have said associative here] only matters for very large, very small or very accurate calculations, it doesn’t. One of my home projects is a music typesetting program. It lays out music onto a typical sheet of paper (A4 or 8.5 x 11) and it doesn’t need micron accuracy, however I still got caught by floating point operations.

At one point I had two functions (A and B) that computed what was nominally the same value, but in two different ways. I sorted a vector using a comparison operator that called function A. I then attempted to do a binary search on the vector using a comparison operator that called function B. Fortunately the debug version of the STL I was using detected attempts to use a binary search on a vector that was not sorted – the vector was sorted according to A, it was not sorted according to B. Yet another reason why “don’t repeat yourself” is such a good rule.

In part 1 I mentioned the possibility of having some of the algorithms perform their calculations in parallel. The C++11 standard has this to say:

Unless otherwise specified, C++ standard library functions shall perform all operations solely within the current thread if those operations have effects that are visible (1.10) to users.

[ Note: This allows implementations to parallelize operations if there are no visible side effects. —end note ]

**C++11 standard** *17.6.5.9*

I.e. no parallelism unless it’s invisible to the user.

There are some interesting proposals for explicitly parallel algorithms. For example, see section 6.6 *Associativity/Commutativity of Binary Operators* (p66) of A Parallel Algorithms Library which points out that some parallel algorithms will require associative and commutative operators.

Parallelizing the Standard Algorithms Library also looks at the topic of parallelization. For example, see section 5.2 *std::accumulate versus thrust::reduce* for a discussion of the need for associative and commutative operators.

Several of my references in this post came from this website which contains many papers from the C++ standards committee. If you have far too much time to kill, the site contains fascinating information – current and historical.