September | 2016 | Random programming

Note: Part way through writing this post I discovered that the boost graph library had explored many of these concepts already. I decided to finish the post anyway because I think my design process might be interesting.

One of my personal projects is a radiosity renderer. The world doesn’t need another radiosity renderer, but I wanted to learn about them, and, as usual with my personal projects, there is a secondary reason for writing it – to play around with C++ and see what I can learn.

In order to have a renderer you need to have a scene to render, which means you also need to have some data structures describing that scene. For the time being my data structure is simple – a list of polygons. Each polygon object includes various useful bits of data – things like colour and reflectivity. A polygon also has a vector of vertices. The vertices define the bounds of the polygon.

In addition to the polygons and vertices I need to process edges. An edge is defined by the two vertices it connects. My canonical data structure is the polygons and vertices, so when I need edges I need to construct them from the vertices.

There is nothing too difficult about that – iterate through the vertices and create an edge using each vertex and the following vertex. The twist comes when we get to the end of the vertices. Once we are at the last vertex, the next vertex we want is actually at the beginning of the vector. The polygon is closed so there has to be an edge between the last vertex and the first vertex. I am going to refer to this concept as “adjacent circular” – we already have algorithms like std::adjacent_find and “circular” indicates that we are linking the end to the beginning.

For the sake of simplicity I’ll get rid of the extraneous details and slim things down so that our input is std::vector< char >, and our output is std::vector< std::pair< char, char > >. In other words, we are trying to write this function:

std::vector< std::pair< char, char > >
getEdges( std::vector< char > const& vertices );

(As usual, some using statements and typedefs would make this less verbose, and, as usual I want to be explicit about where everything is coming from so I am going to spell out every component in full. Also, when I get to the point of writing classes I am going to inline the function definitions – in reality I’d split them out from the declaration).

If the input is:

a, b, c, d, e

we expect the output of getEdges to be:

(a b), (b c), (c d), (d e), (e a)

It’s always good to start off with the straightforward solution:

std::vector< std::pair< char, char > >
getEdges( std::vector< char > const& vertices)
{
    std::vector< std::pair< char, char > > edges;

    std::vector< char >::const_iterator i( std::begin( vertices ) );
    std::vector< char >::const_iterator j( std::begin( vertices ) );

    while( i != std::end( vertices ) )
    {
        ++j;
        
        if( j == std::end( vertices ) )
        {
            j = std::begin( vertices );
        }

        edges.push_back( std::make_pair( *i, *j ) );

        ++i;
    }

    return edges;
}

The code is not particularly elegant or efficient but it does the job. At the very least it’s useful as a test baseline for the other solutions we’re going to come up with.

Let’s follow Sean Parent’s dictum of “no raw loops” and see what else we can come up with. The point of “no raw loops” is not that there are no loops at all, just that the loops themselves are encapsulated in a reusable algorithm.

Let’s write ourselves an algorithm:

template<
    typename ForwardIterator,
    typename BinaryOperation >
BinaryOperation for_adjacent_circular(
    ForwardIterator first,
    ForwardIterator last,
    BinaryOperation op)
{
    ForwardIterator i( first );
    ForwardIterator j( first );
    
    while( i != last )
    {
        ++j;
        
        if( j == last )
        {
            j = first;
        }
        
        op( *i, *j );

        ++i;
    }

    return op;
}

As we’d expect, for_adjacent_circular looks very much like our original function. When we use it, it looks very much like std::for_each:

std::vector< std::pair< char, char > >
getEdgesAlgorithm( std::vector< char > const& vertices)
{
    std::vector< std::pair< char, char > > edges;

    for_adjacent_circular(
        std::begin( vertices ),
        std::end( vertices ),
        [ &edges ]( char a, char b )
        {
            edges.push_back( std::make_pair( a, b ) );
        } );

    return edges;
}

This is good as far as it goes – the problem is that it doesn’t go far enough. We wrote an algorithm that is specialized for our original problem – converting a vector of vertices into a vector of edges. Given that it’s a for_each style algorithm we can fake it to do a lot of things that we’d like to do, but it feels like we’re abusing the algorithm and falling into the trap John Potter describes of making std::for_each the “goto of algorithms”.

There are many more algorithms we might want to use – it would be nicer to have iterators that we could feed into one of the existing standard algorithms. Here’s my first pass at an iterator. I’ll display the whole thing first then go through the interesting features of it.

template< typename Iterator >
class ConstAdjacentCircularIterator
{
public:
    ConstAdjacentCircularIterator( 
        Iterator begin, 
        Iterator position, 
        Iterator end )
    : begin_( begin )
    , position_( position )
    , end_( end )
    {
    }

    typedef std::forward_iterator_tag iterator_category;
    typedef std::pair< 
        typename Iterator::value_type const&, 
        typename Iterator::value_type const& > value_type;
    typedef typename Iterator::difference_type difference_type;
    typedef value_type* pointer;
    typedef value_type& reference;

    // Prefix
    ConstAdjacentCircularIterator& operator ++()
    {
        ++position_;

        return *this;
    }

    // Postfix
    ConstAdjacentCircularIterator operator ++( int )
    {
        ConstAdjacentCircularIterator original( *this );
        ++(*this);
        return original;
    }

    value_type operator *() const
    {
        Iterator next( position_ );
        ++next;
        if( next == end_ )
        {
            next = begin_;
        }
        
        return value_type( *position_, *next );
    }

private:

    template< typename I >
    friend bool operator ==( 
        ConstAdjacentCircularIterator< I > const& a, 
        ConstAdjacentCircularIterator< I > const& b )
    {
        return 
            a.begin_ == b.begin_ &&
            a.position_ == b.position_ &&
            a.end_ == b.end_;
    }

    Iterator begin_;
    Iterator position_;
    Iterator end_;
};

template< typename Iterator >
bool operator !=( 
    ConstAdjacentCircularIterator< Iterator > const& a, 
    ConstAdjacentCircularIterator< Iterator > const& b )
{
    return !( a == b );
}

(As usual I needed to break out my copy of C++ Templates by Vandervoorde and Josuttis to get the friend declarations correct. Actually I am still not certain they are correct – I got different errors out of two different compilers and spent some time getting to what might be the right solution. I like C++ and I like templates but the syntax often has me baffled).

I have written the const version of the iterator since that’s what I need for the moment. We’ll look at the non-const version later.

ConstAdjacentCircularIterator( 
    Iterator begin, 
    Iterator position, 
    Iterator end )
: begin_( begin )
, position_( position )
, end_( end )
{
}

The constructor for our adjacent circular iterator takes other iterators as input. I assume we have some kind of underlying sequence which has its own iterators. Interestingly, we have to specify three iterators to construct an adjacent circular iterator. Not only do we specify position, we also need to specify the range of the underlying sequence with begin and end. We’ll see why these are necessary when we implement the increment functions. This is a pattern I have come across before. When I’m writing iterators that do more than just directly reference the elements of an underlying sequence I often need to specify more information about the range that the iterator is iterating over.

Moving on, we have the typedefs that all iterators must have:

typedef std::forward_iterator_tag iterator_category;

I have chosen to make this a forward iterator simply because I didn’t want to implement more than operator ++. There is no reason why we can’t implement all of the other access methods and make the circular iterator as fully featured as the underlying iterator.

typedef std::pair< 
    typename Iterator::value_type const&, 
    typename Iterator::value_type const& > value_type;

value_type is interesting, and by “interesting” I mean “the first sign that the train is eventually going to come off the rails”.

Our basic problem is that the value we want (a pair of the values stored in the underlying container) is not actually what we have stored in the underlying container. To try and work around this I am storing references to the two values that make up our pair. This looks like a decent compromise – it is a decent compromise – but it’s still a compromise.

This is going to come up again later.

typedef typename Iterator::difference_type difference_type;

We take difference_type straight from our underlying iterator – nothing surprising here.

typedef value_type* pointer;

typedef value_type& reference;

pointer and reference should not be surprising. They’re just a pointer and reference to value_type. However, we are already uneasy about value_type so we need to keep an eye on these.

// Prefix
ConstAdjacentCircularIterator& operator ++()
{
    ++position_;

    return *this;
}

Our prefix increment implementation is not surprising. I am not implementing any sort of error checking – I assume that will be handled (if it’s handled at all) by the underlying iterator. Notice that the iterator itself is not wrapping around the container – we only wrap around when we defererence the iterator. Incrementing the iterator will never take us back to the beginning of the container. While it would be possible to create an iterator that wrapped around (although the question of what end would be is an interesting one), that isn’t what we’re doing here.

// Postfix
ConstAdjacentCircularIterator operator ++( int )
{
    ConstAdjacentCircularIterator original( *this );
    ++(*this);
    return original;
}

Postfix increment is just what we’d expect.

value_type operator *() const
{
    Iterator next( position_ );
    ++next;
    if( next == end_ )
    {
        next = begin_;
    }
    
    return value_type( *position_, *next );
}

The deference function is where the magic happens. Let’s get the easy stuff out of the way – we use the current iterator and next iterator, making sure that if we’re at the end of the range we wrap around to the beginning. This is why we need to have begin and end stored in the iterator itself – we can’t implement this function without them.

We dereference position_ and next which gives us references to the underlying values, then we construct a value_type from the two references and return it.

Now let’s look at the subtleties. Our problems with value_type really bite us here. operator * should return a reference to the value type – the standard says so (section 24.2.2 in the C++11 standard). We can’t return a reference to value_type because our underlying container (whatever that is) does not contain objects of that type.

value_type contains references to the underlying objects, but that isn’t the same thing and is not enough to satisfy the standard.

template< typename I >
friend bool operator ==( 
    ConstAdjacentCircularIterator< I > const& a, 
    ConstAdjacentCircularIterator< I > const& b )
{
    return 
        a.begin_ == b.begin_ &&
        a.position_ == b.position_ &&
        a.end_ == b.end_;
}

The equality operator is necessary for us to do almost anything useful with our iterator. There is one interesting point here. There are two possible ways of writing this function. We could just compare position_, however if we do that we can generate situations like this:

ConstAdjacentCircularIterator< 
    typename std::vector< char >::const_iterator > i0( 
        vc.begin(), vc.begin() + 2, vc.end() );

ConstAdjacentCircularIterator< 
    typename std::vector< char >::const_iterator > i1( 
        vc.begin(), vc.begin() + 2, vc.begin() + 3 );

i0 and i1 have the same position – vc.begin() + 2. If we were just comparing position, i0 == i1 is true. Unfortunately, *i0 == *i1 is false.

*i0 is (c, d)

*i1 is (c, a)

It is a normal restriction for operator == to be valid only if the iterators refer to the same underlying sequence (C++11 standard section 24.2.5). Even though i0 and i1 are based off the same underlying container, they are not referring to the same sequence since they have different end values.

All of that is a long-winded way of saying that we need to compare begin_, position_, and end_.

template< typename Iterator >
bool operator !=( 
    ConstAdjacentCircularIterator< Iterator > const& a, 
    ConstAdjacentCircularIterator< Iterator > const& b )
{
    return !( a == b );
}

I define inequality in terms of equality – that’s a common technique. Since this doesn’t need to be a member function it isn’t.

Our getEdges function is now simple (at least it would be if I were using typedefs):

template< typename T >
std::vector< std::pair< typename T::value_type, typename T::value_type > >
getEdgesIterator( T const& vertices)
{
    std::vector< std::pair< 
        typename T::value_type, 
        typename T::value_type > > edges;

    ConstAdjacentCircularIterator< typename T::const_iterator > b( 
        std::begin( vertices ), 
        std::begin( vertices ), 
        std::end( vertices ) );
    ConstAdjacentCircularIterator< typename T::const_iterator > e( 
        std::begin( vertices ), 
        std::end( vertices ), 
        std::end( vertices ) );

    std::copy( b, e, std::back_inserter( edges ) );

    return edges;
}

Even better, since we now have iterators it looks like we can use any standard algorithm that requires iterators – std::for_each, std::transform, std::find etc.

I will award bonus points to anyone who spotted the weasel words in that statement – “looks like”. That’s an important caveat and we’re going to come back to it later. For the time being let’s just say that ConstAdjacentCircularIterator is doing a reasonable impersonation of an iterator.

So much for the const version of the iterator, what about the non-const version? There’s one difference – the type of value_type:

typedef std::pair< 
    typename Iterator::value_type&, 
    typename Iterator::value_type& > value_type;

This time the references to the underlying data type are non-const. We can write to an AdjacentCircularIterator as well as read from it.

For example, if we run this code (I have output operators already defined for std::vector< char > and std::pair< char, char >):

std::vector< char > vc( { 'a', 'b', 'c', 'd', 'e' } );

AdjacentCircularIterator< 
    typename std::vector< char >::iterator > i( 
        vc.begin(), 
        vc.begin(), 
        vc.end() );

++i;

std::cout << vc << "\n";
std::cout << *i << "\n";

*i = std::make_pair( 'x', 'y' );

std::cout << vc << "\n";
std::cout << *i << "\n";

We get this output:

a, b, c, d, e
(b c)
a, x, y, d, e
(x y)

What about range based for? To make range-based for work we need to create a container having begin and end methods. Since we've already defined our iterator classes this is easy:

class AdjacentCircularContainer
{
public:

    typedef AdjacentCircularIterator< 
        std::vector< char >::iterator >::value_type value_type;

    typedef value_type& reference;
    typedef value_type const& const_reference;
    typedef value_type* pointer;
    typedef value_type const* const_pointer;

    typedef AdjacentCircularIterator< 
            typename std::vector< char >::iterator > iterator;
    typedef ConstAdjacentCircularIterator< 
            typename std::vector< char >::const_iterator > const_iterator;
    
    AdjacentCircularContainer( std::vector< char >* p )
    : pContainer_( p )
    {
    }

    const_iterator begin() const
    {
        return const_iterator( 
                pContainer_->begin(), 
                pContainer_->begin(), 
                pContainer_->end() );
    }

    const_iterator end() const
    {
        return const_iterator( 
            pContainer_->begin(), 
            pContainer_->end(), 
            pContainer_->end() );
    }
    
private:
    std::vector< char > *pContainer_;
};

(I am implementing the minimum necessary here - there's more we can, and should, add if we want this to be a full blown container).

We can use our container exactly as we'd expect:

template< typename T >
std::vector< std::pair< typename T::value_type, typename T::value_type > >
getEdgesContainer( T const& vertices)
{
    std::vector< std::pair< char, char > > result;

    AdjacentCircularContainer c( &v_char );

    for( std::pair< char, char > e : c )
    {
        result.push_back( e );
    }

    return result;
}

While we're at it, now we have our container we can start to use the Adobe Source Library algorithms:

template< typename T >
std::vector< std::pair< typename T::value_type, typename T::value_type > >
getEdgesAdobe( T const& vertices)
{
    std::vector< std::pair< char, char > > result;

    AdjacentCircularContainer c( &v_char );

    adobe::copy( c, std::back_inserter( result ) );

    return result;
}

Life is good. We have iterators and we have containers that implement our desired "circular adjacent" functionality.

Except that we don't.

I have dropped unsubtle hints along the way that we've been cheating. While we have things that do reasonable impersonations of iterators and containers (good enough to do the things I need for my current use case), we do not have standard compliant iterators and containers.

The fundamental problem is the one we identified way back when we started to come up with the type of value_type. Normally with a C++ sequence there are objects of type value_type stored somewhere, and they persist for long enough that returning and storing references to them is a sensible thing to do. We don't have that. The only time we ever have an actual pair of values is as a local value inside the dereferencing function, and, as we all know, passing back a reference to a local value (which goes out of scope as soon as the function ends) is a Bad Thing.

We want (and the standard requires) the declaration of operator * to be:

value_type const& operator *() const

but we have to make it:

value_type operator *() const

There are other problems (we can't implement front or back correctly), but they all stem from the same source.

If you're thinking - "this sounds like the problems with std::vector< bool >" - you're right. Because std::vector< bool > is allowed to be specialized so that the boolean values can be stored as a bitset you can't actually get a reference to an individual bool object because the container doesn't store individual bool objects. We can fake it with proxy objects, and more modern versions of C++ can fake it better than older versions can, but we're still faking it, and sooner or later the faking will catch up with us.

For more on std::vector< bool > checkout Howard Hinnant's comments on this Stack Overflow question and this blog post.

In conclusion, it's been an interesting investigation but is it really worth it? For my particular use case the answer is "no" (at least in terms of solving my original "vertices to edges" problem rather than writing a blog post and learning something). Our original simple solution is good enough for my needs. I have it wrapped up in the function getEdges so it's easy enough to swap out if I need something different.

Once I have my edges in a container I can run whatever algorithsm I need on a container that does actually contain edges rather than some fake sequence that just pretends to contain edges.

Random programming

Random programming

Bob Archer's thoughts about programming

Month: September 2016

Adjacent circular iterators