Generic algorithms for 2D spatial fields II: Data type

In the previous post we had a quick look at generic algorithms, spatial fields and the requirements of generic algorithms for 2D spatial fields. In this post we are going to tackle the first requirement:

It must be possible for the user to pick the data type and value type of the input and output arguments. The data type does not need to be picked from a fixed set of types. The user must be able to use her own types.

This requirement is the easiest to fulfill. By turning our previous function into a function template, we can parameterize the function by the data types of the arguments and the result.

#pragma once
#include <cmath>


template<
    typename Base,
    typename Exponent,
    typename Result>
inline void pow(
    Base const& base,
    Exponent const& exponent,
    Result& result)
{
    size_t const nr_rows{base.shape()[0]};
    size_t const nr_cols{base.shape()[1]};

    for(size_t r = 0; r < nr_rows; ++r) {
        for(size_t c = 0; c < nr_cols; ++c) {
            result[r] = std::pow(base[r], exponent);
        }
    }
}

This algorithm can be called like this:

#include <cstddef>
#include "fern/feature/core/raster.h"
#include "pow.h"


int main(
    int /* argc */,
    char** /* argv */)
{
    using Raster = fern::Raster<double, 2>;

    size_t const nr_rows{2};
    size_t const nr_cols{3};
    double west{0.0};
    double north{0.0};
    double cell_size{10.0};
    Raster::Transformation transformation{{west, cell_size, north, cell_size}};
    Raster base(fern::extents[nr_rows][nr_cols], transformation);

    // Assign values to the raster.
    // ...

    double exponent{2};

    Raster result(fern::extents[nr_rows][nr_cols], transformation);

    pow(base, exponent, result);

    return EXIT_SUCCESS;
}

Some observations:

  • The result is now passed as a writable argument to the function. The compiler instantiates a function for each unique set of template arguments that we provide. By turning the result into an argument, these instantiations are involved in overload resolution allowing the compiler to pick the best match. This is relevant if we call the algorithm for the same Base and Exponent types, but different Result types. If we had returned the result the compiler would have thrown an error complaining about ambiguous overloads. It cannot choose between two functions with the same argument types, but different return types.
  • Although we turned the argument and result data types into template arguments, we are still limiting our users to the fern::Raster interface. Only raster types adhering to this interface can be passed. Ideally, we want to decouple the algorithm from a specific data type’s interface, allowing the user to use any raster-like type, whatever its interface.
  • Although Base is a raster-like type, Exponent must still be a floating point type. We would like to also be able to pass a raster-like type for the exponent argument.

Let’s tackle these limitations in turn. A technique to decouple an algorithm from its argument’s interfaces is to use customization points. And we can use a technique called tag dispatching to select an algorithm’s implementation based on properties of the template’s argument types.

Customization points

Let us forget about a single specific class for representing rasters and focus on the minimal set of requirements that the pow algorithm poses on the arguments passed in. Given that the algorithm currently takes a raster as its first argument and a number as its second, we can define the following requirements:

  • It must be possible to determine the number of rows and columns in the base argument raster.
  • It must be possible to iterate over all raster cells and obtain a readable value of each cell from the base argument raster and a writable reference of each cell in the result raster.

Customization points are free function templates that return a property of an argument and which can be overloaded for different argument types. One example from the C++ Standard Library is std::begin, which returns an iterator to the first element in a collection. This function is overloaded for the various containers in the STL and for C-style arrays. Using std::begin to obtain such an iterator is a better option than calling my_container.begin() because the latter assumes my_container‘s type provides this function. C-style arrays don’t, for example.

So, customization points help to write code that does not depend on a specific type’s interface. Here is a declaration of a customization point for getting the size of a raster:

template<
    typename Raster>
size_t size(Raster const& raster, size_t dimension);

It takes a raster and an id of a dimension. In the case of 2D rasters, this latter argument must be a 0 or a 1, where 0 represents the first dimension in a 2D array and 1 the second. Since C-style arrays have a row-major ordering, the first dimension iterates over the rows and the second over the columns.

The Raster template argument has no relation with the fern::Raster class template used earlier. We can implement the customization point for our fern::Raster class template as follows, using a partial specialization:

template<
    typename T,
    size_t nr_dimensions>
size_t size(
   fern::Raster<T, nr_dimensions> const& raster,
   size_t dimension)
{
   return raster.shape()[dimension];
}

Within the customization point, we can use our type specific interface. For raster types with other interfaces we can implement alternative overloads. In our algorithm we can now replace the code obtaining the size of the dimensions to be calls to our size customization point.

We need two more customization points: one for obtaining a readable value of a specific cell and one for obtaining a writable reference to a value of a specific cell. Let’s call them both get:

template<
    typename Raster>
value_type<Raster> const& get(Raster const& raster, size_t row, size_t col);

template<
    typename Raster>
value_type<Raster>& get(Raster& raster, size_t row, size_t col);

The return type is calculated by the value_type alias template, which returns the value type of the template argument. Its definition assumes that type traits (see below) provide the value type of every data type:

template<
    typename T>
using value_type = typename TypeTraits<T>::value_type;

For our fern::Raster class template we can implement the additional customization points as follows:

template<
    typename T,
    size_t nr_dimensions>
value_type<fern::Raster<T, nr_dimensions>> const& get(
    fern::Raster<T, nr_dimensions> const& raster,
    size_t row,
    size_t col)
{
    return raster[row][col];
}

template<
    typename T,
    size_t nr_dimensions>
value_type<fern::Raster<T, nr_dimensions>>& get(
    fern::Raster<T, nr_dimensions>& raster,
    size_t row,
    size_t col)
{
    return raster[row][col];
}

The last missing piece is the implementation of the TypeTraits class template. Type traits provide properties of a type. In our case, we need the value type of the Raster class template.

template<
    typename T>
struct TypeTraits
{
};
template<
    typename T,
    size_t nr_dimensions>
struct TypeTraits<
    fern::Raster<T, nr_dimensions>>
{
    using value_type = T;
};

Let’s now rewrite our algorithm in terms of customization points:

#pragma once
#include <cmath>
#include "customization_point.h"


template<
    typename Base,
    typename Exponent,
    typename Result>
inline void pow(
    Base const& base,
    Exponent const& exponent,
    Result& result)
{
    size_t const nr_rows{size(base, 0)};
    size_t const nr_cols{size(base, 1)};

    for(size_t r = 0; r < nr_rows; ++r) {
        for(size_t c = 0; c < nr_cols; ++c) {
            get(result, r, c) = std::pow(get(base, r, c), exponent);
        }
    }
}

Note that the algorithm is written in terms of template parameters and customization points, all to be provided by the caller. We abstracted away any reference to a specific raster type. With this in place, we can now call this algorithm with every raster-like base argument for which we have implemented the three customization points. In the next section we are going to get rid of the limitation that the base argument must be a raster and the exponent a number.

Tag dispatching

If we call our current pow algorithm with a raster instance as the exponent argument, the compilation fails. Our algorithm expects the exponent to be a single number. Passing a raster requires a different implementation. The problem we need to solve is how we can pass different permutations of raster and number arguments to pow, without having to revert to differently named functions like pow_raster_number and pow_raster_raster. Somehow, our pow implementation needs to do something different depending on its template argument types.

Tag dispatching can help us here. What we need is a tag for each data type we support and call a second function that actually implements the algorithm for each unique combination of argument types. Let us first define the tags:

struct number_tag {};
struct raster_tag {};

Now we need a way to obtain the tag given a data type. We can extent our traits class for that:

template<
    typename T>
struct TypeTraits
{
    using tag = number_tag;  // Default.
};


template<
    typename T>
using tag = typename TypeTraits<T>::tag;
template<
    typename T,
    size_t nr_dimensions>
struct TypeTraits<
    fern::Raster<T, nr_dimensions>>
{
    using value_type = T;  // Like before.
    using tag = raster_tag;
};

In our original pow implementation we can now dispatch on data type tag to an algorithm with the correct implementation. Note that this all happens during the compilation, and not at runtime.

#pragma once
#include <cmath>
#include "customization_point.h"


namespace detail {

    template<
        typename Base,
        typename Exponent,
        typename Result>
    inline void pow(
        Base const& base,
        Exponent const& exponent,
        Result& result,
        raster_tag,
        number_tag)
    {
        size_t const nr_rows{size(base, 0)};
        size_t const nr_cols{size(base, 1)};

        for(size_t r = 0; r < nr_rows; ++r) {
            for(size_t c = 0; c < nr_cols; ++c) {
                get(result, r, c) = std::pow(get(base, r, c), exponent);
            }
        }
    }

    template<
        typename Base,
        typename Exponent,
        typename Result>
    inline void pow(
        Base const& base,
        Exponent const& exponent,
        Result& result,
        raster_tag,
        raster_tag)
    {
        size_t const nr_rows{size(base, 0)};
        size_t const nr_cols{size(base, 1)};

        for(size_t r = 0; r < nr_rows; ++r) {
            for(size_t c = 0; c < nr_cols; ++c) {
                get(result, r, c) = std::pow(get(base, r, c),
                    get(exponent, r, c));
            }
        }
    }

}  // namespace detail


template<
    typename Base,
    typename Exponent,
    typename Result>
inline void pow(
    Base const& base,
    Exponent const& exponent,
    Result& result)
{
    using base_tag = tag<Base>;
    using exponent_tag = tag<Exponent>;

    detail::pow(base, exponent, result, base_tag{}, exponent_tag{});
}

More overloads of detail::pow can be implemented if necessary.

With these changes we made it possible for the user to use our algorithm with her own data types. In the next post we are going to take a look at handling invalid data elements.

Generic algorithms for 2D spatial fields I: Introduction

This is the first in a series of blog posts about developing generic algorithms in C++ for handling 2D spatial fields. The goal of this series is to explain how we can generalize algorithms handling spatial fields and why that might be useful.

Generic algorithms

Generic algorithms in C++ are templated algorithms that are written in terms of template argument types that are defined later, when the algorithms are instantiated. The C++ Standard Template Library (STL) contains many examples of generic algorithms.

To appreciate the benefits of generic algorithms over non-generic algorithms, we can first take a look at a non-generic algorithm and compare it to a generic one. Let’s define an algorithm for finding the maximum element in a collection:

#include <vector>

std::vector<int>::iterator max_element(
    std::vector<int>::iterator first,
    std::vector<int>::iterator last)
{
    std::vector<int>::iterator it = first;

    for(++first; first != last; ++first) {
        if(*first > *it) {
            it = first;
        }
    }

    return it;
}

The algorithm accepts iterators pointing to the first element and end position of the collection. It returns an iterator pointing to the first element with the greatest value, or the end position if the collection is empty.

This algorithm works fine, but only if we want to find the maximum element in a vector of integers. When this is not the case, we end up writing multiple algorithms for multiple argument types. That is the kind of work computers are better at than we. Fortunately, we can ask the compiler to generate such a family of algorithms for us, by using function templates. The generic version of max_element is part of the STL (std::max_element) and has the following prototype (we skip the overload accepting a user-defined comparison function):

template<
    typename ForwardIt>
ForwardIt max_element(ForwardIt first, ForwardIt last);

When we call this generic algorithm like this:

std::vector<int> my_values{1, 5, 2, 4, 3};
auto it = std::max_element(my_values.begin(), my_values.end());

the compiler will instantiate a similar algorithm for us as the non-generic version which we wrote above. But the generic algorithm can also be called with other argument types:

std::list<std::string> my_labels{"amsterdam", "rotterdam", "the hague"};
auto it = std::max_element(my_labels.begin(), my_labels.end());

So, in short, generic algorithms are templates for actual algorithms, with some aspects provided by the caller. The compiler will substitute the missing aspects (the template parameters) with the information provided at the call site and instantiate and compile the final algorithm. Using the generic algorithm in a different context involves calling it with different template argument types.

A generic algorithm cannot be compiled without the caller providing essential information that the algorithm is parameterized on.

One reason for developing generic algorithms is that the person developing the algorithms doesn’t have all the pieces of information required to be able to implement the final algorithm. An important task when designing generic algorithms is to figure out what the pieces of information are that users will want to vary. These will end up as template parameters of the generic algorithms.

Spatial fields

To limit ourselves a bit to data that we know most about we will now turn to spatial fields. The term field is overloaded and has many different meanings. In our case we use the term in the spatial sense. A field is a quantity that (at least in theory) has a value at any point in space. This value varies continuously through space. Here we limit ourselves to 2D fields. Examples of such fields are the atmospheric pressure at a certain height above the earth’s surface, and the elevation of the earth’s surface itself (both projected on a 2D Cartesian plane). To be able to conveniently handle such quantities in the computer, we discretize fields using rasters, which are essentially 2D arrays with some additional information about where each raster is positioned according to some coordinate reference system, and the width and height of the raster cells. For our purposes here it is enough to think of spatial fields as 2D arrays positioned somewhere.

field_raster

Next, we will turn to the requirements of our generic algorithms.

Requirements

In this section we will develop a first version of a generic algorithm handling 2D spatial field data. It is good practice to start with a non-generic version, and get a clear idea about the shortcomings, before writing generic code.

For our purposes here we are not interested in the calculations themselves. In fact we are interested in everything but the calculations themselves. The algorithm we will develop will simply raise each cell in a raster to the power of some exponent. Here is a first try:

#include <cmath>
#include "fern/feature/core/raster.h"

fern::Raster<double, 2> pow(
    fern::Raster<double, 2> const& base,
    double exponent)
{
    size_t const nr_rows{base.shape()[0]};
    size_t const nr_cols{base.shape()[1]};
    fern::Raster<double, 2> result(fern::extents[nr_rows][nr_cols],
        base.transformation());

    for(size_t r = 0; r < nr_rows; ++r) {
        for(size_t c = 0; c < nr_cols; ++c) {
            result[r] = std::pow(base[r], exponent);
        }
    }

    return result;
}

We use the Raster class template from Geoneric‘s Fern library, but this could be any class with which we can model a raster. The details of this class’ implementation don’t matter here.

Some observations and drawbacks of this implementation:

  • The type of the individual elements (further on we call this the value type) in the base raster are fixed to double. This is limiting the caller. She might prefer to pass in float or long double values. The same is true for exponent.
  • The type of the base argument (the data type) is fixed to Raster<double, 2>. This is also limiting the caller. She has to use our Raster type while she may have a type for modelling rasters that fits her needs better.
  • The exponent is fixed to a double, but we may want to be able to pass a raster instead of a single number. When we pass a raster for exponent, we expect each cell in base to be raised by the corresponding cell in exponent.
  • The result is returned, which is OK if Raster supports move semantics. Otherwise it is definitely not OK. We don’t want to copy possibly very large rasters. Furthermore, in C++ return types are not considered in overload resolution which limits our possibilities to make the return type configurable by the caller. This will be explained in more detail in the next post.
  • Often, algorithms have a limited domain of valid input values. For example, a negative finite base and a non-integer finite exponent are considered out of domain by the std::pow algorithm. What should happen if an algorithm’s input argument contains values that are not part of the algorithm’s valid domain of input values?
  • Similarly, output value types have a limited range of values they can represent. For example, an int8_t cannot hold values larger than $latex \frac{2^8}{2} – 1$. What should happen if an algorithm calculates values that are not part of the valid range of values?
  • In the algorithm above, we perform a calculation for each cell. In reality, rasters often contain cells with undefined values. Depending on the field that is represented by the raster, this may be due to clouds, measurement errors, errors in previous calculations, etc. We clearly need a way to test whether or not a cell from an input raster contains a valid value or not.
  • Given the previous observations, we also need a way to mark that an output raster cell does not contain a valid value. If an input raster cell contains an invalid value we want to be able to mark the corresponding cell in the output raster as invalid too. Also, if we are able to detect out-of-domain and out-of-range errors, we want to be able to mark the corresponding output raster cells as invalid.

Given the non-generic algorithm and the observations, we are now better equipped to list the requirements of a generic algorithm handling spatial fields:

  • It must be possible for the user to pick the data type and value type of the input and output arguments. The data type does not need to be picked from a fixed set of types. The user must be able to use her own types.
  • It must be possible for the user to ‘explain’ to the algorithm how to:
    • detect invalid input raster cell values
    • mark invalid output raster cell values
    • handle out of domain input values
    • handle out of range output values
  • However the above requirements are met, the performance of the algorithm must be equal to a handwritten non-generic algorithm.

In the next posts, we are going to tackle each of these requirements in turn.