HPX: Parallel and distributed applications of any scale

I recently found out about a software library called HPX that supports writing parallel and distributed applications of any scale. It allows the user of the library (a software developer) to focus more on the tasks that need to be performed by the software and less about how these tasks can be executed efficiently given the hardware resources available. Ideally, using HPX the developer can write software that performs optimal on any hardware platform, ranging from (relatively) low-end platforms as laptops to high-end platforms like clusters and supercomputers.

More information about HPX can be found at the HPX homepage. The concepts behind the library are explained in a talk by Hartmut Kaiser (one of the people behind HPX) called “Asynchronous Computation in C++“.

Generic algorithms for 2D spatial fields III: No-data

In the first post in this series we had a quick look at generic algorithms, spatial fields and the requirements of generic algorithms for 2D spatial fields. In this post we are going to tackle the first part of the second requirement:

It must be possible for the user to ‘explain’ to the algorithm how to:

  • detect invalid input raster cell values
  • mark invalid output raster cell values

The algorithm we developed until now assumes all input argument raster cells contain valid data. Here is the core of the algorithm raising a raster of values by a single value:

for(size_t r = 0; r < nr_rows; ++r) {
    for(size_t c = 0; c < nr_cols; ++c) {
        get(result, r, c) = std::pow(get(base, r, c), exponent);
    }
}

As mentioned in the introduction, we often have to take invalid values into account. Ideally, we want to be able to tell the algorithm whether or not our argument rasters may contain invalid cells or not, and if so, how to test each individual cell’s value for validity.

Similarly, instead of being able to test input raster cells for validity, we also want to be able to mark result raster cells as invalid, for example in case one of the argument rasters contains invalid cells, or contains values that are considered out of the algorithm’s domain.

Policy classes are a technique that can help us with these requirements. Policy classes allow us to parameterize behavior of an algorithm, which enables the caller to tune the algorithm to her needs. An example of a policy from the C++ Standard Library is Allocator, which allows the caller to tune the allocation and deallocation of memory in code accepting an allocator policy (most containers from the Standard Library do, see for example the second template parameter of the std::vector class template).

Our algorithm needs to be parameterized with policy classes that encapsulate the behavior of how to deal with no-data values in the input and output rasters. We must then rewrite our algorithm to use these policies. The caller of our algorithm can then decide if and how input no-data must be tested and if and how output no-data must be marked. Depending on the raster type used, no-data may be marked by a special marker value, like the maximum or minimum representable value by the value type, or maybe by a layered boolean raster or a bit mask. It all depends on the type passed in the algorithm. As long as the algorithm has a way to test and mark no-data elements, it can do its job.

Let’s think about the interface of the no-data checking and marking policies. For maximum flexibility, we decided to separate the no-data checking and marking policies, which we call the input no-data policies, and output no-data policies respectively. Both kinds of no-data policies need to be able to either test or mark individual elements for validity. In the current algorithm, individual raster cells are addressed using row and column indices, which we can use for our no-data policies too. So, an input no-data policy needs to be able to tell whether or not an element with a certain row and column index contains valid data or not:

bool MyInputNoDataPolicy::is_no_data(size_t row, size_t col);

Similarly, an output no-data policy needs to be able to mark an element as valid or invalid:

bool MyOutputNoDataPolicy::mark_as_no_data(size_t row, size_t col);

We are not passing the raster itself, because we have no idea how the policy is implemented. It may use the raster, or not. This means that we must pass policy instances to the algorithm, the types alone are not enough. The policy instance must be self-contained and be able to answer our requests, only given an element’s address.

Here is an updated version of the part of our algorithm that raises a raster by a number:

template<
    typename InputNoDataPolicy,
    typename OutputNoDataPolicy,
    typename Base,
    typename Exponent,
    typename Result>
void pow(
    InputNoDataPolicy const& input_no_data_policy,
    OutputNoDataPolicy& output_no_data_policy,
    Base const& base,
    Exponent const& exponent,
    Result& result,
    raster_tag,
    number_tag)
{
    size_t const nr_rows{size(base, 0)};
    size_t const nr_cols{size(base, 1)};

    for(size_t r = 0; r < nr_rows; ++r) {
        for(size_t c = 0; c < nr_cols; ++c) {
            if(input_no_data_policy.is_no_data(r, c)) {
                output_no_data_policy.mark_as_no_data(r, c);
            }
            else {
                get(result, r, c) = std::pow(get(base, r, c), exponent);
            }
        }
    }
}

We are using the input no-data policy to test individual raster elements for validity. If an element is not valid, the output no-data policy is used to mark a no-data element in the result raster. This works fine in this specific case, but we can still do better.

In general, algorithms may accept zero or more arguments, which all may have a different way to mark no-data cells (if at all!). So, instead of a single input no-data policy we need an input no-data policy for every individual input argument. And in the algorithms we need to test input argument elements using the corresponding, argument-specific, policy. That way, the caller can pass the appropriate input no-data policies, given the specific arguments passed into the algorithm.

Generalizing further, being able to test and mark validity is as relevant for scalar values as for rasters containing values. For example, when calculating the sum of the values in a raster, the result may be larger than the type used to represent the (scalar) result can hold (out of range error). In that case, the caller may want the algorithm to mark the result as being invalid. So, we need to be able to test for invalid elements in all arguments, whatever their types, and we need to be able to mark no-data in all results, whatever their types.

In our algorithm, we likely use code like this (again, assuming a raster base and number exponent:

if(std::get<0>(input_no_data_policy).is_no_data(r, c) ||
        std::get<1>(input_no_data_policy).is_no_data()) {
    output_no_data_policy.mark_as_no_data(r, c);
}
else {
    get(result, r, c) = std::pow(get(base, r, c), exponent);
}

In words: if the current base raster cell is not valid or if the exponent is not valid, then mark the current result raster cell as not valid. In all other cases, calculate a result given the valid input values. The input no-data policy passed in here, is an std::tuple of argument-specific input no-data policies.

Supporting testing and marking no-data elements for all data types, involves implementing our get customization point for all data types, including scalars. Imagine some type MaskedValue, which supports keeping track of a scalar and whether it is valid or not. It probably has methods to obtain the value and to test whether or not this value is valid. Passing such an instance to pow will result in compile errors. We need to pass the value contained in the instance. For MaskedValue, the get customization point would look something like this:

template<
    typename T>
inline typename DataTraits<MaskedValue<T>>::const_reference get(
    MaskedValue<T> const& constant)
{
    return constant.value();
}


template<
    typename T>
inline typename DataTraits<MaskedValue<T>>::reference get(
    MaskedValue<T>& constant)
{
    return constant.value();
}

In the case of simple scalar types, we need to define overloads of these customization points that just return the scalar value itself.

We are adding quite some code to our simple algorithm. Aren’t we adding logic for worst-case scenarios that may not be applicable to most use cases? For example, the exponent number may be a constant hard coded by the caller, like a 2 for squaring the base. This exponent will never be invalid and we don’t want the algorithm to spend any CPU cycle on verifying this. For those circumstances it is useful to have dummy policies that contain code that an optimizing compiler will remove from the instantiated algorithm during code generation. Here is a method from a SkipNoData input no-data policy:

class SkipNoData
{
public:

    // ...

    static constexpr bool is_no_data(size_t index1, size_t index2);

    // ...

};


inline constexpr bool SkipNoData::is_no_data(
    size_t /* index1 */,
    size_t /* index2 */)
{
    return false;
}

The compiler will ‘see through’ this definition and notice that the result of calling this test is always false. This information will be used to remove this test from if-statements calling it. This may result in the removal of complete if-statement branches, resulting in a more efficient algorithm, equal in efficiency to a hand-written one for the specific case at hand.

For completeness, here is the source code of our new pow algorithm. Notice that even the value of a scalar passed in is obtained by calling get.

#pragma once
#include <cmath>
#include "customization_point.h"


namespace detail {

    template<
        typename InputNoDataPolicy,
        typename OutputNoDataPolicy,
        typename Base,
        typename Exponent,
        typename Result>
    inline void pow(
        InputNoDataPolicy const& input_no_data_policy,
        OutputNoDataPolicy& output_no_data_policy,
        Base const& base,
        Exponent const& exponent,
        Result& result,
        raster_tag,
        number_tag)
    {
        size_t const nr_rows{size(base, 0)};
        size_t const nr_cols{size(base, 1)};

        for(size_t r = 0; r < nr_rows; ++r) {
            for(size_t c = 0; c < nr_cols; ++c) {
                if(std::get<0>(input_no_data_policy).is_no_data(r, c) ||
                    std::get<1>(input_no_data_policy).is_no_data()) {
                        output_no_data_policy.mark_as_no_data(r, c);
                }
                else {
                    get(result, r, c) = std::pow(get(base, r, c),
                        get(exponent));
                }
            }
        }
    }

    template<
        typename InputNoDataPolicy,
        typename OutputNoDataPolicy,
        typename Base,
        typename Exponent,
        typename Result>
    inline void pow(
        InputNoDataPolicy const& input_no_data_policy,
        OutputNoDataPolicy& output_no_data_policy,
        Base const& base,
        Exponent const& exponent,
        Result& result,
        raster_tag,
        raster_tag)
    {
        size_t const nr_rows{size(base, 0)};
        size_t const nr_cols{size(base, 1)};

        for(size_t r = 0; r < nr_rows; ++r) {
            for(size_t c = 0; c < nr_cols; ++c) {
                if(std::get<0>(input_no_data_policy).is_no_data(r, c) ||
                    std::get<1>(input_no_data_policy).is_no_data(r, c)) {
                            output_no_data_policy.mark_as_no_data(r, c);
                }
                else {
                    get(result, r, c) = std::pow(get(base, r, c),
                        get(exponent, r, c));
                }
            }
        }
    }

}  // namespace detail


template<
    typename InputNoDataPolicy,
    typename OutputNoDataPolicy,
    typename Base,
    typename Exponent,
    typename Result>
inline void pow(
    InputNoDataPolicy const& input_no_data_policy,
    OutputNoDataPolicy& output_no_data_policy,
    Base const& base,
    Exponent const& exponent,
    Result& result)
{
    using base_tag = tag<Base>;
    using exponent_tag = tag<Exponent>;

    detail::pow(input_no_data_policy, output_no_data_policy, base, exponent,
        result, base_tag{}, exponent_tag{});
}

We started this post with a pow algorithm accepting both raster and scalar arguments. We generalized this algorithm further for the (optional) handling of invalid elements in the input and output parameters. In the next section we will have a look at handling input values that are outside the algorithm’s domain, and handling result values that fall outside of the result value type’s range. As you may already guess, this involves additional policies.

Generic algorithms for 2D spatial fields II: Data type

In the previous post we had a quick look at generic algorithms, spatial fields and the requirements of generic algorithms for 2D spatial fields. In this post we are going to tackle the first requirement:

It must be possible for the user to pick the data type and value type of the input and output arguments. The data type does not need to be picked from a fixed set of types. The user must be able to use her own types.

This requirement is the easiest to fulfill. By turning our previous function into a function template, we can parameterize the function by the data types of the arguments and the result.

#pragma once
#include <cmath>


template<
    typename Base,
    typename Exponent,
    typename Result>
inline void pow(
    Base const& base,
    Exponent const& exponent,
    Result& result)
{
    size_t const nr_rows{base.shape()[0]};
    size_t const nr_cols{base.shape()[1]};

    for(size_t r = 0; r < nr_rows; ++r) {
        for(size_t c = 0; c < nr_cols; ++c) {
            result[r] = std::pow(base[r], exponent);
        }
    }
}

This algorithm can be called like this:

#include <cstddef>
#include "fern/feature/core/raster.h"
#include "pow.h"


int main(
    int /* argc */,
    char** /* argv */)
{
    using Raster = fern::Raster<double, 2>;

    size_t const nr_rows{2};
    size_t const nr_cols{3};
    double west{0.0};
    double north{0.0};
    double cell_size{10.0};
    Raster::Transformation transformation{{west, cell_size, north, cell_size}};
    Raster base(fern::extents[nr_rows][nr_cols], transformation);

    // Assign values to the raster.
    // ...

    double exponent{2};

    Raster result(fern::extents[nr_rows][nr_cols], transformation);

    pow(base, exponent, result);

    return EXIT_SUCCESS;
}

Some observations:

  • The result is now passed as a writable argument to the function. The compiler instantiates a function for each unique set of template arguments that we provide. By turning the result into an argument, these instantiations are involved in overload resolution allowing the compiler to pick the best match. This is relevant if we call the algorithm for the same Base and Exponent types, but different Result types. If we had returned the result the compiler would have thrown an error complaining about ambiguous overloads. It cannot choose between two functions with the same argument types, but different return types.
  • Although we turned the argument and result data types into template arguments, we are still limiting our users to the fern::Raster interface. Only raster types adhering to this interface can be passed. Ideally, we want to decouple the algorithm from a specific data type’s interface, allowing the user to use any raster-like type, whatever its interface.
  • Although Base is a raster-like type, Exponent must still be a floating point type. We would like to also be able to pass a raster-like type for the exponent argument.

Let’s tackle these limitations in turn. A technique to decouple an algorithm from its argument’s interfaces is to use customization points. And we can use a technique called tag dispatching to select an algorithm’s implementation based on properties of the template’s argument types.

Customization points

Let us forget about a single specific class for representing rasters and focus on the minimal set of requirements that the pow algorithm poses on the arguments passed in. Given that the algorithm currently takes a raster as its first argument and a number as its second, we can define the following requirements:

  • It must be possible to determine the number of rows and columns in the base argument raster.
  • It must be possible to iterate over all raster cells and obtain a readable value of each cell from the base argument raster and a writable reference of each cell in the result raster.

Customization points are free function templates that return a property of an argument and which can be overloaded for different argument types. One example from the C++ Standard Library is std::begin, which returns an iterator to the first element in a collection. This function is overloaded for the various containers in the STL and for C-style arrays. Using std::begin to obtain such an iterator is a better option than calling my_container.begin() because the latter assumes my_container‘s type provides this function. C-style arrays don’t, for example.

So, customization points help to write code that does not depend on a specific type’s interface. Here is a declaration of a customization point for getting the size of a raster:

template<
    typename Raster>
size_t size(Raster const& raster, size_t dimension);

It takes a raster and an id of a dimension. In the case of 2D rasters, this latter argument must be a 0 or a 1, where 0 represents the first dimension in a 2D array and 1 the second. Since C-style arrays have a row-major ordering, the first dimension iterates over the rows and the second over the columns.

The Raster template argument has no relation with the fern::Raster class template used earlier. We can implement the customization point for our fern::Raster class template as follows, using a partial specialization:

template<
    typename T,
    size_t nr_dimensions>
size_t size(
   fern::Raster<T, nr_dimensions> const& raster,
   size_t dimension)
{
   return raster.shape()[dimension];
}

Within the customization point, we can use our type specific interface. For raster types with other interfaces we can implement alternative overloads. In our algorithm we can now replace the code obtaining the size of the dimensions to be calls to our size customization point.

We need two more customization points: one for obtaining a readable value of a specific cell and one for obtaining a writable reference to a value of a specific cell. Let’s call them both get:

template<
    typename Raster>
value_type<Raster> const& get(Raster const& raster, size_t row, size_t col);

template<
    typename Raster>
value_type<Raster>& get(Raster& raster, size_t row, size_t col);

The return type is calculated by the value_type alias template, which returns the value type of the template argument. Its definition assumes that type traits (see below) provide the value type of every data type:

template<
    typename T>
using value_type = typename TypeTraits<T>::value_type;

For our fern::Raster class template we can implement the additional customization points as follows:

template<
    typename T,
    size_t nr_dimensions>
value_type<fern::Raster<T, nr_dimensions>> const& get(
    fern::Raster<T, nr_dimensions> const& raster,
    size_t row,
    size_t col)
{
    return raster[row][col];
}

template<
    typename T,
    size_t nr_dimensions>
value_type<fern::Raster<T, nr_dimensions>>& get(
    fern::Raster<T, nr_dimensions>& raster,
    size_t row,
    size_t col)
{
    return raster[row][col];
}

The last missing piece is the implementation of the TypeTraits class template. Type traits provide properties of a type. In our case, we need the value type of the Raster class template.

template<
    typename T>
struct TypeTraits
{
};
template<
    typename T,
    size_t nr_dimensions>
struct TypeTraits<
    fern::Raster<T, nr_dimensions>>
{
    using value_type = T;
};

Let’s now rewrite our algorithm in terms of customization points:

#pragma once
#include <cmath>
#include "customization_point.h"


template<
    typename Base,
    typename Exponent,
    typename Result>
inline void pow(
    Base const& base,
    Exponent const& exponent,
    Result& result)
{
    size_t const nr_rows{size(base, 0)};
    size_t const nr_cols{size(base, 1)};

    for(size_t r = 0; r < nr_rows; ++r) {
        for(size_t c = 0; c < nr_cols; ++c) {
            get(result, r, c) = std::pow(get(base, r, c), exponent);
        }
    }
}

Note that the algorithm is written in terms of template parameters and customization points, all to be provided by the caller. We abstracted away any reference to a specific raster type. With this in place, we can now call this algorithm with every raster-like base argument for which we have implemented the three customization points. In the next section we are going to get rid of the limitation that the base argument must be a raster and the exponent a number.

Tag dispatching

If we call our current pow algorithm with a raster instance as the exponent argument, the compilation fails. Our algorithm expects the exponent to be a single number. Passing a raster requires a different implementation. The problem we need to solve is how we can pass different permutations of raster and number arguments to pow, without having to revert to differently named functions like pow_raster_number and pow_raster_raster. Somehow, our pow implementation needs to do something different depending on its template argument types.

Tag dispatching can help us here. What we need is a tag for each data type we support and call a second function that actually implements the algorithm for each unique combination of argument types. Let us first define the tags:

struct number_tag {};
struct raster_tag {};

Now we need a way to obtain the tag given a data type. We can extent our traits class for that:

template<
    typename T>
struct TypeTraits
{
    using tag = number_tag;  // Default.
};


template<
    typename T>
using tag = typename TypeTraits<T>::tag;
template<
    typename T,
    size_t nr_dimensions>
struct TypeTraits<
    fern::Raster<T, nr_dimensions>>
{
    using value_type = T;  // Like before.
    using tag = raster_tag;
};

In our original pow implementation we can now dispatch on data type tag to an algorithm with the correct implementation. Note that this all happens during the compilation, and not at runtime.

#pragma once
#include <cmath>
#include "customization_point.h"


namespace detail {

    template<
        typename Base,
        typename Exponent,
        typename Result>
    inline void pow(
        Base const& base,
        Exponent const& exponent,
        Result& result,
        raster_tag,
        number_tag)
    {
        size_t const nr_rows{size(base, 0)};
        size_t const nr_cols{size(base, 1)};

        for(size_t r = 0; r < nr_rows; ++r) {
            for(size_t c = 0; c < nr_cols; ++c) {
                get(result, r, c) = std::pow(get(base, r, c), exponent);
            }
        }
    }

    template<
        typename Base,
        typename Exponent,
        typename Result>
    inline void pow(
        Base const& base,
        Exponent const& exponent,
        Result& result,
        raster_tag,
        raster_tag)
    {
        size_t const nr_rows{size(base, 0)};
        size_t const nr_cols{size(base, 1)};

        for(size_t r = 0; r < nr_rows; ++r) {
            for(size_t c = 0; c < nr_cols; ++c) {
                get(result, r, c) = std::pow(get(base, r, c),
                    get(exponent, r, c));
            }
        }
    }

}  // namespace detail


template<
    typename Base,
    typename Exponent,
    typename Result>
inline void pow(
    Base const& base,
    Exponent const& exponent,
    Result& result)
{
    using base_tag = tag<Base>;
    using exponent_tag = tag<Exponent>;

    detail::pow(base, exponent, result, base_tag{}, exponent_tag{});
}

More overloads of detail::pow can be implemented if necessary.

With these changes we made it possible for the user to use our algorithm with her own data types. In the next post we are going to take a look at handling invalid data elements.

Generic algorithms for 2D spatial fields I: Introduction

This is the first in a series of blog posts about developing generic algorithms in C++ for handling 2D spatial fields. The goal of this series is to explain how we can generalize algorithms handling spatial fields and why that might be useful.

Generic algorithms

Generic algorithms in C++ are templated algorithms that are written in terms of template argument types that are defined later, when the algorithms are instantiated. The C++ Standard Template Library (STL) contains many examples of generic algorithms.

To appreciate the benefits of generic algorithms over non-generic algorithms, we can first take a look at a non-generic algorithm and compare it to a generic one. Let’s define an algorithm for finding the maximum element in a collection:

#include <vector>

std::vector<int>::iterator max_element(
    std::vector<int>::iterator first,
    std::vector<int>::iterator last)
{
    std::vector<int>::iterator it = first;

    for(++first; first != last; ++first) {
        if(*first > *it) {
            it = first;
        }
    }

    return it;
}

The algorithm accepts iterators pointing to the first element and end position of the collection. It returns an iterator pointing to the first element with the greatest value, or the end position if the collection is empty.

This algorithm works fine, but only if we want to find the maximum element in a vector of integers. When this is not the case, we end up writing multiple algorithms for multiple argument types. That is the kind of work computers are better at than we. Fortunately, we can ask the compiler to generate such a family of algorithms for us, by using function templates. The generic version of max_element is part of the STL (std::max_element) and has the following prototype (we skip the overload accepting a user-defined comparison function):

template<
    typename ForwardIt>
ForwardIt max_element(ForwardIt first, ForwardIt last);

When we call this generic algorithm like this:

std::vector<int> my_values{1, 5, 2, 4, 3};
auto it = std::max_element(my_values.begin(), my_values.end());

the compiler will instantiate a similar algorithm for us as the non-generic version which we wrote above. But the generic algorithm can also be called with other argument types:

std::list<std::string> my_labels{"amsterdam", "rotterdam", "the hague"};
auto it = std::max_element(my_labels.begin(), my_labels.end());

So, in short, generic algorithms are templates for actual algorithms, with some aspects provided by the caller. The compiler will substitute the missing aspects (the template parameters) with the information provided at the call site and instantiate and compile the final algorithm. Using the generic algorithm in a different context involves calling it with different template argument types.

A generic algorithm cannot be compiled without the caller providing essential information that the algorithm is parameterized on.

One reason for developing generic algorithms is that the person developing the algorithms doesn’t have all the pieces of information required to be able to implement the final algorithm. An important task when designing generic algorithms is to figure out what the pieces of information are that users will want to vary. These will end up as template parameters of the generic algorithms.

Spatial fields

To limit ourselves a bit to data that we know most about we will now turn to spatial fields. The term field is overloaded and has many different meanings. In our case we use the term in the spatial sense. A field is a quantity that (at least in theory) has a value at any point in space. This value varies continuously through space. Here we limit ourselves to 2D fields. Examples of such fields are the atmospheric pressure at a certain height above the earth’s surface, and the elevation of the earth’s surface itself (both projected on a 2D Cartesian plane). To be able to conveniently handle such quantities in the computer, we discretize fields using rasters, which are essentially 2D arrays with some additional information about where each raster is positioned according to some coordinate reference system, and the width and height of the raster cells. For our purposes here it is enough to think of spatial fields as 2D arrays positioned somewhere.

field_raster

Next, we will turn to the requirements of our generic algorithms.

Requirements

In this section we will develop a first version of a generic algorithm handling 2D spatial field data. It is good practice to start with a non-generic version, and get a clear idea about the shortcomings, before writing generic code.

For our purposes here we are not interested in the calculations themselves. In fact we are interested in everything but the calculations themselves. The algorithm we will develop will simply raise each cell in a raster to the power of some exponent. Here is a first try:

#include <cmath>
#include "fern/feature/core/raster.h"

fern::Raster<double, 2> pow(
    fern::Raster<double, 2> const& base,
    double exponent)
{
    size_t const nr_rows{base.shape()[0]};
    size_t const nr_cols{base.shape()[1]};
    fern::Raster<double, 2> result(fern::extents[nr_rows][nr_cols],
        base.transformation());

    for(size_t r = 0; r < nr_rows; ++r) {
        for(size_t c = 0; c < nr_cols; ++c) {
            result[r] = std::pow(base[r], exponent);
        }
    }

    return result;
}

We use the Raster class template from Geoneric‘s Fern library, but this could be any class with which we can model a raster. The details of this class’ implementation don’t matter here.

Some observations and drawbacks of this implementation:

  • The type of the individual elements (further on we call this the value type) in the base raster are fixed to double. This is limiting the caller. She might prefer to pass in float or long double values. The same is true for exponent.
  • The type of the base argument (the data type) is fixed to Raster<double, 2>. This is also limiting the caller. She has to use our Raster type while she may have a type for modelling rasters that fits her needs better.
  • The exponent is fixed to a double, but we may want to be able to pass a raster instead of a single number. When we pass a raster for exponent, we expect each cell in base to be raised by the corresponding cell in exponent.
  • The result is returned, which is OK if Raster supports move semantics. Otherwise it is definitely not OK. We don’t want to copy possibly very large rasters. Furthermore, in C++ return types are not considered in overload resolution which limits our possibilities to make the return type configurable by the caller. This will be explained in more detail in the next post.
  • Often, algorithms have a limited domain of valid input values. For example, a negative finite base and a non-integer finite exponent are considered out of domain by the std::pow algorithm. What should happen if an algorithm’s input argument contains values that are not part of the algorithm’s valid domain of input values?
  • Similarly, output value types have a limited range of values they can represent. For example, an int8_t cannot hold values larger than $latex \frac{2^8}{2} – 1$. What should happen if an algorithm calculates values that are not part of the valid range of values?
  • In the algorithm above, we perform a calculation for each cell. In reality, rasters often contain cells with undefined values. Depending on the field that is represented by the raster, this may be due to clouds, measurement errors, errors in previous calculations, etc. We clearly need a way to test whether or not a cell from an input raster contains a valid value or not.
  • Given the previous observations, we also need a way to mark that an output raster cell does not contain a valid value. If an input raster cell contains an invalid value we want to be able to mark the corresponding cell in the output raster as invalid too. Also, if we are able to detect out-of-domain and out-of-range errors, we want to be able to mark the corresponding output raster cells as invalid.

Given the non-generic algorithm and the observations, we are now better equipped to list the requirements of a generic algorithm handling spatial fields:

  • It must be possible for the user to pick the data type and value type of the input and output arguments. The data type does not need to be picked from a fixed set of types. The user must be able to use her own types.
  • It must be possible for the user to ‘explain’ to the algorithm how to:
    • detect invalid input raster cell values
    • mark invalid output raster cell values
    • handle out of domain input values
    • handle out of range output values
  • However the above requirements are met, the performance of the algorithm must be equal to a handwritten non-generic algorithm.

In the next posts, we are going to tackle each of these requirements in turn.