Generic algorithms for 2D spatial fields III: No-data

In the first post in this series we had a quick look at generic algorithms, spatial fields and the requirements of generic algorithms for 2D spatial fields. In this post we are going to tackle the first part of the second requirement:

It must be possible for the user to ‘explain’ to the algorithm how to:

  • detect invalid input raster cell values
  • mark invalid output raster cell values

The algorithm we developed until now assumes all input argument raster cells contain valid data. Here is the core of the algorithm raising a raster of values by a single value:

for(size_t r = 0; r < nr_rows; ++r) {
    for(size_t c = 0; c < nr_cols; ++c) {
        get(result, r, c) = std::pow(get(base, r, c), exponent);
    }
}

As mentioned in the introduction, we often have to take invalid values into account. Ideally, we want to be able to tell the algorithm whether or not our argument rasters may contain invalid cells or not, and if so, how to test each individual cell’s value for validity.

Similarly, instead of being able to test input raster cells for validity, we also want to be able to mark result raster cells as invalid, for example in case one of the argument rasters contains invalid cells, or contains values that are considered out of the algorithm’s domain.

Policy classes are a technique that can help us with these requirements. Policy classes allow us to parameterize behavior of an algorithm, which enables the caller to tune the algorithm to her needs. An example of a policy from the C++ Standard Library is Allocator, which allows the caller to tune the allocation and deallocation of memory in code accepting an allocator policy (most containers from the Standard Library do, see for example the second template parameter of the std::vector class template).

Our algorithm needs to be parameterized with policy classes that encapsulate the behavior of how to deal with no-data values in the input and output rasters. We must then rewrite our algorithm to use these policies. The caller of our algorithm can then decide if and how input no-data must be tested and if and how output no-data must be marked. Depending on the raster type used, no-data may be marked by a special marker value, like the maximum or minimum representable value by the value type, or maybe by a layered boolean raster or a bit mask. It all depends on the type passed in the algorithm. As long as the algorithm has a way to test and mark no-data elements, it can do its job.

Let’s think about the interface of the no-data checking and marking policies. For maximum flexibility, we decided to separate the no-data checking and marking policies, which we call the input no-data policies, and output no-data policies respectively. Both kinds of no-data policies need to be able to either test or mark individual elements for validity. In the current algorithm, individual raster cells are addressed using row and column indices, which we can use for our no-data policies too. So, an input no-data policy needs to be able to tell whether or not an element with a certain row and column index contains valid data or not:

bool MyInputNoDataPolicy::is_no_data(size_t row, size_t col);

Similarly, an output no-data policy needs to be able to mark an element as valid or invalid:

bool MyOutputNoDataPolicy::mark_as_no_data(size_t row, size_t col);

We are not passing the raster itself, because we have no idea how the policy is implemented. It may use the raster, or not. This means that we must pass policy instances to the algorithm, the types alone are not enough. The policy instance must be self-contained and be able to answer our requests, only given an element’s address.

Here is an updated version of the part of our algorithm that raises a raster by a number:

template<
    typename InputNoDataPolicy,
    typename OutputNoDataPolicy,
    typename Base,
    typename Exponent,
    typename Result>
void pow(
    InputNoDataPolicy const& input_no_data_policy,
    OutputNoDataPolicy& output_no_data_policy,
    Base const& base,
    Exponent const& exponent,
    Result& result,
    raster_tag,
    number_tag)
{
    size_t const nr_rows{size(base, 0)};
    size_t const nr_cols{size(base, 1)};

    for(size_t r = 0; r < nr_rows; ++r) {
        for(size_t c = 0; c < nr_cols; ++c) {
            if(input_no_data_policy.is_no_data(r, c)) {
                output_no_data_policy.mark_as_no_data(r, c);
            }
            else {
                get(result, r, c) = std::pow(get(base, r, c), exponent);
            }
        }
    }
}

We are using the input no-data policy to test individual raster elements for validity. If an element is not valid, the output no-data policy is used to mark a no-data element in the result raster. This works fine in this specific case, but we can still do better.

In general, algorithms may accept zero or more arguments, which all may have a different way to mark no-data cells (if at all!). So, instead of a single input no-data policy we need an input no-data policy for every individual input argument. And in the algorithms we need to test input argument elements using the corresponding, argument-specific, policy. That way, the caller can pass the appropriate input no-data policies, given the specific arguments passed into the algorithm.

Generalizing further, being able to test and mark validity is as relevant for scalar values as for rasters containing values. For example, when calculating the sum of the values in a raster, the result may be larger than the type used to represent the (scalar) result can hold (out of range error). In that case, the caller may want the algorithm to mark the result as being invalid. So, we need to be able to test for invalid elements in all arguments, whatever their types, and we need to be able to mark no-data in all results, whatever their types.

In our algorithm, we likely use code like this (again, assuming a raster base and number exponent:

if(std::get<0>(input_no_data_policy).is_no_data(r, c) ||
        std::get<1>(input_no_data_policy).is_no_data()) {
    output_no_data_policy.mark_as_no_data(r, c);
}
else {
    get(result, r, c) = std::pow(get(base, r, c), exponent);
}

In words: if the current base raster cell is not valid or if the exponent is not valid, then mark the current result raster cell as not valid. In all other cases, calculate a result given the valid input values. The input no-data policy passed in here, is an std::tuple of argument-specific input no-data policies.

Supporting testing and marking no-data elements for all data types, involves implementing our get customization point for all data types, including scalars. Imagine some type MaskedValue, which supports keeping track of a scalar and whether it is valid or not. It probably has methods to obtain the value and to test whether or not this value is valid. Passing such an instance to pow will result in compile errors. We need to pass the value contained in the instance. For MaskedValue, the get customization point would look something like this:

template<
    typename T>
inline typename DataTraits<MaskedValue<T>>::const_reference get(
    MaskedValue<T> const& constant)
{
    return constant.value();
}


template<
    typename T>
inline typename DataTraits<MaskedValue<T>>::reference get(
    MaskedValue<T>& constant)
{
    return constant.value();
}

In the case of simple scalar types, we need to define overloads of these customization points that just return the scalar value itself.

We are adding quite some code to our simple algorithm. Aren’t we adding logic for worst-case scenarios that may not be applicable to most use cases? For example, the exponent number may be a constant hard coded by the caller, like a 2 for squaring the base. This exponent will never be invalid and we don’t want the algorithm to spend any CPU cycle on verifying this. For those circumstances it is useful to have dummy policies that contain code that an optimizing compiler will remove from the instantiated algorithm during code generation. Here is a method from a SkipNoData input no-data policy:

class SkipNoData
{
public:

    // ...

    static constexpr bool is_no_data(size_t index1, size_t index2);

    // ...

};


inline constexpr bool SkipNoData::is_no_data(
    size_t /* index1 */,
    size_t /* index2 */)
{
    return false;
}

The compiler will ‘see through’ this definition and notice that the result of calling this test is always false. This information will be used to remove this test from if-statements calling it. This may result in the removal of complete if-statement branches, resulting in a more efficient algorithm, equal in efficiency to a hand-written one for the specific case at hand.

For completeness, here is the source code of our new pow algorithm. Notice that even the value of a scalar passed in is obtained by calling get.

#pragma once
#include <cmath>
#include "customization_point.h"


namespace detail {

    template<
        typename InputNoDataPolicy,
        typename OutputNoDataPolicy,
        typename Base,
        typename Exponent,
        typename Result>
    inline void pow(
        InputNoDataPolicy const& input_no_data_policy,
        OutputNoDataPolicy& output_no_data_policy,
        Base const& base,
        Exponent const& exponent,
        Result& result,
        raster_tag,
        number_tag)
    {
        size_t const nr_rows{size(base, 0)};
        size_t const nr_cols{size(base, 1)};

        for(size_t r = 0; r < nr_rows; ++r) {
            for(size_t c = 0; c < nr_cols; ++c) {
                if(std::get<0>(input_no_data_policy).is_no_data(r, c) ||
                    std::get<1>(input_no_data_policy).is_no_data()) {
                        output_no_data_policy.mark_as_no_data(r, c);
                }
                else {
                    get(result, r, c) = std::pow(get(base, r, c),
                        get(exponent));
                }
            }
        }
    }

    template<
        typename InputNoDataPolicy,
        typename OutputNoDataPolicy,
        typename Base,
        typename Exponent,
        typename Result>
    inline void pow(
        InputNoDataPolicy const& input_no_data_policy,
        OutputNoDataPolicy& output_no_data_policy,
        Base const& base,
        Exponent const& exponent,
        Result& result,
        raster_tag,
        raster_tag)
    {
        size_t const nr_rows{size(base, 0)};
        size_t const nr_cols{size(base, 1)};

        for(size_t r = 0; r < nr_rows; ++r) {
            for(size_t c = 0; c < nr_cols; ++c) {
                if(std::get<0>(input_no_data_policy).is_no_data(r, c) ||
                    std::get<1>(input_no_data_policy).is_no_data(r, c)) {
                            output_no_data_policy.mark_as_no_data(r, c);
                }
                else {
                    get(result, r, c) = std::pow(get(base, r, c),
                        get(exponent, r, c));
                }
            }
        }
    }

}  // namespace detail


template<
    typename InputNoDataPolicy,
    typename OutputNoDataPolicy,
    typename Base,
    typename Exponent,
    typename Result>
inline void pow(
    InputNoDataPolicy const& input_no_data_policy,
    OutputNoDataPolicy& output_no_data_policy,
    Base const& base,
    Exponent const& exponent,
    Result& result)
{
    using base_tag = tag<Base>;
    using exponent_tag = tag<Exponent>;

    detail::pow(input_no_data_policy, output_no_data_policy, base, exponent,
        result, base_tag{}, exponent_tag{});
}

We started this post with a pow algorithm accepting both raster and scalar arguments. We generalized this algorithm further for the (optional) handling of invalid elements in the input and output parameters. In the next section we will have a look at handling input values that are outside the algorithm’s domain, and handling result values that fall outside of the result value type’s range. As you may already guess, this involves additional policies.