SoA/AoS 内存布局的 C++ 零成本抽象 [英] C++ zero-cost abstraction for SoA/AoS memory layouts

查看:57
本文介绍了SoA/AoS 内存布局的 C++ 零成本抽象的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我有一个使用结构数组 (AoS) 内存布局的大代码.我想用 C++ 构建一个零成本的抽象,它允许我以尽可能少的重构工作在 AoS 和 SoA 之间切换.例如获取一个具有访问成员函数的类

Say I have a large code using Array of Structures (AoS) memory layout. I would like to build a zero-cost abstraction in C++ which allows me to switch between AoS and SoA with as little refactoring effort as possible. For instance take a class with access member functions

 struct Item{
   auto& myDouble(){ return mDouble; }
   auto& myChar(){ return mChar; }
   auto& myString(){ return mString; }
 private:
   double mDouble;
   char mChar;
   std::string mString;
 };

在容器内循环使用

std::vector<Item> vec_(1000);
for (auto& i : vec_)
  i.myDouble()=5.;

我想更改第一个片段,而第二个片段保持相似……例如有类似的东西

I would like to change the first snippet while the second one remains similar.. e.g. having something like

MyContainer<Item, SoA> vec_(1000)
for (auto& i : vec_)
  i.myDouble()=5.;

在其中我可以选择带有SoA"或AoS"模板参数的内存布局.我的问题是:这样的东西存在于某处吗?如果没有,它最多将如何实施?

in which I can select the memory layout with a "SoA" or "AoS" template parameters. My questions are: does such thing exist somewhere? And if it doesn't, how would it be implemented at best?

推荐答案

我实现了一个通用的解决方案,我将在下面解释它(这将是一个很长的帖子).这当然不是唯一可能的答案,收集反馈会很棒.我把这个解决方案的完整代码放在这里 https://github.com/crosetto/SoAvsAoS

I implemented a generic solution, I'll explain it here below (it will be a long post). This is not the only possible answer of course, and it'd be great to gather feedback. I placed the full code of this solution here https://github.com/crosetto/SoAvsAoS

我们创建了两个辅助类,它们根据标签模板参数将容器类型生成为元组向量或向量元组.我们称这个类为 DataLayoutPolicy,我们将使用它,例如这样:

We create two helper classes which given an item generate the container type as a vector of tuples or a tuple of vectors, depending on a tag template argument. We call this class a DataLayoutPolicy and we are going to use it e.g. in this way:

DataLayoutPolicy<std::vector, SoA, char, double, std::string>

生成一个由 char、int 和 double 组成的向量元组.

to generate a tuple of vectors of char, int and double.

enum class DataLayout { SoA, //structure of arrays
                        AoS //array of structures
};
template <template <typename...> class Container, DataLayout TDataLayout, typename TItem>
struct DataLayoutPolicy;

这个类将只包含与容器交互的静态成员函数(例如提取元素、插入、调整大小等...).我们编写了两个模板专业化.第一个(微不足道的)用于结构数组的情况:

This class will only contain static member functions to interact with the container (e.g. extract an element, insert, resize, etc...). We write two template specializations. The first one (trivial) for the array of structures situation:

template <template <typename...> class Container, template<typename...> class TItem, typename... Types>
struct DataLayoutPolicy<Container, DataLayout::AoS, TItem<Types...>> {
    using type = Container<TItem<Types...>>;
    using value_type = TItem<Types...>&;

    constexpr static value_type get( type& c_, std::size_t position_ ){ return value_type(*static_cast<TItem<Types...>*>(&c_[ position_ ])); }

    constexpr static void resize( type& c_, std::size_t size_ ) { c_.resize( size_ ); }

    template <typename TValue>
    constexpr static void push_back( type& c_, TValue&& val_ ){ c_.push_back( val_ ); }
    static constexpr std::size_t size(type& c_){ return  c_.size(); }
};

...只是转发.我们对数组结构的情况做同样的事情.

... just forwarding. We do the same for the structure of arrays case.

注意:下面的代码有几点需要解释.

Note: there are a few things to be explained about the code below.

它将所有类型包装在一个 ref_wrap 类型中,这是一个装饰"的 std::reference_wrapper.这是因为我们希望将元素作为左值引用访问,以便能够更改它们的值.使用常规参考我们会遇到麻烦,例如类型包含任何引用.值得注意的一件事是,在 AoS 情况下 DataLayoutPolicy::value_type 是一个引用,而在 SoA 情况下是一个 ref_wrap 类型的值.

It wraps all the types in a ref_wrap type, which is a "decorated" std::reference_wrapper. This is because we want to access the elements as lvalue references, to be able to change their values. using a regular reference we would be in trouble if e.g. Types contains any reference. One thing worth noticing is that in the AoS case the DataLayoutPolicy::value_type is a reference, while in the SoA case is the value of a ref_wrap type.

我们按值返回一个新创建的值的 ref_wrap 元组.这出人意料地可以,因为编译器正在优化所有副本,并且在 C++17 中更可以(返回的元组是纯右值"),因为保证复制省略添加到标准中:元组是未复制,即使 std::tuple 和 std::reference_wrapper 没有复制/移动构造函数,此代码也能工作.

we return by value a newly created tuple of ref_wrap of the values. This is surpisingly OK, because the compiler is optimizing all copies away, and it is even more OK in C++17 (the returned tuple is a 'prvalue'), because of the guaranteed copy elision added to the standard: the tuple is not copied, this code would work even if std::tuple and std::reference_wrapper didn't have a copy/move constructor.

我们使用 std::integer 序列静态展开参数包:这很丑陋,但它是自 C++14(在 C++11 中必须使用模板递归来实现)达到同样的效果).目前还没有像for_each"这样的参数包.

we use a std::integer sequence to statically unroll a parameter pack: this is ugly but it is "the way" to do it since C++14 (and in C++11 one had to use template recursion to achieve the same). There is not yet such thing like a "for_each" for parameter packs.

我们使用 C++17 折叠表达式多次调用返回 void 的函数.在 C++17 之前,这是通过巧妙的技巧简明地实现的.

we use C++17 fold expressions to call a function returning void multiple times. Before C++17 this was achieved concisely with tricky hacks.

template <typename T>
struct ref_wrap : public std::reference_wrapper<T>{
    operator T&() const noexcept { return this->get(); }
    ref_wrap(T& other_) : std::reference_wrapper<T>(other_){}
    void operator =(T && other_) {this->get()=other_;}
};

template <template <typename...> class Container, template<typename...> class TItem, typename... Types>
struct DataLayoutPolicy<Container, DataLayout::SoA, TItem<Types...>> {
    using type = std::tuple<Container<Types>...>;
    using value_type = TItem<ref_wrap<Types>...>;

    constexpr static value_type get( type& c_, std::size_t position_ )
    {
        return doGet( c_, position_, std::make_integer_sequence<unsigned, sizeof...( Types )>() ); // unrolling parameter pack
    }

    constexpr static void resize( type& c_, std::size_t size_ ) {
        doResize( c_, size_, std::make_integer_sequence<unsigned, sizeof...( Types )>() ); // unrolling parameter pack
    }

    template <typename TValue>
    constexpr static void push_back( type& c_, TValue&& val_ ){
        doPushBack( c_, std::forward<TValue>(val_), std::make_integer_sequence<unsigned, sizeof...( Types )>() ); // unrolling parameter pack
    }

    static constexpr std::size_t size(type& c_){ return std::get<0>( c_ ).size(); }

    private:

    template <unsigned... Ids>
    constexpr static auto doGet( type& c_, std::size_t position_, std::integer_sequence<unsigned, Ids...> )
    {
        return value_type{ ref_wrap( std::get<Ids>( c_ )[ position_ ] )... }; // guaranteed copy elision
    }

    template <unsigned... Ids>
    constexpr static void doResize( type& c_, unsigned size_, std::integer_sequence<unsigned, Ids...> )
    {
        ( std::get<Ids>( c_ ).resize( size_ ), ... ); //fold expressions
    }

    template <typename TValue, unsigned... Ids>
    constexpr static void doPushBack( type& c_, TValue&& val_, std::integer_sequence<unsigned, Ids...> )
    {
        ( std::get<Ids>( c_ ).push_back( std::get<Ids>( std::forward<TValue>( val_ ) ) ), ... ); // fold expressions
    }
};

所以现在这段代码非常清楚地展示了如何构建这种抽象.我们在下面展示了使用它的可能策略.我们使用 DataLayoutPolicy 和通用 TItem 类型定义 policy_t 类型

So now this code shows pretty clearly how this abstraction can be built. We show below a possible strategy to use it. We define the policy_t type using the DataLayoutPolicy and a generic TItem type

template <template <typename T> class TContainer, DataLayout TDataLayout, typename TItem>
using policy_t = DataLayoutPolicy<TContainer, TDataLayout, TItem>;

容器类将大部分调用转发给policy_t类型定义的静态函数.它可能如下所示

The container class forwards most of the calls to the static functions defined by the policy_t type. It might look like the following

template <template <typename ValueType> class TContainer, DataLayout TDataLayout, typename TItem>
struct BaseContainer
{
    /*member functions like puhs_back, resize,...*/
    value_type operator[]( std::size_t position_ )
    {
            return policy_t::get( mValues, position_ );
    }

    iterator       begin() { return iterator( this, 0 ); }
    iterator       end() { return iterator( this, size() ); }

    private:

    typename policy_t::type mValues;

};

现在这不是标准容器,所以我们必须定义一个迭代器以便在 STL 算法中使用它.我们构建的迭代器看起来像一个用于元组容器的 STL 迭代器,除了它必须持有对容器的引用这一事实,因为当我们调用解引用操作符时,我们想调用我们的存储操作符 [],它静态分派使用容器的数据布局策略进行操作.

Now this is no standard container, so we have to define an iterator in order to use it whithin STL algorithms. The iterator we build looks like a STL iterator for a container of tuple, except for the fact that it must hold a reference to the container, because when we call the dereference operator we want to call our storage's operator[], which statically dispatches the operation using the container's data layout policy.

template <typename  TContainer>
class Iterator
{

private:
    using container_t = TContainer;
public:

    /* ... usual iterator member functions and type definitions ...*/

    template<typename TTContainer>
    Iterator( TTContainer* container_, std::size_t position_ = 0 ):
        mContainer( container_ )
        , mIterPosition( position_ )
    {
    }

    value_type operator*() {
        return (*mContainer)[ mIterPosition ];
    }

    private:
    container_t*        mContainer = nullptr;
    std::size_t         mIterPosition = std::numeric_limits<std::size_t>::infinity();
};

最终我们定义了我们的item"数据结构:我们使它成为 std::tuple 的装饰器,带有一些特定的成员函数(在这种情况下只有 getter/setter).

Eventually we define our "item" data structure: we make it a decorator of a std::tuple, with some specific member functions (in this case only getters/setters).

template<typename ... T>
struct Item : public std::tuple<T ...>{
    using std::tuple<T...>::tuple;
    auto & myDouble(){return std::get<0>(*this);}
    auto & myChar()  {return std::get<1>(*this);}
    auto & myString(){return std::get<2>(*this);}
};

当我们调用 Item 的成员函数时,我们必须依靠编译器优化才能使我们的抽象零成本":我们不想调用 Item 构造函数,因为我们正在创建一个临时元组只是为了访问每次都是它的成员之一,然后我们立即对其进行抨击.

When we call Item's member functions we have to rely on compiler optimization in order for our abstraction to be "zero-cost": we don't want to call the Item constructor, because we are creating a temporary tuple just to access one of it's members each time and then we thrash it right away.

所以最终我们可以编写程序:

so eventually we can write the program:

template<typename T>
using MyVector = std::vector<T, std::allocator<T>>;

int main(int argc, char** argv){
using container_t = BaseContainer<MyVector, DataLayout::SoA, Item<double, char, std::string, Pad> >;
container_t container_(1000);

 for(auto&& i : container_){
    i.myDouble()=static_cast<double>(argc);
}

我们可以编写通用且高效的代码,而不管底层的内存布局如何.剩下要做的是检查这是一个零成本抽象.我检查的最简单方法是使用调试器:使用调试符号编译示例,

and we can write generic and efficient code regardless of the memory layout underneath. What's left to do is to check that this is a zero cost abstraction. The easiest way for me to check that is using a debugger: compile the example with debug symbols on,

> clang++ -std=c++1z -O3 -g main.cpp -o test

用gdb运行,​​在for循环中设置一个断点,并逐步执行汇编指令(layout split命令同时显示源代码和反汇编指令)

run it with gdb, set a brakpoint in the for loop, and step through the assembly instructions (the layout split command shows the source code and disassembled instructions at the same time)

> gdb test
(gdb) break main.cpp : 10 # set breakpoint inside the loop
(gdb) run # execute until the breakpoint
(gdb) layout split # show assembly and source code in 2 separate frames
(gdb) stepi # execute one instruction

循环内执行的指令是在AoS数据布局的情况下

The instructions being executed inside the loop are in case of AoS data layout

0x400b00 <main(int, char**)+192>        movsd  %xmm0,(%rsi)
0x400b04 <main(int, char**)+196>        add    $0x610,%rsi
0x400b0b <main(int, char**)+203>        add    $0xffffffffffffffff,%rcx
0x400b0f <main(int, char**)+207>        jne    0x400b00 <main(int, char**)+192>

请特别注意,在第二行中,用于计算地址的偏移量是 0x160.这取决于项目对象中数据成员的大小.另一方面,对于 SoA 数据结构,我们有

Notice in particular that in the second line the offset being add to compute the address is 0x160. This changes depending on the size of the data members in the item object. On the other hand for the SoA data structure we have

0x400b60 <main(int, char**)+224>        movups %xmm1,(%rdi,%rsi,8)
0x400b64 <main(int, char**)+228>        movups %xmm1,0x10(%rdi,%rsi,8)
0x400b69 <main(int, char**)+233>        movups %xmm1,0x20(%rdi,%rsi,8)
0x400b6e <main(int, char**)+238>        movups %xmm1,0x30(%rdi,%rsi,8)
0x400b73 <main(int, char**)+243>        movups %xmm1,0x40(%rdi,%rsi,8)
0x400b78 <main(int, char**)+248>        movups %xmm1,0x50(%rdi,%rsi,8)
0x400b7d <main(int, char**)+253>        movups %xmm1,0x60(%rdi,%rsi,8)
0x400b82 <main(int, char**)+258>        movups %xmm1,0x70(%rdi,%rsi,8)
0x400b87 <main(int, char**)+263>        movups %xmm1,0x80(%rdi,%rsi,8)
0x400b8f <main(int, char**)+271>        movups %xmm1,0x90(%rdi,%rsi,8)
0x400b97 <main(int, char**)+279>        movups %xmm1,0xa0(%rdi,%rsi,8)
0x400b9f <main(int, char**)+287>        movups %xmm1,0xb0(%rdi,%rsi,8)
0x400ba7 <main(int, char**)+295>        movups %xmm1,0xc0(%rdi,%rsi,8)
0x400baf <main(int, char**)+303>        movups %xmm1,0xd0(%rdi,%rsi,8)
0x400bb7 <main(int, char**)+311>        movups %xmm1,0xe0(%rdi,%rsi,8)
0x400bbf <main(int, char**)+319>        movups %xmm1,0xf0(%rdi,%rsi,8)
0x400bc7 <main(int, char**)+327>        add    $0x20,%rsi
0x400bcb <main(int, char**)+331>        add    $0x8,%rbx
0x400bcf <main(int, char**)+335>        jne    0x400b60 <main(int, char**)+224>

我们看到循环被 Clang(6.0.0 版)展开和向量化,地址的增量是 0x20,与项目结构中存在的数据成员数量无关.

We see the loop is unrolled and vectorized by Clang (version 6.0.0), and the increment for the address is 0x20, independent of the number of data members present in the item struct.

这篇关于SoA/AoS 内存布局的 C++ 零成本抽象的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆