Structure of arrays and array of structures - performance difference

Views: 101  Published: 2016/8/18 13:12:31  Tags: c++, c, performance, caching, gcc

Question

I have a class like this:

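//Array of Structures
class Unit
{
  public:
    float v;
    float u;
    float i; //assumed to be one of the other float members; update() reads it
    float t; //assumed likewise
    //And similarly many other variables of float type, up to 10-12 of them.
    void update()
    {
       v+=u;
       v=v*i*t;
       //And many other equations
    }
};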

I create an array of Unit objects and call update on each of them.

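int NUM_UNITS = 10000;
void ProcessUpdate()
{
  Unit *units = new Unit[NUM_UNITS](); //value-initialize so update() reads defined values
  for(int i = 0; i < NUM_UNITS; i++)
  {
    units[i].update();
  }
  delete[] units; //release the array; the original snippet leaked it
}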

To speed things up, and possibly let the compiler autovectorize the loop, I converted the AoS to a structure of arrays.

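//Structure of Arrays:
class Unit
{
  public:
  Unit(int numUnits) : n(numUnits)
  {
    v = new float[n];
    u = new float[n]; //u was never allocated in the original snippet
  }
  ~Unit() { delete[] v; delete[] u; }
  int n;    //number of units; the original loop read a global NUM_UNITS instead
  float *v;
  float *u;
  //Many other variables
  void update()
  {
    for(int i = 0; i < n; i++)
    {
      v[i]+=u[i];
      //Many other equations
    }
  }
};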
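For reference, the two layouts can be put side by side in a form that compiles on its own. This is only a minimal sketch, not the asker's actual code: it assumes C++11, keeps just the `v += u` step, and the names `UnitAoS`, `UnitsSoA`, `update_aos`, and `update_soa` are hypothetical. It uses `std::vector` instead of raw `new[]` so that every array is allocated and released automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of Structures: one object per unit, fields interleaved in memory.
struct UnitAoS
{
  float v;
  float u;
};

// Applies the v += u step to every unit in the AoS layout.
void update_aos(std::vector<UnitAoS> &units)
{
  for (std::size_t i = 0; i < units.size(); i++)
    units[i].v += units[i].u;
}

// Structure of Arrays: one contiguous array per field.
struct UnitsSoA
{
  std::vector<float> v;
  std::vector<float> u;
  explicit UnitsSoA(std::size_t n) : v(n, 0.0f), u(n, 0.0f) {}
};

// Applies the same v += u step to every unit in the SoA layout.
void update_soa(UnitsSoA &units)
{
  assert(units.v.size() == units.u.size());
  const std::size_t n = units.v.size();
  for (std::size_t i = 0; i < n; i++)
    units.v[i] += units.u[i];
}
```

Both functions perform the same arithmetic on the same number of floats, so any timing difference between them comes from memory layout and from the code the compiler generates, not from the amount of work.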

When the loop fails to autovectorize, I get very bad performance from the structure of arrays. For 50 units, the SoA update is slightly faster than AoS. But from 100 units onwards, SoA is slower than AoS. At 300 units, SoA is almost twice as slow. At 100K units, SoA is 4x slower than AoS. While cache might be an issue for SoA, I didn't expect the performance difference to be this high. Profiling with cachegrind shows a similar number of misses for both approaches. The size of a Unit object is 48 bytes. L1 cache is 256K, L2 is 1MB and L3 is 8MB. What am I missing here? Is this really a cache issue?
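The figures quoted above allow a rough working-set estimate. The sketch below multiplies the stated 48 bytes per unit by the unit count and compares the result with the stated cache sizes. It assumes that "K" and "MB" mean binary KiB and MiB and that every field of every unit is touched; the helper names are hypothetical. It models capacity only, ignoring associativity, alignment, and prefetching.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Figures quoted in the question; reading them as KiB/MiB is an assumption.
const std::size_t kUnitBytes = 48;
const std::size_t kL1Bytes = 256 * 1024;
const std::size_t kL2Bytes = 1024 * 1024;
const std::size_t kL3Bytes = 8 * 1024 * 1024;

// Total bytes touched per update pass; identical for AoS and SoA.
std::size_t working_set_bytes(std::size_t num_units)
{
  assert(num_units <= std::numeric_limits<std::size_t>::max() / kUnitBytes);
  return num_units * kUnitBytes;
}

// Returns 1, 2, or 3 for the smallest cache level that can hold the data,
// or 0 if it exceeds L3.
int smallest_fitting_level(std::size_t bytes)
{
  if (bytes <= kL1Bytes) return 1;
  if (bytes <= kL2Bytes) return 2;
  if (bytes <= kL3Bytes) return 3;
  return 0;
}
```

Under these assumptions, 300 units occupy 14,400 bytes, which fits in the stated L1, while 100K units occupy 4,800,000 bytes, which exceeds L2 but fits in L3. Since the slowdown already appears at 100 to 300 units, cache capacity alone seems unlikely to explain it at those sizes.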

Edit:
I am using gcc 4.5.2. Compiler options are -O3 -msse4 -ftree-vectorize.

I did another experiment with SoA. Instead of dynamically allocating the arrays, I allocated "v" and "u" at compile time. With 100K units, this is 10x faster than the SoA with dynamically allocated arrays. What's happening here? Why is there such a performance difference between statically and dynamically allocated memory?
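The two variants in this experiment can be sketched as follows. This is an illustration under stated assumptions, not the asker's code and not a confirmed diagnosis: the names are hypothetical, only the `v += u` step is kept, and C++11 is assumed. One difference worth testing is that static arrays are visibly distinct objects, whereas the compiler may be unable to prove that two pointer members refer to non-overlapping memory. The dynamic variant therefore copies its pointers into locals marked with `__restrict__`, a GCC/Clang extension, to make that promise explicit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical size, matching the 100K-unit experiment in the question.
const std::size_t kNumUnits = 100000;

// Variant 1: arrays with static storage, sized at compile time.
// v_static and u_static are distinct objects, so they cannot overlap.
static float v_static[kNumUnits];
static float u_static[kNumUnits];

void update_static()
{
  for (std::size_t i = 0; i < kNumUnits; i++)
    v_static[i] += u_static[i];
}

// Variant 2: heap storage reached through pointers.
struct UnitsDynamic
{
  std::vector<float> v;
  std::vector<float> u;
  explicit UnitsDynamic(std::size_t n) : v(n, 0.0f), u(n, 0.0f) {}

  void update()
  {
    assert(v.size() == u.size());
    // __restrict__ promises that pv and pu do not overlap; that holds here
    // because v and u own separate buffers.
    float *__restrict__ pv = v.data();
    const float *__restrict__ pu = u.data();
    const std::size_t n = v.size();
    for (std::size_t i = 0; i < n; i++)
      pv[i] += pu[i];
  }
};
```

Timing `update_static()` against `UnitsDynamic::update()` with and without the `__restrict__` qualifiers would show whether aliasing assumptions contribute to the gap. Whether they do on gcc 4.5.2 with the flags above is not established by the question.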
