选择最高性能的容器(数组) [英] Choice of the most performant container (array)

查看:129
本文介绍了选择最高性能的容器(数组)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是我关于容器,特别是数组的一个大问题。



我正在写一个物理代码,主要操作一个大(> 1 000 000)的粒子(每个具有6 double 个坐标)。我正在寻找最好的方式(在性能方面)实现一个类,将包含这些数据的容器,并将提供这些数据的操作原语(例如实例化, operator [] ,等等)。



这个集合的使用方式有一些限制:




  • 它的大小是从配置文件中读取的,并且在执行过程中不会改变。

  • 它可以看作一个大的二维数组N 000)行和6列(每个列在一个维度上存储坐标)

  • 数组在一个大循环中操作,每个粒子/线被访问,

  • 在添加或删除新的元素之前,不会添加或删除任何新的元素。执行



第一个结论,因为对元素的访问基本上是通过 [] ,我认为我应该使用一个正常的动态数组。



我已经探索了一些东西,我想有你的



根据我的理解,使用动态分配的数组没有优势,而是使用 std :: vector ,因此 double ** array2d = new ...,循环的新,等被排除。

这是一个好主意,使用 std :: vector< double>



如果我使用 std :: vector ,我应该创建一个二维数组,如 std :: vector< std :: vector< ; double> >可以像 my_array [i] [j] 的索引的my_array ,或者这是一个坏主意,最好使用 std :: vector< double> other_array 并使用 other_array [6 * i + j] 来接受。



这可以提供更好的性能,特别是列的数量是固定的,从一开始就知道。



如果你认为这是最好的选择,这个向量可以用一个定义为 other_array [i,j]的索引操作符访问//与没有开销的other_array [6 * i + j] (如每次访问时的函数调用)?



另一个选项,我目前使用的是使用Blitz,特别是 blitz :: Array

  typedef blitz :: Array< double,TWO_DIMENSIONS& store_t; 
store_t my_store;

我的元素是这样访问的: my_store(line,column);



我认为在我的案例中使用Blitz没有什么优势,因为我逐一访问每个元素,Blitz会是有趣的是,如果我使用操作直接对数组(如矩阵乘法),我不是。



你认为Blitz是好的,还是在我的情况下是无用的?



这些都是我迄今为止所考虑的可能性,但也许是最好的一个我还是另一个,所以不要犹豫,向我建议其他的东西。

非常感谢您对这个问题的帮助!



编辑



从非常有趣的答案和评论,一个很好的解决方案似乎是以下:




  • 结构粒子(包含6个双精度)或6个双精度的静态数组(这避免使用二维动态数组)

  • 使用此粒子结构的向量 deque 数组。



此外,我也可以使用迭代器遍历它们,使用 Blitz :: TinyVector< double,6> 而不是结构。

解决方案

首先,你不想分散一个给定粒子的坐标,所以我将开始写一个简单的 struct

  struct Particle {/ * coords * /}; 

然后我们可以创建一个简单的一维数组 Particles



我可能会使用 deque ,因为这是默认容器,尝试一个向量,这只是1.000.000的粒子意味着一个单一的几个MB的块。它应该持有,但是如果这种增长可能会使你的系统紧张,而 deque 将分配几个块。



警告::



如Alexandre C所说,如果你去 deque 从使用 operator [] 并且喜欢使用迭代样式。如果你真的需要随机访问,并且它的性能敏感,向量应该证明更快。


This is my little big question about containers, in particular, arrays.

I am writing a physics code that mainly manipulates a big (> 1 000 000) set of "particles" (with 6 double coordinates each). I am looking for the best way (in term of performance) to implement a class that will contain a container for these data and that will provide manipulation primitives for these data (e.g. instantiation, operator[], etc.).

There are a few restrictions on how this set is used:

  • its size is read from a configuration file and won't change during execution
  • it can be viewed as a big two dimensional array of N (e.g. 1 000 000) lines and 6 columns (each one storing the coordinate in one dimension)
  • the array is manipulated in a big loop, each "particle / line" is accessed and computation takes place with its coordinates, and the results are stored back for this particle, and so on for each particle, and so on for each iteration of the big loop.
  • no new elements are added or deleted during the execution

First conclusion, as the access on the elements is essentially done by accessing each element one by one with [], I think that I should use a normal dynamic array.

I have explored a few things, and I would like to have your opinion on the one that can give me the best performances.

As I understand there is no advantage to use a dynamically allocated array instead of a std::vector, so things like double** array2d = new ..., loop of new, etc are ruled out.

So is it a good idea to use std::vector<double> ?

If I use a std::vector, should I create a two dimensional array like std::vector<std::vector<double> > my_array that can be indexed like my_array[i][j], or is it a bad idea and it would be better to use std::vector<double> other_array and acces it with other_array[6*i+j].

Maybe this can gives better performance, especially as the number of columns is fixed and known from the beginning.

If you think that this is the best option, would it be possible to wrap this vector in a way that it can be accessed with a index operator defined as other_array[i,j] // same as other_array[6*i+j] without overhead (like function call at each access) ?

Another option, the one that I am using so far is to use Blitz, in particular blitz::Array:

typedef blitz::Array<double,TWO_DIMENSIONS> store_t;
store_t my_store;

Where my elements are accessed like that: my_store(line, column);.

I think there are not much advantage to use Blitz in my case because I am accessing each element one by one and that Blitz would be interesting if I was using operations directly on array (like matrix multiplication) which I am not.

Do you think that Blitz is OK, or is it useless in my case ?

These are the possibilities I have considered so far, but maybe the best one I still another one, so don't hesitate to suggest me other things.

Thanks a lot for your help on this problem !

Edit:

From the very interesting answers and comments bellow a good solution seems to be the following:

  • Use a structure particle (containing 6 doubles) or a static array of 6 doubles (this avoid the use of two dimensional dynamic arrays)
  • Use a vector or a deque of this particle structure or array. It is then good to traverse them with iterators, and that will allow to change from one to another later.

In addition I can also use a Blitz::TinyVector<double,6> instead of a structure.

解决方案

First of all, you don't want to scatter the coordinates of one given particle all over the place, so I would begin by writing a simple struct:

struct Particle { /* coords */ };

Then we can make a simple one dimensional array of these Particles.

I would probably use a deque, because that's the default container, but you may wish to try a vector, it's just that 1.000.000 of particles means about a single chunk of a few MBs. It should hold but it might strain your system if this ever grows, while the deque will allocate several chunks.

WARNING:

As Alexandre C remarked, if you go the deque road, refrain from using operator[] and prefer to use iteration style. If you really need random access and it's performance sensitive, the vector should prove faster.

这篇关于选择最高性能的容器(数组)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆