设计高性能3D动态数组 [英] design high-performance 3d dynamic array

查看:81
本文介绍了设计高性能3D动态数组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

大家好:

我正在研究一个科学的计算程序,它需要一个非常
高性能三维动态阵列.我试过了
std :: vector< vector< vector< T> > > ;、 boost :: multi_array和blitz :: Array,
但是所有这些库的性能都无法满足我的需求
需求.因此,我设计了自己的数组类,其中的片段
如下:

template< class T>
类数组
{
公众:
数组(size_t d1,size_t d2,size_t d3)
{
p = new T ** [d1];
for(size_t i = 0; i!= d1; ++ i)
p [i] = new T * [d2];
for(size_t i = 0; i!= d1; ++ i)
for(size_t j = 0; j!= d2; ++ j)
p [i] [j] = new T [d3];

对于(size_t i = 0; i!= d1; ++ i)
对于(size_t j = 0; j!= d2; ++ j)
对于(size_t k = 0; k!= d3; ++ k)
p [i] [j] [k] = T();
}

公众:
夯;运算符()(size_t d1,size_t d2,size_t d3)
{
返回p [d1] [d2] [d3];
}
私人:
T *** p;
};


但是,由于我必须读写数组元素以获取更多信息
在计算中超过一千万次,重载的括号
会大大降低效率.为了提高性能,
一种方法是公开T *** p并使用array.p [i] [j] [k]到
读/写数据.但是,我认为这种设计是可怕的.

因此,有谁有更好的主意从
隐藏内部指针p 用户是否不降低效率?还是有人有完整的
高性能三维动态阵列的新想法?

预先感谢!

Hi, everyone:

I am working on a scientific computation program, which needs a very
high-performance three dimensional dynamic array. I tried
std::vector<vector<vector<T> > >, boost::multi_array and blitz::Array,
but the performance of all these libraries couldn''t satisfy my
needs. Therefore, I designed my own array class, the fragment of which
is as follows:

template<class T>
class array
{
public:
array(size_t d1,size_t d2,size_t d3)
{
p=new T**[d1];
for(size_t i=0;i!=d1;++i)
p[i]=new T*[d2];
for(size_t i=0;i!=d1;++i)
for(size_t j=0;j!=d2;++j)
p[i][j]=new T[d3];

for (size_t i=0;i!=d1;++i)
for (size_t j=0;j!=d2;++j)
for (size_t k=0;k!=d3;++k)
p[i][j][k]=T();
}

public:
T& operator()(size_t d1,size_t d2,size_t d3)
{
return p[d1][d2][d3];
}
private:
T ***p;
};


However, since I have to read and write the array elements for more
than ten million times in the computation, the overloaded parentheses
will decrease the efficiency dramatically. To improve the performance,
one way is to make T ***p public and using array.p[i][j][k] to
read/write data. However, I think this design is hideous.

So, does anyone have a better idea to hide the internal pointer p from
users without decrease the efficiency? Or, does anyone have a complete
new idea of a high-performance three-dimensional dynamic array?

Thanks in advance!

推荐答案

与其他答案提到的一样,您可以对单维数组使用指针算法来模拟三维数组.只是以为我会补充说,可以提高对该数组建立​​索引的性能.无需使用乘法来计算索引,而是可以将给定索引的偏移量的预先计算的值存储到每个维度中.实际上,您正在存储乘法结果.当3-D数组被分配或调整大小时,您将创建这些偏移数组.要获取指针偏移量,您可以执行以下操作:
Like the other answer mentions, you can use pointer arithmetic with a single-dimension array to simulate a three-dimensional array. Just thought I''d add that the performance of indexing that array can be improved. Rather than using multiplication to calculate the index, you can store pre-calculated values for offsets into each dimension for a given index. Effectively, you are storing the result of multiplications. You would create these offset arrays when the 3-D array gets allocated or resized. To get the pointer offset, you''d do something like this:
int offset = zOffset[z] + yOffset[y] + x;


这些数组将使用如下代码填充:


Those arrays would be filled using code that would go something like this:

for(int i = 0; i < yMax; i++)
{
    yOffset[i] = i * xMax;
}
int xyMax = xMax * yMax;
for(int i = 0; i < zMax; i++)
{
    zOffset[i] = i * xyMax;
}


这样,您将执行数组查找而不是乘法.是否快于乘法取决于特定计算机是否可以更快地执行数组查找或乘法.


That way, you perform an array lookup rather than a multiplication. Whether or not this is faster than a multiplication depends on if the specific computer can do array lookups or multiplications faster.


您可以使 operator()一个内联函数.


重载的operator()可能会内联扩展,因此它可能不是您认为的性能问题.你检查了吗?

作为魔鬼的拥护者,array.p [i] [j] [k]方法的优点是可以使用数组语法而不是函数语法进行编写.同样,除了公开内部指针之外,还应该提供一个内联的getter方法,该方法将返回T *** const,这样外部代码实际上就无法更改您的指针.


我的想法是做这种旧风格".它应该接近编译器为内置3d数组生成的内容.您还可以在列主顺序和行主顺序之间进行选择,这在某些情况下可能会有所不同.

您的片段将被这样修改:

The overloaded operator() is likely to be expanded in-line, so it may not be the performance issue you believe. Have you checked on this?

To be a devil''s advocate, the array.p[i][j][k] approach has the advantage of letting you write with array syntax rather than function syntax. Also instead of making your internal pointer public, you should provide an inline getter that would return a T*** const, so the outside code couldn''t actually alter your pointer.


My thought is to do this "old style". It should be close to what the compiler would produce for a built-in 3d array. You could also choose between column major and row major order, which might make a difference in some situations.

Your fragment would be revised like so:

template<class T>
class array
{
public:
    array( size_t d1, size_t d2, size_t d3) : d1_(d1), d2_(d2), d3_(d3)
    {
        p = new T[ d1 * d2 * d3 ];
    }
public:
    T& operator()( size_t x, size_t y, size_t z )
    {
        return p + (( x * d2_ + y ) * d3_ + z );
        //return p + ( x + d1_ * ( y + z * d2_ )); // alternative
    }

private:
    T *p;
    size_t d1_;
    size_t d2_;
    size_t d3_;
};




附录:

在这里应用表查询从未发生过.谢谢您的想法,aspdotnetdev.

注意,数组查找通常可以实现为乘法和加法.对于3维数组索引操作,无论是OP,aspdotnetdev还是我的方法,我都会计算3乘和3加.但是,根据T的大小,这些方法中的一种很可能可以用这些方法中的每一种的位移来代替.此外,对于OP或aspdotnetdev的建议,其他两个乘数也可以转换为班次,但不是我的.因此,对于数组索引计算,我的建议是当位移位比乘法快时的失败者.

另一方面,出于参考的目的,我认为我的是最好的,而OP是最差的.这可能会产生重大影响,但取决于平台和使用方式.确定该值的唯一方法是通过测量实际性能.


也许是时候进行一些分析了吗?




Addendum:

Applying table lookup here never occurred to me. Thank you for the idea, aspdotnetdev.

Note that an array lookup might typically be implemented as a multiply and an add. I count 3 multiplies and 3 adds for a 3-d array indexing operation whether it is the OP''s, aspdotnetdev''s, or my approach. However, depending on the size of T, one of those might very well be replaceable by a bit shift in each of those approaches. In addition, the other 2 multiplies could also be turned into shifts for either the OP''s or aspdotnetdev''s suggestion, but not mine. So for array index calculations my suggestion is the loser when bit shifts are faster than multiplies.

On the other hand, for locality of reference, I believe that mine is best and the OP''s the worst. This can have a significant impact, but depends on the platform and how things are used. The only way to determine that would be by measuring actual performance.


Maybe it is time to do some profiling?


这篇关于设计高性能3D动态数组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆