Slow performance for 3D array delete in C++

Question

    int newHeight = _height/2;
    int newWidth = _width/2;

    double*** imageData = new double**[newHeight];
    for (int i = 0; i < newHeight; i++)
    {
        imageData[i] = new double*[newWidth];
        for (int j = 0; j < newWidth; j++)
        {
            imageData[i][j] = new double[4];
        }
    }

I have dynamically allocated this 3D matrix. What is the fastest and safest way to free the memory here?

Here is what I have done, but it takes a few seconds because my matrix is big (1500, 2000, 4):

  for (int i = 0; i != _height/2; i++)
        {
            for (int j = 0; j != _width/2; j++)
            {
                delete[] imageData[i][j];
            }
            delete[] imageData[i];
        }
        delete[] imageData;

Update
As suggested, I have chosen this solution:

std::vector<std::vector<std::array<double,4>>>

The performance is great for my case.

Recommended answer

The primary portion of your algorithm that is killing performance is the granularity and sheer number of allocations you're making. In total you're producing 3001501 allocations, broken down as:

  • 1 allocation for 1500 double**
  • 1500 allocations, each of which obtains 2000 double*
  • 3000000 allocations, each of which obtains double[4]
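As a quick sanity check on that breakdown, here is a minimal sketch that does the arithmetic at compile time. The helper name nestedAllocationCount is purely illustrative and not part of the question's code. Every one of these allocations also needs its own delete[] later, which is why the free loop is as slow as the allocation loop.

```cpp
#include <cstddef>

// Number of separate new[] calls made by the nested-pointer layout
// for a d1 x d2 x 4 array (hypothetical helper, for illustration only).
constexpr std::size_t nestedAllocationCount(std::size_t d1, std::size_t d2)
{
    return 1          // the single array of d1 double**
         + d1         // one array of d2 double* per row
         + d1 * d2;   // one double[4] per (row, column) cell
}

static_assert(nestedAllocationCount(1500, 2000) == 3001501,
              "1 + 1500 + 3000000");
```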

This can be considerably reduced. You can certainly do as others suggest and simply allocate one massive array of double, leaving the index calculation to accessor functions. Of course, if you do that you need to ensure you bring the sizes along for the ride. The result, however, will easily deliver the fastest allocation time and access performance. Using a std::vector<double> arr(d1*d2*4); and doing the offset math as needed will serve very well.
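A minimal sketch of what that single-allocation approach could look like follows. The class name Image3D and its members are my own invention for illustration; the only parts taken from the text above are the std::vector<double> of d1*d2*4 elements and the idea of hiding the offset math behind an accessor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper: one contiguous block of d1 * d2 * 4 doubles,
// with the sizes carried along and the offset math in one place.
class Image3D
{
public:
    static constexpr std::size_t Channels = 4;

    Image3D(std::size_t d1, std::size_t d2)
        : d1_(d1), d2_(d2), data_(d1 * d2 * Channels) // elements start at 0.0
    {
    }

    // Row-major offset: (i * d2 + j) * 4 + k
    double& operator()(std::size_t i, std::size_t j, std::size_t k)
    {
        assert(i < d1_ && j < d2_ && k < Channels);
        return data_[(i * d2_ + j) * Channels + k];
    }

    double operator()(std::size_t i, std::size_t j, std::size_t k) const
    {
        assert(i < d1_ && j < d2_ && k < Channels);
        return data_[(i * d2_ + j) * Channels + k];
    }

    std::size_t height() const { return d1_; }
    std::size_t width() const { return d2_; }
    std::size_t size() const { return data_.size(); }

private:
    std::size_t d1_, d2_;
    std::vector<double> data_; // the one and only allocation
};
```

With something like this, Image3D img(1500, 2000); img(i, j, k) = 1.0; replaces imageData[i][j][k] = 1.0;, and the memory is released in a single deallocation when img goes out of scope.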

Another way

If you are dead set on using a pointer array approach, you can eliminate the 3000000 allocations by obtaining both of the inferior dimensions in single allocations. Your most-inferior dimension is fixed (4), so you could do this (though you'll see in a moment there is a much more C++-centric mechanism):

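// Returns an array of d1 pointers, each to a contiguous block of d2 double[4] rows.
double (**allocPtrsN(size_t d1, size_t d2))[4]
{
    typedef double (*Row)[4];
    Row *res = new Row[d1];

    for (size_t i=0; i<d1; ++i)
        res[i] = new double[d2][4]; // one allocation covers a whole d2 x 4 slab

    return res;
}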

and simply invoke it as:

double (**arr3D)[4] = allocPtrsN(d1,d2);

where d1 and d2 are your two superior dimensions. This produces exactly d1 + 1 allocations: the first for the d1 pointers, and the remaining d1 allocations one for each double[d2][4].

Using C++ Standard Containers

The prior code is obviously tedious, and frankly prone to considerable error. C++ offers a tidy solution to this using a vector of vectors of fixed-size arrays:

std::vector<std::vector<std::array<double,4>>> arr(1500, std::vector<std::array<double,4>>(2000));

Ultimately this uses nearly the same allocation technique as the rather obtuse code shown earlier, but provides you all the lovely benefits of the standard library while doing it. You get all those handy members of the std::vector and std::array templates, and RAII features as an added bonus.
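To make the "handy members" and RAII points concrete, here is a small sketch. The function fillAndSum is a made-up example, not something from the question; it only assumes the vector of vector of std::array<double,4> layout described above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Quad  = std::array<double, 4>;
using Table = std::vector<Quad>;
using Cube  = std::vector<Table>;

// Hypothetical example: fill a d1 x d2 x 4 cube with i + j + k and sum it.
double fillAndSum(std::size_t d1, std::size_t d2)
{
    Cube cube(d1, Table(d2));
    assert(cube.size() == d1); // the containers carry their own sizes

    for (std::size_t i = 0; i < cube.size(); ++i)
        for (std::size_t j = 0; j < cube[i].size(); ++j)
            for (std::size_t k = 0; k < cube[i][j].size(); ++k)
                cube[i][j][k] = static_cast<double>(i + j + k);

    double total = 0.0;
    for (const Table& row : cube)
        for (const Quad& q : row)
            for (double v : q)
                total += v;

    return total; // cube and every row are released here; no delete[] needed
}
```

The indexing syntax is the same cube[i][j][k] as with raw pointers, but there is no matching free function to write or forget to call.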

However, there is one significant difference. The raw pointer method shown earlier will not value-initialize each allocated entity; the vector of vector of array method will. If you think it doesn't make a difference...

#include <iostream>
#include <vector>
#include <array>
#include <chrono>

using Quad = std::array<double, 4>;
using Table = std::vector<Quad>;
using Cube = std::vector<Table>;

Cube allocCube(size_t d1, size_t d2)
{
    return Cube(d1, Table(d2));
}

double ***allocPtrs(size_t d1, size_t d2)
{
    double*** ptrs = new double**[d1];
    for (size_t i = 0; i < d1; i++)
    {
        ptrs[i] = new double*[d2];
        for (size_t j = 0; j < d2; j++)
        {
            ptrs[i][j] = new double[4];
        }
    }
    return ptrs;
}

void freePtrs(double***& ptrs, size_t d1, size_t d2)
{
    for (size_t i=0; i<d1; ++i)
    {
        for (size_t j=0; j<d2; ++j)
            delete [] ptrs[i][j];
        delete [] ptrs[i];
    }
    delete [] ptrs;
    ptrs = nullptr;
}

double (**allocPtrsN(size_t d1, size_t d2))[4]
{
    typedef double (*Row)[4];
    Row *res = new Row[d1];

    for (size_t i=0; i<d1; ++i)
        res[i] = new double[d2][4];

    return res;
}

void freePtrsN(double (**p)[4], size_t d1, size_t d2)
{
    for (size_t i=0; i<d1; ++i)
        delete [] p[i];
    delete [] p;
}

template<class C>
void print_duration(const std::chrono::time_point<C>& beg,
                    const std::chrono::time_point<C>& end)
{
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - beg).count() << "ms\n";
}

int main()
{
    using namespace std::chrono;
    // steady_clock is monotonic, which makes it the safer choice for timing.
    time_point<steady_clock> tp;
    volatile double vd; // volatile sink so the reads are not optimized away

    static constexpr size_t d1 = 1500, d2 = 2000;

    // 1. nested raw pointers: 1 + d1 + d1*d2 allocations per iteration
    tp = steady_clock::now();
    for (int i=0; i<10; ++i)
    {
        double ***cube = allocPtrs(d1,d2);
        cube[d1/2][d2/2][1] = 1.0;
        vd = cube[d1/2][d2/2][1]; // read back the element just written
        freePtrs(cube, d1, d2);
    }
    print_duration(tp, steady_clock::now());

    // 2. vector of vector of array: d1 + 1 allocations, value-initialized
    tp = steady_clock::now();
    for (int i=0; i<10; ++i)
    {
        Cube cube = allocCube(d1,d2);
        cube[d1/2][d2/2][1] = 1.0;
        vd = cube[d1/2][d2/2][1];
    }
    print_duration(tp, steady_clock::now());

    // 3. raw pointers to double[d2][4] slabs: d1 + 1 allocations, uninitialized
    tp = steady_clock::now();
    for (int i=0; i<10; ++i)
    {
        auto cube = allocPtrsN(d1,d2);
        cube[d1/2][d2/2][1] = 1.0;
        vd = cube[d1/2][d2/2][1];
        freePtrsN(cube, d1, d2);
    }
    print_duration(tp, steady_clock::now());
}

Output

5328ms
418ms
95ms

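The three timings are, in order, the nested raw pointers, the vector of vectors, and allocPtrsN. The gap between the last two is the cost of value-initialization. Thus, if you're planning on loading up every element with something besides zero anyway, it is something to keep in mind.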
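If you want to see the value-initialization for yourself, the following sketch checks it directly. Both function names are hypothetical. The second function shows the raw-pointer spelling that does the same zeroing, namely new double[n]() with the trailing parentheses, as opposed to plain new double[n], which leaves the values indeterminate.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using Quad = std::array<double, 4>;

// Asserts that every double in a freshly constructed d1 x d2 cube is 0.0,
// i.e. that the container constructor value-initialized the elements.
void checkFreshCubeIsZeroed(std::size_t d1, std::size_t d2)
{
    std::vector<std::vector<Quad>> cube(d1, std::vector<Quad>(d2));
    for (const auto& row : cube)
        for (const Quad& q : row)
            for (double v : q)
                assert(v == 0.0);
}

// Raw-array equivalent that also zeroes its storage: note the trailing ().
// Without the (), the doubles would be left uninitialized.
std::unique_ptr<double[]> allocZeroed(std::size_t n)
{
    return std::unique_ptr<double[]>(new double[n]());
}
```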

Conclusion

If performance were critical I would use the 24MB (on my implementation, anyway) single allocation, likely in a std::vector<double> arr(d1*d2*4);, and do the offset calculations as needed using one form of secondary indexing or another. Other answers proffer up interesting ideas on this, notably Ben's, which radically reduces the allocation count to a mere three blocks (data, and two secondary pointer arrays). Sorry, I didn't have time to bench it, but I would suspect the performance would be stellar. But if you really want to keep your existing technique, consider doing it in a C++ container as shown above. If the extra cycles spent value-initializing the world aren't too heavy a price to pay, it will be much easier to manage (and obviously less code to deal with in comparison to raw pointers).
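For reference, here is my own rough sketch of how a three-block layout along those lines might look. It is not Ben's code and, like his idea, I have not benchmarked it. The class name Cube3 and the use of std::vector for the three blocks are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: three allocations in total, whatever d1 and d2 are.
class Cube3
{
public:
    Cube3(std::size_t d1, std::size_t d2)
        : data_(d1 * d2 * 4),  // block 1: all the doubles, contiguous
          cells_(d1 * d2),     // block 2: one double* per (i, j) cell
          rows_(d1)            // block 3: one double** per row
    {
        for (std::size_t c = 0; c < d1 * d2; ++c)
            cells_[c] = data_.data() + c * 4;
        for (std::size_t i = 0; i < d1; ++i)
            rows_[i] = cells_.data() + i * d2;
    }

    // A copy would keep pointing into the source object's storage.
    Cube3(const Cube3&) = delete;
    Cube3& operator=(const Cube3&) = delete;

    // Keeps the familiar cube[i][j][k] syntax.
    double** operator[](std::size_t i)
    {
        assert(i < rows_.size());
        return rows_[i];
    }

private:
    std::vector<double>   data_;
    std::vector<double*>  cells_;
    std::vector<double**> rows_;
};
```

Access still costs two pointer hops per element, as with the original double***, but construction and destruction touch only three blocks instead of three million.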

Best of luck.
