1D or 2D array, what's faster?


Problem description

I need to represent a 2D field (axes x, y) and I face a problem: should I use a 1D array or a 2D array?

I can imagine that recalculating indices for 1D arrays (y + x*n) could be slower than using a 2D array (x, y), but I could also imagine that the 1D version might sit better in the CPU cache.

I did some googling, but only found pages regarding static arrays (and stating that 1D and 2D are basically the same). But my arrays must be dynamic.

So, what's

  1. faster,

  2. smaller (RAM):

dynamic 1D arrays or dynamic 2D arrays?

Thanks :)

Recommended answer

Note: One cannot really dig into the details affecting the performance of dynamic 1D versus dynamic 2D storage patterns without comparing actual code, since performance depends on a very large number of parameters. Profile if possible.

For dense matrices the 1D approach is likely to be faster since it offers better memory locality and less allocation and deallocation overhead.

Dynamic-1D consumes less memory than the 2D approach. The latter also requires more allocations.

I laid out a pretty long answer below with several reasons, but I want to make some remarks on your assumptions first.

I can imagine, that recalculating indices for 1D arrays (y + x*n) could be slower than using 2D array (x, y)

Let's compare these two functions:

// C is the number of columns per row, a compile-time constant here (4 in the assembly below)
int get_2d (int **p, int r, int c) { return p[r][c]; }
int get_1d (int *p, int r, int c)  { return p[c + C*r]; }

The (non-inlined) assembly generated by Visual Studio 2015 RC for those functions (with optimizations turned on) is:

?get_1d@@YAHPAHII@Z PROC
push    ebp
mov ebp, esp
mov eax, DWORD PTR _c$[ebp]
lea eax, DWORD PTR [eax+edx*4]
mov eax, DWORD PTR [ecx+eax*4]
pop ebp
ret 0

?get_2d@@YAHPAPAHII@Z PROC
push ebp
mov ebp, esp
mov ecx, DWORD PTR [ecx+edx*4]
mov eax, DWORD PTR _c$[ebp]
mov eax, DWORD PTR [ecx+eax*4]
pop ebp
ret 0

The difference is mov (2d) vs. lea (1d). The former has a latency of 3 cycles and a maximum throughput of 2 per cycle, while the latter has a latency of 2 cycles and a maximum throughput of 3 per cycle (according to Instruction tables - Agner Fog).

Since the differences are minor, I think there should not be a big performance difference arising from index recalculation. I expect it to be very unlikely that this difference itself turns out to be the bottleneck in any program.

This brings us to the next (and more interesting) point:

... but I could image that 1D could be in CPU cache ...

True, but 2D could be in the CPU cache, too. See "The downsides: Memory locality" below for an explanation of why 1D is still better.

Note: This is about dynamic arrays/allocation schemes (malloc/new/vector etc.). A static 2D array is a contiguous block of memory and is therefore not subject to the downsides I'm going to present here.

To be able to understand why a dynamic array of dynamic arrays or a vector of vectors is most likely not the data storage pattern of choice, you are required to understand the memory layout of such structures.

int main (void)
{
    // allocate memory for 4x4 integers; quick & dirty
    int ** p = new int*[4];
    for (size_t i=0; i<4; ++i) p[i] = new int[4]; 

    // do some stuff here, using p[x][y] 

    // deallocate memory
    for (size_t i=0; i<4; ++i) delete[] p[i];
    delete[] p;
}

The downsides

Memory locality

For this "matrix" you allocate one block of four pointers and four blocks of four integers. All of the allocations are unrelated and can therefore result in an arbitrary memory position.

The following image will give you an idea of what the memory may look like:


  • The violet square is the memory position occupied by p itself.
  • The green squares assemble the memory region p points to (4 x int*).
  • The 4 regions of 4 contiguous blue squares are the ones pointed to by each int* of the green region.
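
To see this scattering on a concrete run, a minimal sketch like the following can be used; it just repeats the quick & dirty 4x4 allocation and prints the addresses involved (the exact values are allocator- and run-dependent):

#include <cstddef>
#include <cstdio>

int main()
{
    // the same quick & dirty 4x4 allocation as above
    int ** p = new int*[4];
    for (std::size_t i = 0; i < 4; ++i) p[i] = new int[4];

    // print where everything ended up; expect unrelated addresses for the rows
    std::printf("p itself lives at       %p\n", static_cast<void*>(&p));
    std::printf("the pointer block is at %p\n", static_cast<void*>(p));
    for (std::size_t i = 0; i < 4; ++i)
        std::printf("row %zu starts at        %p\n", i, static_cast<void*>(p[i]));

    for (std::size_t i = 0; i < 4; ++i) delete[] p[i];
    delete[] p;
}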

This means that (when using this) you will probably observe worse performance than for a contiguous storage pattern, due to caching for instance.

Let's say a cache line is "the amount of data transferred into the cache at once", and let's imagine a program accessing the whole matrix one element after another.

If you have a properly aligned 4 times 4 matrix of 32 bit values, a processor with a 64 byte cache line (typical value) is able to "one-shot" the data (4*4*4 = 64 bytes). If you start processing and the data isn't already in the cache you'll face a cache miss and the data will be fetched from main memory. This load can fetch the whole matrix at once since it fits into a cache line, if and only if it is contiguously stored (and properly aligned). There will probably not be any more misses while processing that data.

In the case of a dynamic, "real two-dimensional" system with unrelated locations for each row/column, the processor needs to load every memory location separately. Even though only 64 bytes are required, loading 4 cache lines for 4 unrelated memory positions would, in a worst-case scenario, actually transfer 256 bytes and waste 75% of the throughput bandwidth. If you process the data using the 2D scheme you'll again (if not already cached) face a cache miss on the first element. But now, only the first row/column will be in the cache after the first load from main memory, because all other rows are located somewhere else in memory and not adjacent to the first one. As soon as you reach a new row/column there will again be a cache miss and the next load from main memory is performed.

Long story short: the 2D pattern has a higher chance of cache misses, while the 1D scheme offers better potential for performance due to the locality of the data.
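
If you want to get a rough feeling for this effect yourself, a naive traversal benchmark is one possible sketch: it sums a rows x cols matrix once through a contiguous 1D block and once through a pointer-to-pointer layout. The sizes below are arbitrary, and the numbers depend heavily on compiler, flags and hardware, so treat it as an illustration and profile your own case:

#include <chrono>
#include <cstddef>
#include <cstdio>

int main()
{
    std::size_t const rows = 4096, cols = 4096;

    // contiguous 1D storage, zero-initialized
    int *flat = new int[rows * cols]();
    // pointer-to-pointer storage, zero-initialized rows
    int **pp = new int*[rows];
    for (std::size_t r = 0; r < rows; ++r) pp[r] = new int[cols]();

    long long sum1 = 0, sum2 = 0;

    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            sum1 += flat[r * cols + c];
    auto t1 = std::chrono::steady_clock::now();
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            sum2 += pp[r][c];
    auto t2 = std::chrono::steady_clock::now();

    using us = std::chrono::microseconds;
    std::printf("1D: sum=%lld in %lld us\n", sum1,
                (long long)std::chrono::duration_cast<us>(t1 - t0).count());
    std::printf("2D: sum=%lld in %lld us\n", sum2,
                (long long)std::chrono::duration_cast<us>(t2 - t1).count());

    for (std::size_t r = 0; r < rows; ++r) delete[] pp[r];
    delete[] pp;
    delete[] flat;
}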


  • As many as N + 1 (4 + 1 = 5) allocations (using either new, malloc, allocator::allocate or whatever) are necessary to create the desired NxM (4×4) matrix.
  • The same number of proper, respective deallocation operations must be applied as well.

Therefore, it is more costly to create/copy such matrices in contrast to a single allocation scheme.

This gets even worse with a growing number of rows.

I will assume 32-bit sizes for int and for pointers. (Note: system dependent.)

Let's remember: We want to store a 4×4 int matrix which means 64 bytes.

For an NxM matrix, stored with the presented pointer-to-pointer scheme, we consume


  • N*sizeof(int*) [the green pointers] +
  • sizeof(int**) [the violet variable p] bytes.

That makes 4*4*4 + 4*4 + 4 = 84 bytes in the case of the present example, and it gets even worse when using std::vector<std::vector<int>>, which will require N*M*sizeof(int) + N*sizeof(vector<int>) + sizeof(vector<vector<int>>) bytes, that is 4*4*4 + 4*16 + 16 = 144 bytes in total, instead of 64 bytes for a 4×4 int matrix.
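
If you want to check these formulas against your own implementation, a small sketch that simply evaluates them is enough; note that sizeof(int*) and sizeof(std::vector<int>) are implementation-dependent (the 4/16 values above assume a 32-bit target), so expect different totals, e.g. on 64-bit systems:

#include <cstddef>
#include <cstdio>
#include <vector>

int main()
{
    std::size_t const N = 4, M = 4;

    // bookkeeping + data for the pointer-to-pointer scheme
    std::size_t ptr_scheme = N * M * sizeof(int)   // the actual (blue) data
                           + N * sizeof(int*)      // the (green) row pointers
                           + sizeof(int**);        // the (violet) variable p

    // the same for std::vector<std::vector<int>>
    std::size_t vec_scheme = N * M * sizeof(int)
                           + N * sizeof(std::vector<int>)
                           + sizeof(std::vector<std::vector<int>>);

    // a single contiguous block only needs the data itself
    std::size_t flat_scheme = N * M * sizeof(int);

    std::printf("pointer-to-pointer: %zu bytes\n", ptr_scheme);
    std::printf("vector-of-vectors : %zu bytes\n", vec_scheme);
    std::printf("flat 1D block     : %zu bytes\n", flat_scheme);
}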

In addition, depending on the allocator used, each single allocation may well (and most likely will) have another 16 bytes of memory overhead (some "infobytes" which store the number of allocated bytes for the purpose of proper deallocation).

This means the worst case is:

N*(16 + M*sizeof(int)) + 16 + N*sizeof(int*) + sizeof(int**)
= 4*(16 + 4*4) + 16 + 4*4 + 4 = 164 bytes! (Overhead: 156%)

The share of the overhead will decrease as the size of the matrix grows, but it will still be present.

The bunch of allocations requires appropriate exception handling in order to avoid memory leaks if one of the allocations fails! You'll need to keep track of the allocated memory blocks and you must not forget them when deallocating the memory.

If new runs out of memory and the next row cannot be allocated (especially likely when the matrix is very large), a std::bad_alloc is thrown by new.

Example:

In the above-mentioned new/delete example, we'll face some more code if we want to avoid leaks in case of bad_alloc exceptions.

  // allocate memory for 4x4 integers; quick & dirty
  size_t const N = 4;
  // we don't need try for this allocation
  // if it fails there is no leak
  int ** p = new int*[N];
  size_t allocs(0U);
  try 
  { // try block doing further allocations
    for (size_t i=0; i<N; ++i) 
    {
      p[i] = new int[4]; // allocate
      ++allocs; // advance counter if no exception occurred
    }
  }
  catch (std::bad_alloc & be)
  { // if an exception occurs we need to free our memory
    for (size_t i=0; i<allocs; ++i) delete[] p[i]; // free all allocated p[i]s
    delete[] p; // free p
    throw; // rethrow bad_alloc
  }
  /*
     do some stuff here, using p[x][y] 
  */
  // deallocate memory according to the number of allocations
  for (size_t i=0; i<allocs; ++i) delete[] p[i];
  delete[] p;

Summary

There are cases where "real 2D" memory layouts fit and make sense (i.e. if the number of columns per row is not constant), but in the simplest and most common 2D data storage cases they just bloat the complexity of your code and reduce the performance and memory efficiency of your program.

You should use a contiguous block of memory and map your rows onto that block.
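
As a minimal sketch of that idea (the full class follows below), a single std::vector plus a row-major index computation is already enough; rows, cols and the little at helper are just illustrative names:

#include <cstddef>
#include <vector>

int main()
{
    std::size_t const rows = 4, cols = 4;
    std::vector<int> field(rows * cols);   // one contiguous allocation

    // row-major mapping: element (r, c) lives at index r*cols + c
    auto at = [&](std::size_t r, std::size_t c) -> int & {
        return field[r * cols + c];
    };

    at(2, 3) = 42;                         // corresponds to the 2D access p[2][3]
    return at(2, 3) == 42 ? 0 : 1;
}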

The "C++ way" of doing it is probably to write a class that manages your memory while considering important things like

  • What is The Rule of Three?
  • What is meant by Resource Acquisition is Initialization (RAII)?
  • C++ concept: Container (on cppreference.com)

To provide an idea of what such a class may look like, here's a simple example with some basic features:


  • 2d-size-constructible
  • 2d-resizable
  • operator(size_t, size_t) for 2d row-major element access
  • at(size_t, size_t) for checked 2d row-major element access
  • Fulfills the Concept requirements for Container

Source:

#include <vector>
#include <algorithm>
#include <iterator>
#include <utility>

namespace matrices
{

  template<class T>
  class simple
  {
  public:
    // misc types
    using data_type  = std::vector<T>;
    using value_type = typename std::vector<T>::value_type;
    using size_type  = typename std::vector<T>::size_type;
    // ref
    using reference       = typename std::vector<T>::reference;
    using const_reference = typename std::vector<T>::const_reference;
    // iter
    using iterator       = typename std::vector<T>::iterator;
    using const_iterator = typename std::vector<T>::const_iterator;
    // reverse iter
    using reverse_iterator       = typename std::vector<T>::reverse_iterator;
    using const_reverse_iterator = typename std::vector<T>::const_reverse_iterator;

    // empty construction
    simple() = default;

    // default-insert rows*cols values
    simple(size_type rows, size_type cols)
      : m_rows(rows), m_cols(cols), m_data(rows*cols)
    {}

    // copy initialized matrix rows*cols
    simple(size_type rows, size_type cols, const_reference val)
      : m_rows(rows), m_cols(cols), m_data(rows*cols, val)
    {}

    // 1d-iterators

    iterator begin() { return m_data.begin(); }
    iterator end() { return m_data.end(); }
    const_iterator begin() const { return m_data.begin(); }
    const_iterator end() const { return m_data.end(); }
    const_iterator cbegin() const { return m_data.cbegin(); }
    const_iterator cend() const { return m_data.cend(); }
    reverse_iterator rbegin() { return m_data.rbegin(); }
    reverse_iterator rend() { return m_data.rend(); }
    const_reverse_iterator rbegin() const { return m_data.rbegin(); }
    const_reverse_iterator rend() const { return m_data.rend(); }
    const_reverse_iterator crbegin() const { return m_data.crbegin(); }
    const_reverse_iterator crend() const { return m_data.crend(); }

    // element access (row major indexation)
    reference operator() (size_type const row,
      size_type const column)
    {
      return m_data[m_cols*row + column];
    }
    const_reference operator() (size_type const row,
      size_type const column) const
    {
      return m_data[m_cols*row + column];
    }
    reference at (size_type const row, size_type const column)
    {
      return m_data.at(m_cols*row + column);
    }
    const_reference at (size_type const row, size_type const column) const
    {
      return m_data.at(m_cols*row + column);
    }

    // resizing
    void resize(size_type new_rows, size_type new_cols)
    {
      // new matrix new_rows times new_cols
      simple tmp(new_rows, new_cols);
      // select smaller row and col size
      auto mc = std::min(m_cols, new_cols);
      auto mr = std::min(m_rows, new_rows);
      for (size_type i(0U); i < mr; ++i)
      {
        // iterators to begin of rows
        auto row = begin() + i*m_cols;
        auto tmp_row = tmp.begin() + i*new_cols;
        // move mc elements to tmp
        std::move(row, row + mc, tmp_row);
      }
      // move assignment to this
      *this = std::move(tmp);
    }

    // size and capacity
    size_type size() const { return m_data.size(); }
    size_type max_size() const { return m_data.max_size(); }
    bool empty() const { return m_data.empty(); }
    // dimensionality
    size_type rows() const { return m_rows; }
    size_type cols() const { return m_cols; }
    // data swapping
    void swap(simple &rhs)
    {
      using std::swap;
      m_data.swap(rhs.m_data);
      swap(m_rows, rhs.m_rows);
      swap(m_cols, rhs.m_cols);
    }
  private:
    // content
    size_type m_rows{ 0u };
    size_type m_cols{ 0u };
    data_type m_data{};
  };
  template<class T>
  void swap(simple<T> & lhs, simple<T> & rhs)
  {
    lhs.swap(rhs);
  }
  template<class T>
  bool operator== (simple<T> const &a, simple<T> const &b)
  {
    if (a.rows() != b.rows() || a.cols() != b.cols())
    {
      return false;
    }
    return std::equal(a.begin(), a.end(), b.begin(), b.end());
  }
  template<class T>
  bool operator!= (simple<T> const &a, simple<T> const &b)
  {
    return !(a == b);
  }

}
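
For illustration, using the class could look like the following sketch (it assumes the matrices::simple<T> code above is in scope):

#include <cstddef>
#include <iostream>
// the matrices::simple<T> class from above is assumed to be in scope here

int main()
{
    matrices::simple<int> m(3, 4);   // 3 rows, 4 columns, elements value-initialized

    m(1, 2) = 42;                    // unchecked row-major access
    m.at(2, 3) = 7;                  // checked access via std::vector::at

    m.resize(4, 4);                  // grow the matrix; existing contents are kept

    for (std::size_t r = 0; r < m.rows(); ++r)
    {
        for (std::size_t c = 0; c < m.cols(); ++c)
            std::cout << m(r, c) << ' ';
        std::cout << '\n';
    }
}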

A few things to note here:


  • T needs to fulfill the requirements of the used std::vector member functions
  • operator() doesn't do any "out of range" checks
  • No need to manage data on your own
  • No destructor, copy constructor or assignment operators required

So you don't have to bother with proper memory handling for each application, but only once, for the class you write.

There may be cases where a dynamic "real" two dimensional structure is favourable. This is for instance the case if


  • the matrix is very large and sparse (if some of the rows do not even need to be allocated but can be handled via a nullptr), or
  • the rows do not have the same number of columns (that is, if you don't have a matrix at all but some other two-dimensional construct; see the sketch below).
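
For the second case, a vector of vectors is a natural fit, since every row can have its own length; a small sketch:

#include <cstddef>
#include <vector>

int main()
{
    // a two-dimensional construct where every row has its own number of columns
    std::vector<std::vector<int>> jagged;
    jagged.push_back({1});
    jagged.push_back({1, 2, 3});
    jagged.emplace_back();         // an empty row is fine, too

    std::size_t total = 0;
    for (auto const & row : jagged)
        total += row.size();
    return total == 4 ? 0 : 1;
}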

