Optimization of raw new[]/delete[] vs std::vector

Views: 208 Published: 2016/10/23 14:47:34 Tags: c++ vector c++14 compiler-optimization


Question

Let's mess around with very basic dynamically allocated memory. We take a vector of 3, set its elements and return the sum of the vector.

In the first test case I used a raw pointer with new[]/delete[]. In the second I used std::vector:
#include <vector>   

int main()
{
  //int *v = new int[3];        // (1)
  auto v = std::vector<int>(3); // (2)


  for (int i = 0; i < 3; ++i)
    v[i] = i + 1;

  int s = 0;
  for (int i = 0; i < 3; ++i)
    s += v[i];

  //delete[] v;                 // (1)
  return s;
}
Assembly of (1) (new[]/delete[]):
main:                                   # @main
        mov     eax, 6
        ret
Assembly of (2) (std::vector):
main:                                   # @main
        push    rax
        mov     edi, 12
        call    operator new(unsigned long)
        mov     qword ptr [rax], 0
        movabs  rcx, 8589934593
        mov     qword ptr [rax], rcx
        mov     dword ptr [rax + 8], 3
        test    rax, rax
        je      .LBB0_2
        mov     rdi, rax
        call    operator delete(void*)
.LBB0_2:                                # %std::vector<int, std::allocator<int> >::~vector() [clone .exit]
        mov     eax, 6
        pop     rdx
        ret
Both outputs were taken from https://gcc.godbolt.org/ with -std=c++14 -O3.

In both versions the returned value is computed at compile time, so both listings end with mov eax, 6 followed by ret.

With the raw new[]/delete[], the dynamic allocation was removed completely. With std::vector, however, the memory is still allocated, written and freed.

This happens even when the variable auto v = std::vector<int>(3) is never used: there is a call to new, the memory is set, and then there is a call to delete.

I realize this is most likely a near-impossible question to answer, but maybe someone has some insights and some interesting answers might pop up.

What are the contributing factors that don't allow compiler optimizations to remove the memory allocation in the std::vector case, like in the raw memory allocation case?
Solution
When using a pointer to a dynamically allocated array (directly using new[] and delete[]), the compiler optimized away the calls to operator new and operator delete even though they have observable side effects. This optimization is allowed by the C++ standard section 5.3.4 paragraph 10:

  An implementation is allowed to omit a call to a replaceable global
  allocation function (18.6.1.1, 18.6.1.2). When it does so, the storage
  is instead provided by the implementation or...
I'll show the rest of the sentence, which is crucial, at the end.

This optimization is relatively new because it was first allowed in C++14 (proposal N3664). Clang has supported it since 3.4. The latest version of gcc, namely 5.3.0, doesn't take advantage of this relaxation of the as-if rule. It produces the following code:
main:
        sub     rsp, 8
        mov     edi, 12
        call    operator new[](unsigned long)
        mov     DWORD PTR [rax], 1
        mov     DWORD PTR [rax+4], 2
        mov     rdi, rax
        mov     DWORD PTR [rax+8], 3
        call    operator delete[](void*)
        mov     eax, 6
        add     rsp, 8
        ret
MSVC 2013 also doesn't support this optimization. It produces the following code:
main:
  sub         rsp,28h  
  mov         ecx,0Ch  
  call        operator new[] ()  
  mov         rcx,rax  
  mov         dword ptr [rax],1  
  mov         dword ptr [rax+4],2  
  mov         dword ptr [rax+8],3  
  call        operator delete[] ()  
  mov         eax,6  
  add         rsp,28h  
  ret 
I currently don't have access to MSVC 2015 Update 1 and therefore I don't know whether it supports this optimization or not.

Finally, here is the assembly code generated by icc 13.0.1:
main:
        push      rbp                                          
        mov       rbp, rsp                                   
        and       rsp, -128                                    
        sub       rsp, 128                                     
        mov       edi, 3                                       
        call      __intel_new_proc_init                         
        stmxcsr   DWORD PTR [rsp]                               
        mov       edi, 12                                 
        or        DWORD PTR [rsp], 32832                       
        ldmxcsr   DWORD PTR [rsp]                               
        call      operator new[](unsigned long)
        mov       rdi, rax                                      
        mov       DWORD PTR [rax], 1                            
        mov       DWORD PTR [4+rax], 2                          
        mov       DWORD PTR [8+rax], 3                         
        call      operator delete[](void*)
        mov       eax, 6    
        mov       rsp, rbp                           
        pop       rbp                                   
        ret                                          
Clearly, it doesn't support this optimization. I don't have access to the latest version of icc, namely 16.0.

All of these code snippets have been produced with optimizations enabled.

When using std::vector, none of these compilers optimized away the allocation. When a compiler doesn't perform an optimization, it's either because it cannot for some reason or because the optimization is not yet supported.

  What are the contributing factors that don't allow compiler
  optimizations to remove the memory allocation in the std::vector case,
  like in the raw memory allocation case?
The compiler didn't perform the optimization because it's not allowed to. To see this, let's look at the rest of that sentence from 5.3.4 paragraph 10:

  An implementation is allowed to omit a call to a replaceable global
  allocation function (18.6.1.1, 18.6.1.2). When it does so, the storage
  is instead provided by the implementation or provided by extending the
  allocation of another new-expression.
What this is saying is that you can omit a call to a replaceable global allocation function only if it originated from a new-expression. A new-expression is defined in paragraph 1 of the same section.

The following expression
new int[3]
is a new-expression and therefore the compiler is allowed to optimize away the associated allocation function call.

On the other hand, the following expression:
::operator new(12)
is NOT a new-expression (see 5.3.4 paragraph 1). This is just a function call expression. In other words, it is treated as a typical function call. This function call cannot be optimized away because the function is imported from another shared library (even if you link the runtime statically, the function itself calls another imported function).

The default allocator used by std::vector allocates memory using ::operator new and therefore the compiler is not allowed to optimize it away.

Let's test this. Here's the code:
int main()
{
  int *v =  (int*)::operator new(12);

  for (int i = 0; i < 3; ++i)
    v[i] = i + 1;

  int s = 0;
  for (int i = 0; i < 3; ++i)
    s += v[i];

  delete v;
  return s;
}
By compiling using Clang 3.7, we get the following assembly code:
main:                                   # @main
        push    rax
        mov     edi, 12
        call    operator new(unsigned long)
        movabs  rcx, 8589934593
        mov     qword ptr [rax], rcx
        mov     dword ptr [rax + 8], 3
        test    rax, rax
        je      .LBB0_2
        mov     rdi, rax
        call    operator delete(void*)
.LBB0_2:
        mov     eax, 6
        pop     rdx
        ret
This is exactly the same as the assembly code generated when using std::vector, except for mov qword ptr [rax], 0, which comes from the constructor of std::vector (the compiler should have removed it but failed to do so because of a flaw in its optimization algorithms).
