Performance of pIter != cont.end() in for loop


Question

I have been working through "Exceptional C++" by Herb Sutter lately, and I have serious doubts about a particular recommendation he gives in Item 6, "Temporary Objects".

He asks the reader to find the unnecessary temporary objects in the following code:

string FindAddr(list<Employee> emps, string name) 
{
  for (list<Employee>::iterator i = emps.begin(); i != emps.end(); i++)
  {
    if( *i == name )
    {
      return i->addr;
    }
  }
  return "";
}

As one of the examples, he recommends precomputing the value of emps.end() before the loop, since a temporary object is created on every iteration:


For most containers (including list), calling end() returns a temporary object that must be constructed and destroyed. Because the value will not change, recomputing (and reconstructing and redestroying) it on every loop iteration is both needlessly inefficient and unaesthetic. The value should be computed only once, stored in a local object, and reused.

list<Employee>::const_iterator end(emps.end());
for (list<Employee>::const_iterator i = emps.begin(); i != end; ++i)

To me, this is an unnecessary complication. Even if one replaces the ugly type declarations with a compact auto, one still gets two lines of code instead of one. What is more, the end variable now lives in the outer scope.
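
For what it's worth, the scope concern can be sidestepped by declaring the cached iterator in the for-init statement. A minimal sketch, assuming C++11 `auto`; `count_items` is a hypothetical helper used only for illustration and does not come from the book:

```cpp
#include <cstddef>
#include <list>

// Hypothetical helper: counts elements while caching end() in the for-init
// statement, so the cached iterator `e` is not visible after the loop.
std::size_t count_items(const std::list<int>& items) {
    std::size_t n = 0;
    // Both begin() and end() return const_iterator here, so a single `auto`
    // declaration can introduce both variables.
    for (auto i = items.begin(), e = items.end(); i != e; ++i) {
        ++n;
    }
    return n;
}
```

This keeps the loop header on one line and the cached value scoped to the loop. A C++11 range-based for loop is specified to evaluate its end expression once in much the same way.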

I was sure modern compilers would optimize this piece of code anyway, because I'm actually using const_iterator here and it is easy to check whether the loop body accesses the container in some way. Compilers have become smarter in the last 13 years, right?

Anyway, I will prefer the first version with i != emps.end() in most cases, where I'm not that worried about performance. But I want to know for sure whether this is the kind of construct I can rely on a compiler to optimize.

Update

Thanks for your suggestions on how to make this useless code better. Please note that my question is about the compiler, not programming techniques. The only relevant answers so far are from NPE and Ellioh.

Recommended answer

UPD: The book you are talking about was published in 1999, unless I'm mistaken. That's 14 years ago, and in modern programming 14 years is a long time. Many recommendations that were good and reliable in 1999 may be completely obsolete by now. Though my answer is about a single compiler and a single platform, there is also a more general idea.

Caring about extra variables, reusing the return value of trivial methods, and similar tricks of old C++ are a step back towards the C++ of the 1990s. Trivial methods like end() should be inlined quite well, and the result of inlining should be optimized as part of the code it is called from. In 99% of situations, manual actions such as creating an end variable are not required at all. Such things should be done only if:


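  1. You KNOW that on some of the compilers/platforms your code has to run on, it is not optimized well.

  2. It has become a bottleneck in your program (avoid premature optimization).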
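
Whether a loop like this is really a bottleneck can only be settled by measuring. Below is a rough sketch of such a measurement, assuming C++11 and `std::chrono`. The names `walk_calling_end`, `walk_cached_end` and `time_us` are made up for illustration, and numbers from a micro-benchmark like this are only indicative:

```cpp
#include <chrono>
#include <cstddef>
#include <list>

// Hypothetical helper: walks the list, calling end() on every iteration.
std::size_t walk_calling_end(const std::list<char>& l) {
    std::size_t n = 0;
    for (std::list<char>::const_iterator i = l.begin(); i != l.end(); ++i) ++n;
    return n;
}

// Hypothetical helper: walks the list with end() cached before the loop.
std::size_t walk_cached_end(const std::list<char>& l) {
    std::size_t n = 0;
    std::list<char>::const_iterator e = l.end();
    for (std::list<char>::const_iterator i = l.begin(); i != e; ++i) ++n;
    return n;
}

// Times one call of `walk` on `l` and returns the elapsed microseconds.
// The element count is written to `count` so that the result is used.
template <typename Walk>
long long time_us(Walk walk, const std::list<char>& l, std::size_t& count) {
    auto start = std::chrono::steady_clock::now();
    count = walk(l);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Calling `time_us(walk_calling_end, l, n)` and `time_us(walk_cached_end, l, n)` on the same large list, several times and with optimizations on, gives a first impression of whether the two forms differ at all on a given compiler.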

I've looked at what is generated by 64-bit g++:

gcc version 4.6.3 20120918 (prerelease) (Ubuntu/Linaro 4.6.3-10ubuntu1)

Initially I thought that with optimizations on it would be fine and there would be no difference between the two versions. But things look strange: the version you considered non-optimal is actually better. I think the moral is: there is no reason to try to be smarter than the compiler. Let's look at both versions.

#include <list>

using namespace std;

int main() {
  list<char> l;
  l.push_back('a');

  for(list<char>::iterator i=l.begin(); i != l.end(); i++)
      ;

  return 0;
}

int main1() {
  list<char> l;
  l.push_back('a');
  list<char>::iterator e=l.end();
  for(list<char>::iterator i=l.begin(); i != e; i++)
      ;

  return 0;
}

Then we compile this with optimizations on (I use 64-bit g++; you may try your own compiler) and disassemble main and main1:

(gdb) disas main
Dump of assembler code for function main():
   0x0000000000400650 <+0>: push   %rbx
   0x0000000000400651 <+1>: mov    $0x18,%edi
   0x0000000000400656 <+6>: sub    $0x20,%rsp
   0x000000000040065a <+10>:    lea    0x10(%rsp),%rbx
   0x000000000040065f <+15>:    mov    %rbx,0x10(%rsp)
   0x0000000000400664 <+20>:    mov    %rbx,0x18(%rsp)
   0x0000000000400669 <+25>:    callq  0x400630 <_Znwm@plt>
   0x000000000040066e <+30>:    cmp    $0xfffffffffffffff0,%rax
   0x0000000000400672 <+34>:    je     0x400678 <main()+40>
   0x0000000000400674 <+36>:    movb   $0x61,0x10(%rax)
   0x0000000000400678 <+40>:    mov    %rax,%rdi
   0x000000000040067b <+43>:    mov    %rbx,%rsi
   0x000000000040067e <+46>:    callq  0x400610 <_ZNSt8__detail15_List_node_base7_M_hookEPS0_@plt>
   0x0000000000400683 <+51>:    mov    0x10(%rsp),%rax
   0x0000000000400688 <+56>:    cmp    %rbx,%rax
   0x000000000040068b <+59>:    je     0x400698 <main()+72>
   0x000000000040068d <+61>:    nopl   (%rax)
   0x0000000000400690 <+64>:    mov    (%rax),%rax
   0x0000000000400693 <+67>:    cmp    %rbx,%rax
   0x0000000000400696 <+70>:    jne    0x400690 <main()+64>
   0x0000000000400698 <+72>:    mov    %rbx,%rdi
   0x000000000040069b <+75>:    callq  0x400840 <std::list<char, std::allocator<char> >::~list()>
   0x00000000004006a0 <+80>:    add    $0x20,%rsp
   0x00000000004006a4 <+84>:    xor    %eax,%eax
   0x00000000004006a6 <+86>:    pop    %rbx
   0x00000000004006a7 <+87>:    retq   

Look at the instructions located at 0x0000000000400690-0x0000000000400696. That's the loop body, and it seems to be perfectly optimized:

   0x0000000000400690 <+64>:    mov    (%rax),%rax
   0x0000000000400693 <+67>:    cmp    %rbx,%rax
   0x0000000000400696 <+70>:    jne    0x400690 <main()+64>
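
Read back into C++, those three instructions amount to following a next pointer until the list's header (sentinel) node is reached, with the address of that node held in a register for the whole loop. A rough sketch of the idea, using a hypothetical `Node` type rather than libstdc++'s actual internals:

```cpp
#include <cstddef>

// Hypothetical node of a circular singly linked list; not libstdc++'s real type.
struct Node {
    Node* next;
};

// Counts the nodes that follow the sentinel before the chain returns to it.
// `sentinel` plays the role of end(): its address is obtained once and, in the
// listing, appears to stay in %rbx for the whole loop.
std::size_t count_nodes(const Node* sentinel) {
    std::size_t n = 0;
    const Node* cur = sentinel->next;   // like begin(): the first real node
    while (cur != sentinel) {           // cmp %rbx,%rax ; jne
        cur = cur->next;                // mov (%rax),%rax
        ++n;
    }
    return n;
}
```

The counter `n` is only there to make the sketch observable; the loop in the listing has an empty body, so nothing but the pointer chase remains.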

For main1:

(gdb) disas main1
Dump of assembler code for function main1():
   0x00000000004007b0 <+0>: push   %rbp
   0x00000000004007b1 <+1>: mov    $0x18,%edi
   0x00000000004007b6 <+6>: push   %rbx
   0x00000000004007b7 <+7>: sub    $0x18,%rsp
   0x00000000004007bb <+11>:    mov    %rsp,%rbx
   0x00000000004007be <+14>:    mov    %rsp,(%rsp)
   0x00000000004007c2 <+18>:    mov    %rsp,0x8(%rsp)
   0x00000000004007c7 <+23>:    callq  0x400630 <_Znwm@plt>
   0x00000000004007cc <+28>:    cmp    $0xfffffffffffffff0,%rax
   0x00000000004007d0 <+32>:    je     0x4007d6 <main1()+38>
   0x00000000004007d2 <+34>:    movb   $0x61,0x10(%rax)
   0x00000000004007d6 <+38>:    mov    %rax,%rdi
   0x00000000004007d9 <+41>:    mov    %rsp,%rsi
   0x00000000004007dc <+44>:    callq  0x400610 <_ZNSt8__detail15_List_node_base7_M_hookEPS0_@plt>
   0x00000000004007e1 <+49>:    mov    (%rsp),%rdi
   0x00000000004007e5 <+53>:    cmp    %rbx,%rdi
   0x00000000004007e8 <+56>:    je     0x400818 <main1()+104>
   0x00000000004007ea <+58>:    mov    %rdi,%rax
   0x00000000004007ed <+61>:    nopl   (%rax)
   0x00000000004007f0 <+64>:    mov    (%rax),%rax
   0x00000000004007f3 <+67>:    cmp    %rbx,%rax
   0x00000000004007f6 <+70>:    jne    0x4007f0 <main1()+64>
   0x00000000004007f8 <+72>:    mov    (%rdi),%rbp
   0x00000000004007fb <+75>:    callq  0x4005f0 <_ZdlPv@plt>
   0x0000000000400800 <+80>:    cmp    %rbx,%rbp
   0x0000000000400803 <+83>:    je     0x400818 <main1()+104>
   0x0000000000400805 <+85>:    nopl   (%rax)
   0x0000000000400808 <+88>:    mov    %rbp,%rdi
   0x000000000040080b <+91>:    mov    (%rdi),%rbp
   0x000000000040080e <+94>:    callq  0x4005f0 <_ZdlPv@plt>
   0x0000000000400813 <+99>:    cmp    %rbx,%rbp
   0x0000000000400816 <+102>:   jne    0x400808 <main1()+88>
   0x0000000000400818 <+104>:   add    $0x18,%rsp
   0x000000000040081c <+108>:   xor    %eax,%eax
   0x000000000040081e <+110>:   pop    %rbx
   0x000000000040081f <+111>:   pop    %rbp
   0x0000000000400820 <+112>:   retq   

The code for the loop itself is essentially the same:

   0x00000000004007f0 <+64>:    mov    (%rax),%rax
   0x00000000004007f3 <+67>:    cmp    %rbx,%rax
   0x00000000004007f6 <+70>:    jne    0x4007f0 <main1()+64>

But there is a lot of extra stuff around the loop. Apparently, the extra code has made things WORSE.
