Why does 64-bit VC++ compiler add nop instruction after function calls?


Question


      I've compiled the following using Visual Studio C++ 2008 SP1, x64 C++ compiler:

      I'm curious: why did the compiler add those nop instructions after those calls?

      PS1. I would understand if the 2nd and 3rd nops were there to align the code on a 4-byte boundary, but the 1st nop breaks that assumption.

      PS2. The C++ code that was compiled had no loops or special optimization stuff in it:

      CTestDlg::CTestDlg(CWnd* pParent /*=NULL*/)
          : CDialog(CTestDlg::IDD, pParent)
      {
          m_hIcon = AfxGetApp()->LoadIcon(IDR_MAINFRAME);
      
          //This makes no sense. I used it to set a debugger breakpoint
          ::GdiFlush();
          srand(::GetTickCount());
      }
      

      PS3. Additional Info: First off, thank you everyone for your input.

      Here are some additional observations:

      1. My first guess was that incremental linking could've had something to do with it. But the Release build settings for the project in Visual Studio have incremental linking turned off.

      2. This seems to affect x64 builds only. The same code built as x86 (or Win32) does not have those nops, even though the instructions used are very similar:

      3. I tried to build it with a newer linker, and even though the x64 code produced by VS 2013 looks somewhat different, it still adds those nops after some calls:

      4. Dynamic vs. static linking to MFC also made no difference to the presence of those nops. This one is built with dynamic linking to the MFC DLLs with VS 2013:

      5. Also note that those nops can appear after both near and far calls, and they have nothing to do with alignment. Here's a part of the code that I got from IDA if I step a little bit further on:

      As you can see, the nop is inserted after a far call that happens to "align" the next lea instruction on the B address! That makes no sense if those were added for alignment only.

      6. I was originally inclined to believe that, since near relative calls (i.e. those that start with E8) are somewhat faster than far calls (the ones that start with FF,15 in this case), the linker may try to go with near calls first, and since those are one byte shorter than far calls, if it succeeds, it may pad the remaining space with a nop at the end. But then example (5) above kind of defeats this hypothesis.
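To make the byte counts in observation 6 concrete, here is a minimal sketch that recognizes the two call encodings mentioned above and checks whether a one-byte nop (0x90) follows. The helper names `call_length` and `nop_follows_call` are made up for illustration and are not part of any tool. Note that on x64 the FF 15 form, which the question calls a far call, is an indirect call through a memory operand. It is six bytes long, versus five for E8 with a 32-bit relative offset.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: length in bytes of the call instruction that starts
// at code[0], or 0 if the bytes are not one of the two encodings discussed.
constexpr std::size_t call_length(const std::uint8_t* code, std::size_t size) {
    if (size >= 5 && code[0] == 0xE8) {
        return 5;  // E8 rel32: opcode + 4-byte relative offset
    }
    if (size >= 6 && code[0] == 0xFF && code[1] == 0x15) {
        return 6;  // FF 15 disp32: opcode + ModRM + 4-byte displacement
    }
    return 0;
}

// Hypothetical helper: true if a one-byte nop (0x90) immediately follows
// the call instruction that starts at code[0].
constexpr bool nop_follows_call(const std::uint8_t* code, std::size_t size) {
    const std::size_t len = call_length(code, size);
    return len != 0 && len < size && code[len] == 0x90;
}
```

Running such a check over a disassembly dump would only confirm where the nops sit. It says nothing about why the compiler emits them.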

      So I still don't have a clear answer to this.

      Answer

      This is purely a guess, but it might be some kind of SEH optimization. I say optimization because SEH seems to work fine without the NOPs too. The NOP might help speed up unwinding.

      In the following example (live demo with VC2017), there is a NOP inserted after a call to basic_string::assign in test1 but not in test2 (identical but declared as non-throwing [1]).

      #include <stdio.h>
      #include <string>
      
      int test1() {
        std::string s = "a";  // NOP inserted here
        s += getchar();
        return (int)s.length();
      }
      
      int test2() throw() {
        std::string s = "a";
        s += getchar();
        return (int)s.length();
      }
      
      int main()
      {
        return test1() + test2();
      }
      

      Assembly:

      test1:
          . . .
          call     std::basic_string<char,std::char_traits<char>,std::allocator<char> >::assign
          npad     1         ; nop
          call     getchar
          . . .
      test2:
          . . .
          call     std::basic_string<char,std::char_traits<char>,std::allocator<char> >::assign
          call     getchar
      

      Note that Visual Studio compiles by default with the /EHsc flag (synchronous exception handling). Without that flag the NOPs disappear, and with /EHa (synchronous and asynchronous exception handling), throw() no longer makes a difference because SEH is always on.


      [1] For some reason, only throw() seems to reduce the code size; using noexcept makes the generated code even bigger and summons even more NOPs. MSVC...
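For anyone who wants to check the noexcept remark themselves, a third variant along these lines could sit next to test1 and test2. This is only a sketch: `test3` and `next_char` are hypothetical names, and `next_char` is a stand-in for getchar() so the snippet does not read stdin. Swap getchar() back in to match the original functions when inspecting the generated code. Whether any NOPs appear, and how many, depends on the compiler version and flags.

```cpp
#include <string>

// Hypothetical stand-in for getchar(); returns a fixed character so the
// sketch runs without input. Use getchar() to mirror test1/test2 exactly.
int next_char() { return 'b'; }

// Same body as test1/test2, but declared noexcept instead of throw().
int test3() noexcept {
    std::string s = "a";
    s += static_cast<char>(next_char());
    return static_cast<int>(s.length());
}
```

Compiling with /EHsc and an assembly listing (for example the /FA option of cl) should make it possible to compare the code after each call in test1, test2 and test3.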
