Is < faster than <=?

Question

I'm reading a book where the author says that if( a < 901 ) is faster than if( a <= 900 ).

Not exactly as in this simple example, but there are said to be slight performance changes in complex loop code. I suppose this has something to do with the generated machine code, if it's even true.

Recommended answer

No, it will not be faster on most architectures. You didn't specify, but on x86, all of the integral comparisons will be typically implemented in two machine instructions:


    • A test or cmp instruction, which sets EFLAGS
    • And a Jcc (jump) instruction, depending on the comparison type (and code layout):
      • jne - Jump if not equal --> ZF = 0
      • jz - Jump if zero (equal) --> ZF = 1
      • jg - Jump if greater --> ZF = 0 and SF = OF
      • (etc...)

      Example (edited for brevity), compiled with $ gcc -m32 -S -masm=intel test.c

          if (a < b) {
              // Do something 1
          }
      

      Compiles to:

          mov     eax, DWORD PTR [esp+24]      ; a
          cmp     eax, DWORD PTR [esp+28]      ; b
          jge     .L2                          ; jump if a is >= b
          ; Do something 1
      .L2:
      

          if (a <= b) {
              // Do something 2
          }
      

      Compiles to:

          mov     eax, DWORD PTR [esp+24]      ; a
          cmp     eax, DWORD PTR [esp+28]      ; b
          jg      .L5                          ; jump if a is > b
          ; Do something 2
      .L5:
      

      So the only difference between the two is a jg versus a jge instruction. The two will take the same amount of time.
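For integer operands, the book's two conditions are also logically the same test: a < 901 and a <= 900 accept exactly the same values. A minimal C sketch that checks this over a range follows; the helper name count_mismatches and the chosen range are illustrative assumptions, not part of the original answer.

```c
/* Hypothetical helper: counts the ints in [lo, hi] for which the two
   spellings of the condition disagree. */
long long count_mismatches(int lo, int hi)
{
    long long mismatches = 0;
    /* A wider loop variable avoids overflow when hi == INT_MAX. */
    for (long long a = lo; a <= hi; ++a) {
        int strict = ((int)a < 901);      /* if (a < 901)  */
        int non_strict = ((int)a <= 900); /* if (a <= 900) */
        if (strict != non_strict)
            ++mismatches;
    }
    return mismatches;
}
```

Calling count_mismatches(-2000, 2000) returns 0, so a compiler is free to pick whichever comparison it prefers for either spelling.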

      I'd like to address the comment that nothing indicates that the different jump instructions take the same amount of time. This one is a little tricky to answer, but here's what I can give: in the Intel Instruction Set Reference (http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html), they are all grouped together under one common instruction, Jcc (Jump if condition is met). The same grouping is used in the Optimization Reference Manual (http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html), in Appendix C, "Latency and Throughput".

      Latency — The number of clock cycles that are required for the execution core to complete the execution of all of the μops that form an instruction.

      Throughput — The number of clock cycles required to wait before the issue ports are free to accept the same instruction again. For many instructions, the throughput of an instruction can be significantly less than its latency

      The values for Jcc are:

            Latency   Throughput
      Jcc     N/A        0.5
      

      The footnote on Jcc reads:

      7) Selection of conditional jump instructions should be based on the recommendation of section Section 3.4.1, "Branch Prediction Optimization," to improve the predictability of branches. When branches are predicted successfully, the latency of jcc is effectively zero.

      So, nothing in the Intel docs ever treats one Jcc instruction any differently from the others.

      If one thinks about the actual circuitry used to implement the instructions, one can assume that there would be simple AND/OR gates on the different bits in EFLAGS to determine whether the conditions are met. There is, then, no reason that an instruction testing two bits should take any more or less time than one testing only one (ignoring gate propagation delay, which is much less than the clock period).
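To make the "simple gates on flag bits" idea concrete, the conditions can be modelled in C as small Boolean functions of the flags. This is only a sketch: cmp32 and the cond_* names are made up for illustration, and the formulas follow the documented Jcc conditions (jl: SF != OF, jle: ZF = 1 or SF != OF, jg: ZF = 0 and SF = OF, jge: SF = OF).

```c
#include <stdint.h>

/* Flags that a 32-bit `cmp a, b` would produce (it computes a - b). */
struct flags {
    unsigned zf; /* zero flag: result is zero */
    unsigned sf; /* sign flag: top bit of the result */
    unsigned of; /* overflow flag: signed overflow in a - b */
};

/* Hypothetical model of cmp; unsigned arithmetic wraps, so there is
   no undefined behaviour even when the signed subtraction overflows. */
struct flags cmp32(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t r = ua - ub;
    struct flags f;
    f.zf = (r == 0);
    f.sf = r >> 31;
    /* Overflow: the operands differ in sign and the result's sign
       differs from a's sign. */
    f.of = ((ua ^ ub) & (ua ^ r)) >> 31;
    return f;
}

/* Each condition is a one- or two-gate function of the flags. */
int cond_jl(struct flags f)  { return f.sf != f.of; }
int cond_jle(struct flags f) { return f.zf || (f.sf != f.of); }
int cond_jg(struct flags f)  { return !f.zf && (f.sf == f.of); }
int cond_jge(struct flags f) { return f.sf == f.of; }
```

In this model, cond_jl(cmp32(a, b)) agrees with a < b and cond_jle(cmp32(a, b)) with a <= b; the only difference between them is one extra OR with ZF.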

      Edit: Floating point

      This holds true for x87 floating point as well: (Pretty much same code as above, but with double instead of int.)

              fld     QWORD PTR [esp+32]
              fld     QWORD PTR [esp+40]
              fucomip st, st(1)              ; Compare ST(0) and ST(1), and set CF, PF, ZF in EFLAGS
              fstp    st(0)
              seta    al                     ; Set al if above (CF=0 and ZF=0).
              test    al, al
              je      .L2
              ; Do something 1
      .L2:
      
              fld     QWORD PTR [esp+32]
              fld     QWORD PTR [esp+40]
              fucomip st, st(1)              ; (same thing as above)
              fstp    st(0)
              setae   al                     ; Set al if above or equal (CF=0).
              test    al, al
              je      .L5
              ; Do something 2
      .L5:
              leave
              ret
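The answer does not show the C source behind the floating-point listings. A plausible reconstruction is sketched below, assuming a is the double at [esp+32] and b the one at [esp+40]; the function names are purely illustrative.

```c
/* Hypothetical wrappers around the two comparisons. With doubles the
   compiler emits a floating-point compare followed by a flag test. */
int is_less(double a, double b)
{
    return a < b;   /* would match the seta listing under the assumption above */
}

int is_less_or_equal(double a, double b)
{
    return a <= b;  /* would match the setae listing under the assumption above */
}
```

As with the integer case, the two functions differ only in which flag condition is tested after the compare.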
      
