How to understand macro `likely` affecting branch prediction?

Question

I noticed that if we know the control flow is likely to go one way, we can tell the compiler. For instance, the Linux kernel has lots of `likely`/`unlikely`, which are actually implemented with `__builtin_expect` provided by GCC. So I wanted to find out how it works, and checked the assembly:
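
For reference, the kernel macros are thin wrappers around the builtin. The sketch below is modeled on the plain (non-profiling) definitions in the kernel's `include/linux/compiler.h`; it assumes GCC or Clang, since `__builtin_expect` is a compiler extension, and `read_or_default` is a hypothetical example function:

```cpp
#include <cassert>

// Modeled on the Linux kernel's definitions (GCC/Clang extension).
// `!!(x)` normalizes any truthy value to exactly 1, so comparing it
// with the expected value (1 or 0) is meaningful.
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

// Hypothetical example: returns -1 for a null pointer, otherwise *p.
// The hint only tells the compiler which outcome is common; the
// result is the same as without the macro.
int read_or_default(const int* p) {
    if (unlikely(p == nullptr)) {
        return -1;  // cold path
    }
    return *p;      // hot path
}
```

Note that `__builtin_expect` returns its first argument unchanged, so `likely(x)` evaluates to 0 or 1 and never changes what the program computes.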

  20:branch_prediction_victim.cpp ****             if (array_aka[j] >= 128)
 184                            .loc 3 20 0 is_stmt 1
 185 00f1 488B85D0              movq    -131120(%rbp), %rax
 185      FFFDFF
 186 00f8 8B8485F0              movl    -131088(%rbp,%rax,4), %eax
 186      FFFDFF
 187 00ff 83F87F                cmpl    $127, %eax
 188 0102 7E17                  jle     .L13

Then with `__builtin_expect`:

  20:branch_prediction_victim.cpp ****             if (__builtin_expect((array_aka[j] >= 128), 1))
 184                            .loc 3 20 0 is_stmt 1
 185 00f1 488B85D0              movq    -131120(%rbp), %rax
 185      FFFDFF
 186 00f8 8B8485F0              movl    -131088(%rbp,%rax,4), %eax
 186      FFFDFF
 187 00ff 83F87F                cmpl    $127, %eax
 188 0102 0F9FC0                setg    %al
 189 0105 0FB6C0                movzbl  %al, %eax
 190 0108 4885C0                testq   %rax, %rax
 191 010b 7417                  je      .L13
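
The question only shows line 20 of `branch_prediction_victim.cpp`. The loop below is a hypothetical reconstruction (the names `sum_large`, `sum_large_hinted`, `data`, and `n` are made up; only the `>= 128` test comes from the listing), written both ways so the two variants can be compiled side by side:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Plain version: the compiler picks the layout from its own heuristics.
std::int64_t sum_large(const int* data, std::size_t n) {
    std::int64_t sum = 0;
    for (std::size_t j = 0; j < n; ++j) {
        if (data[j] >= 128)
            sum += data[j];
    }
    return sum;
}

// Hinted version: same result, but tells GCC/Clang the condition is
// expected to be true (1). Only code-generation decisions can change.
std::int64_t sum_large_hinted(const int* data, std::size_t n) {
    std::int64_t sum = 0;
    for (std::size_t j = 0; j < n; ++j) {
        if (__builtin_expect(data[j] >= 128, 1))
            sum += data[j];
    }
    return sum;
}
```

Comparing `g++ -O0 -S` output against `g++ -O2 -S` output for these two functions is a quick way to see whether the extra `setg`/`movzbl`/`testq` survive optimization.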

  • 188 - setg: set if greater; here, set if greater than what?
  • 189 - movzbl: move zero-extended byte to long. I know this one moves %al to %eax.
  • 190 - testq: bitwise OR, then set the ZF and CF flags; is this right?

I want to know how they affect branch prediction and improve performance. Three extra instructions means more cycles are needed, right?

Answer

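setcc reads FLAGS, in this case set by the cmp right before: `setg %al` writes 1 to AL if the signed comparison found %eax > 127, and 0 otherwise. (And test is a bitwise AND, not OR: `testq %rax, %rax` sets ZF if %rax is zero and clears CF and OF; the AND result itself is discarded.) Read the manual.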
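
To make the unoptimized sequence concrete, here is a C++ model of what each instruction computes. It is an illustration only, not what the compiler emits, and all function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Models `cmpl $127, %eax` + `setg %al`: signed "eax > 127" as 0 or 1.
std::uint8_t setg_after_cmp127(std::int32_t eax) {
    return eax > 127 ? 1 : 0;
}

// Models `movzbl %al, %eax`: zero-extend the byte. On x86-64, writing
// %eax also zeroes the upper 32 bits of %rax.
std::uint64_t movzbl_model(std::uint8_t al) {
    return static_cast<std::uint64_t>(al);
}

// Models `testq %rax, %rax`: AND the register with itself, discard the
// result, and set ZF iff the value is zero.
bool zf_after_testq(std::uint64_t rax) {
    return (rax & rax) == 0;
}

// `je .L13` jumps when ZF is set, i.e. when array_aka[j] >= 128 was
// false, skipping the if-body (the same condition as the original jle).
bool model_je_taken(std::int32_t value) {
    return zf_after_testq(movzbl_model(setg_after_cmp127(value)));
}
```

For any `int v`, `model_je_taken(v)` equals `!(v >= 128)`, so the three extra instructions are just a roundabout way of computing the original `jle` condition.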

This looks like you forgot to enable optimization, so `__builtin_expect` is just creating a 0/1 boolean value in a register and branching on it being non-zero, instead of branching on the original FLAGS condition. Don't look at unoptimized code; it's always going to suck.

The clues are: the braindead booleanizing as part of `likely`, and loading `j` from the stack using RBP as a frame pointer with `movq -131120(%rbp), %rax`.

`likely` generally doesn't improve runtime branch prediction; it improves code layout to minimize the number of taken branches when things go the way the source code said they would (i.e. the fast case). So it improves I-cache locality for the common case. For example, the compiler will lay things out so the common case is a not-taken conditional branch that just falls through. This makes things easier for the front-end in superscalar pipelined CPUs that fetch/decode multiple instructions at once: continuing to fetch in a straight line is easiest.
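
A minimal sketch of the kind of code where this layout effect matters (`parse_digit` and `digit_sum` are hypothetical). With the hint, an optimizing GCC/Clang build will typically keep the hot path as straight-line fall-through code and move the error handling out of line, but the exact layout is up to the compiler:

```cpp
#include <cassert>

#define unlikely(x) __builtin_expect(!!(x), 0)

// Hypothetical parser step: converts one ASCII digit, -1 on error.
// The error branch is marked unlikely, so the compiler can place it
// away from the hot path; the common case falls through a not-taken
// conditional branch.
int parse_digit(char c) {
    if (unlikely(c < '0' || c > '9')) {
        return -1;   // cold: expected to be rare
    }
    return c - '0';  // hot: straight-line fall-through
}

// Sums the digits of a NUL-terminated string, or returns -1 on bad input.
int digit_sum(const char* s) {
    int sum = 0;
    for (; *s != '\0'; ++s) {
        int d = parse_digit(*s);
        if (unlikely(d < 0)) return -1;
        sum += d;
    }
    return sum;
}
```

The hint changes nothing about the result; a mispredicted or wrongly hinted branch is still correct, just potentially slower.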

`likely` can actually get the compiler to use a branch instead of a cmov for cases that you know are predictable, even if compiler heuristics (without profile-guided optimization) would have gotten it wrong. Related: gcc optimization flag -O3 makes code slower than -O2
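
A sketch of the situation described here (`clamp_to_limit` is hypothetical). Without a hint, an optimizing compiler may turn this into a branchless `cmov`; marking the condition as rarely true can push it toward a branch instead. Whether it actually does depends on the compiler version and target, so check the generated assembly rather than assuming:

```cpp
#include <cassert>

// Clamp x to at most `limit`. If x almost never exceeds `limit`, the
// branch is nearly always predicted correctly, letting the CPU speculate
// past it; a cmov instead makes the result data-dependent on the compare.
int clamp_to_limit(int x, int limit) {
    if (__builtin_expect(x > limit, 0)) {
        x = limit;  // rare case
    }
    return x;
}
```

If the data is actually unpredictable (e.g. random values around `limit`), a cmov is usually the better choice, which is why the hint should reflect real knowledge about the input.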