likely(x) and __builtin_expect((x),1)


Question

I know the kernel uses the likely and unlikely macros extensively. The docs for the underlying builtin are under "Built-in Function: long __builtin_expect (long exp, long c)", but they don't really discuss the details.

How exactly does a compiler handle likely(x) and __builtin_expect((x),1)?

Is it handled by the code generator or the optimizer?

Does it depend on the optimization level?

What's an example of the code generated?
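For context, the macros the question refers to are thin wrappers around the builtin. The sketch below is modelled on the definitions in the Linux kernel's include/linux/compiler.h; the helper clamp_nonneg is hypothetical and only there to show typical usage.

```c
/* Modelled on the Linux kernel's definitions. The !!(x) normalizes any
 * nonzero value to 1 so that it can match the expected value exactly. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical helper: a negative input is treated as the rare case. */
int clamp_nonneg(int v)
{
    if (unlikely(v < 0))
        return 0;   /* rare path */
    return v;       /* common path */
}
```

The macros evaluate to the (normalized) value of the condition, so they can be dropped into any if or while test without changing what the code computes.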

Answer

I just tested a simple example on gcc.

For x86 this seems to be handled by the optimizer and to depend on the optimization level. Although I guess the correct answer here is "it depends on the compiler".

The code generated is CPU dependent. Some CPUs (sparc64 comes immediately to mind, but I'm sure there are others) have flags on conditional branch instructions that tell the CPU how to predict them, so the compiler generates "predict true/predict false" instructions depending on its built-in rules and on hints from the code (like __builtin_expect).

Intel documents their behavior here: https://software.intel.com/en-us/articles/branch-and-loop-reorganization-to-prevent-mispredicts. In short, the behavior on Intel CPUs is that if the CPU has no previous information about a branch, it predicts forward branches as unlikely to be taken and backward branches as likely to be taken (think about loops vs. error handling).
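To make the "loops vs. error handling" intuition concrete, here is a small sketch. The function sum_ints is hypothetical, and the branch directions in the comments describe what compilers typically emit rather than anything guaranteed; the actual layout depends on the compiler and flags.

```c
#include <stddef.h>

/* Hypothetical example: sums n ints, returning -1 for a NULL buffer. */
long sum_ints(const int *buf, size_t n)
{
    /* Error check: typically compiled as a forward branch to the rare
     * path, which static prediction assumes is not taken. */
    if (__builtin_expect(buf == NULL, 0))
        return -1;

    long total = 0;
    /* Loop: the jump back to the top is a backward branch, which
     * static prediction assumes is taken. */
    for (size_t i = 0; i < n; i++)
        total += buf[i];
    return total;
}
```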

Here's some example code:

int bar(int);
int
foo(int x)
{
    if (__builtin_expect(x>10, PREDICTION))
        return bar(10);
    return 42;
}

Compiled with (I'm using -fomit-frame-pointer to make the output more readable, and I've still cleaned it up below):

$ cc -S -fomit-frame-pointer -O0 -DPREDICTION=0 -o 00.s foo.c
$ cc -S -fomit-frame-pointer -O0 -DPREDICTION=1 -o 01.s foo.c
$ cc -S -fomit-frame-pointer -O2 -DPREDICTION=0 -o 20.s foo.c
$ cc -S -fomit-frame-pointer -O2 -DPREDICTION=1 -o 21.s foo.c

There's no difference between 00.s and 01.s, which shows that the hint only takes effect when optimization is enabled (for gcc at least).

Here's the (cleaned up) generated code for 20.s:

foo:
    cmpl    $10, %edi
    jg  .L2
    movl    $42, %eax
    ret
.L2:
    movl    $10, %edi
    jmp bar

And here's 21.s:

foo:
    cmpl    $10, %edi
    jle .L6
    movl    $10, %edi
    jmp bar
.L6:
    movl    $42, %eax
    ret

As expected, the compiler rearranged the code so that the path we don't expect to take is reached through a forward branch.
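Note that the hint only influences code layout, not results: __builtin_expect evaluates to its first argument whatever the second one says. A minimal sketch of that, reusing the shape of foo from the example; the stub body of bar and the names foo_expect_false and foo_expect_true are assumptions made for illustration.

```c
/* Stub standing in for the external bar() of the example (assumption). */
static int bar(int v) { return v * 2; }

/* Same logic as foo() with PREDICTION=0. */
static int foo_expect_false(int x)
{
    if (__builtin_expect(x > 10, 0))
        return bar(10);
    return 42;
}

/* Same logic as foo() with PREDICTION=1. */
static int foo_expect_true(int x)
{
    if (__builtin_expect(x > 10, 1))
        return bar(10);
    return 42;
}
```

Both variants return the same value for every input; a wrong hint costs performance at most, never correctness.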
