Make gcc use conditional moves
Question
Is there a gcc pragma or something I can use to force gcc to generate branch-free instructions on a specific section of code?
I have a piece of code that I want gcc to compile to branch-free code using cmov instructions:
int foo(int *a, int n, int x) {
    int i = 0, j = n;
    while (i < n) {
#ifdef PREFETCH
        __builtin_prefetch(a+16*i + 15);
#endif /* PREFETCH */
        j = (x <= a[i]) ? i : j;
        i = (x <= a[i]) ? 2*i + 1 : 2*i + 2;
    }
    return j;
}
And indeed it does:
morin@soprano$ gcc -O4 -S -c test.c -o -
.file "test.c"
.text
.p2align 4,,15
.globl foo
.type foo, @function
foo:
.LFB0:
.cfi_startproc
testl %esi, %esi
movl %esi, %eax
jle .L2
xorl %r8d, %r8d
jmp .L3
.p2align 4,,10
.p2align 3
.L6:
movl %ecx, %r8d
.L3:
movslq %r8d, %rcx
movl (%rdi,%rcx,4), %r9d
leal (%r8,%r8), %ecx # put 2*i in ecx
leal 1(%rcx), %r10d # put 2*i+1 in r10d
addl $2, %ecx # put 2*i+2 in ecx
cmpl %edx, %r9d
cmovge %r10d, %ecx # put 2*i+1 in ecx if appropriate
cmovge %r8d, %eax # set j = i if appropriate
cmpl %esi, %ecx
jl .L6
.L2:
rep ret
.cfi_endproc
.LFE0:
.size foo, .-foo
.ident "GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2"
.section .note.GNU-stack,"",@progbits
(Yes, I realize the loop is a branch, but I'm talking about the choice operators inside the loop.)
Unfortunately, when I enable the __builtin_prefetch call, gcc generates branchy code:
morin@soprano$ gcc -DPREFETCH -O4 -S -c test.c -o -
.file "test.c"
.text
.p2align 4,,15
.globl foo
.type foo, @function
foo:
.LFB0:
.cfi_startproc
testl %esi, %esi
movl %esi, %eax
jle .L7
xorl %ecx, %ecx
jmp .L5
.p2align 4,,10
.p2align 3
.L3:
movl %ecx, %eax # this is the x <= a[i] branch
leal 1(%rcx,%rcx), %ecx
cmpl %esi, %ecx
jge .L11
.L5:
movl %ecx, %r8d # this is the main branch
sall $4, %r8d # setup the prefetch
movslq %r8d, %r8 # setup the prefetch
prefetcht0 60(%rdi,%r8,4) # do the prefetch
movslq %ecx, %r8
cmpl %edx, (%rdi,%r8,4) # compare x with a[i]
jge .L3
leal 2(%rcx,%rcx), %ecx # this is the x > a[i] branch
cmpl %esi, %ecx
jl .L5
.L11:
rep ret
.L7:
.p2align 4,,5
rep ret
.cfi_endproc
.LFE0:
.size foo, .-foo
.ident "GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2"
.section .note.GNU-stack,"",@progbits
I've tried using __attribute__((optimize("if-conversion2"))) on this function, but that has no effect.
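For reference, here is a minimal sketch of how such a per-function request can be spelled in GCC, either with the optimize attribute or with #pragma GCC optimize. The names foo_attr and foo_pragma are hypothetical, and neither spelling is a guarantee: both only enable a pass, they do not force cmov. As far as I know, GCC's documentation also describes the optimize attribute as meant for debugging rather than production code.

```c
/* Spelling 1: function attribute; "if-conversion2" maps to -fif-conversion2. */
__attribute__((optimize("if-conversion2")))
int foo_attr(const int *a, int n, int x) {
    int i = 0, j = n;
    while (i < n) {
#ifdef PREFETCH
        __builtin_prefetch(a + 16*i + 15);
#endif /* PREFETCH */
        j = (x <= a[i]) ? i : j;
        i = (x <= a[i]) ? 2*i + 1 : 2*i + 2;
    }
    return j;
}

/* Spelling 2: pragma scoped to the functions between push and pop. */
#pragma GCC push_options
#pragma GCC optimize ("if-conversion2")
int foo_pragma(const int *a, int n, int x) {
    int i = 0, j = n;
    while (i < n) {
#ifdef PREFETCH
        __builtin_prefetch(a + 16*i + 15);
#endif /* PREFETCH */
        j = (x <= a[i]) ? i : j;
        i = (x <= a[i]) ? 2*i + 1 : 2*i + 2;
    }
    return j;
}
#pragma GCC pop_options
```

Whether either spelling changes the output can only be judged by inspecting the assembly from gcc -S, as in the listings here.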
The reason I care so much is that I have hand-edited the compiler-generated branch-free code (from the first example) to include the prefetcht0 instructions, and it runs considerably faster than both of the versions gcc produces.
Recommended answer
It looks like gcc may have trouble generating branch-free code for variables that are used in loop conditions and post-conditions, combined with the constraint of keeping temporary registers alive across a pseudo-function intrinsic call.
Something is suspicious: the code generated from your function differs when using -funroll-all-loops and -fguess-branch-probability. It generates many return instructions. It smells like a small bug in gcc, somewhere around the compiler's RTL passes or its simplification of code blocks.
The following code is branch-free in both cases. This would be a good reason to submit a bug to GCC: at -O3, one would expect GCC to generate the same code for both formulations.
int foo(int *a, int n, int x) {
    int c, i = 0, j = n;
    while (i < n) {
#ifdef PREFETCH
        __builtin_prefetch(a+16*i + 15);
#endif /* PREFETCH */
        c = (x > a[i]);
        j = c ? j : i;
        i = 2*i + 1 + c;
    }
    return j;
}
Without -DPREFETCH, it generates this:
.cfi_startproc
testl %esi, %esi
movl %esi, %eax
jle .L4
xorl %ecx, %ecx
.p2align 4,,10
.p2align 3
.L3:
movslq %ecx, %r8
cmpl %edx, (%rdi,%r8,4)
setl %r8b
cmovge %ecx, %eax
movzbl %r8b, %r8d
leal 1(%r8,%rcx,2), %ecx
cmpl %ecx, %esi
jg .L3
.L4:
rep ret
.cfi_endproc
and with -DPREFETCH, this:
.cfi_startproc
testl %esi, %esi
movl %esi, %eax
jle .L5
xorl %ecx, %ecx
.p2align 4,,10
.p2align 3
.L4:
movl %ecx, %r8d
sall $4, %r8d
movslq %r8d, %r8
prefetcht0 60(%rdi,%r8,4)
movslq %ecx, %r8
cmpl %edx, (%rdi,%r8,4)
setl %r8b
testb %r8b, %r8b
movzbl %r8b, %r9d
cmove %ecx, %eax
leal 1(%r9,%rcx,2), %ecx
cmpl %ecx, %esi
jg .L4
.L5:
rep ret
.cfi_endproc
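Since the rewrite replaces the second ternary with arithmetic on the comparison result, it is worth checking that the two formulations return the same index. The sketch below does that by brute force over a range of keys. The function names foo_ternary, foo_arith, and variants_agree are hypothetical, and the sample array in the usage note assumes the input is a sorted sequence stored in breadth-first (implicit binary tree) order, which is an assumption about how the question's array is organized.

```c
/* Formulation from the question: two ternaries on the same comparison. */
int foo_ternary(const int *a, int n, int x) {
    int i = 0, j = n;
    while (i < n) {
        j = (x <= a[i]) ? i : j;
        i = (x <= a[i]) ? 2*i + 1 : 2*i + 2;
    }
    return j;
}

/* Formulation from the answer: c is 0 or 1, so 2*i + 1 + c selects
 * the left child (2*i + 1) or the right child (2*i + 2) arithmetically. */
int foo_arith(const int *a, int n, int x) {
    int c, i = 0, j = n;
    while (i < n) {
        c = (x > a[i]);
        j = c ? j : i;
        i = 2*i + 1 + c;
    }
    return j;
}

/* Returns 1 if both variants agree for every key in [lo, hi], else 0. */
int variants_agree(const int *a, int n, int lo, int hi) {
    for (int x = lo; x <= hi; x++) {
        if (foo_ternary(a, n, x) != foo_arith(a, n, x))
            return 0;
    }
    return 1;
}
```

For example, with the values 1..7 laid out as {4, 2, 6, 1, 3, 5, 7}, calling variants_agree(a, 7, 0, 9) compares the two variants for every key from 0 to 9. This checks only functional equivalence; whether gcc emits cmov for a given variant still has to be confirmed in the assembly.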