Difference in function performance when passing a parameter as a compile-time constant or as a variable
Question
In the Linux kernel source there is a macro used to test a bit (Linux version 2.6.2):
#define test_bit(nr, addr) \
(__builtin_constant_p((nr)) \
? constant_test_bit((nr), (addr)) \
: variable_test_bit((nr), (addr)))
where constant_test_bit and variable_test_bit are defined as:
static inline int constant_test_bit(int nr, const volatile unsigned long *addr )
{
return ((1UL << (nr & 31)) & (addr[nr >> 5])) != 0;
}
static __inline__ int variable_test_bit(int nr, const volatile unsigned long *addr)
{
int oldbit;
__asm__ __volatile__(
"btl %2,%1\n\tsbbl %0,%0"
:"=r" (oldbit)
:"m" (ADDR),"Ir" (nr));
return oldbit;
}
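The index arithmetic in constant_test_bit can be sketched in portable C. This is only an illustration: `portable_test_bit` and `check_portable_test_bit` are hypothetical names, not kernel functions, and the sketch uses `uint32_t` words because the `nr >> 5` / `nr & 31` split assumes 32-bit words. Note also that the assembly version returns 0 or -1 (`sbbl %0,%0` produces the negated carry flag), while the C version returns 0 or 1; both work as truth values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable helper (not from the kernel): tests bit `nr` in an
 * array of 32-bit words, mirroring the nr >> 5 / nr & 31 split above. */
static inline int portable_test_bit(unsigned int nr, const uint32_t *addr)
{
    uint32_t word = addr[nr >> 5];             /* nr / 32 selects the word */
    uint32_t mask = UINT32_C(1) << (nr & 31);  /* nr % 32 selects the bit  */
    return (word & mask) != 0;                 /* normalise to 0 or 1      */
}

/* Hypothetical self-check: bit 37 lives in word 1 at bit position 5. */
static void check_portable_test_bit(void)
{
    uint32_t bits[2] = { 0, UINT32_C(1) << 5 };
    assert(portable_test_bit(37, bits) == 1);
    assert(portable_test_bit(5, bits) == 0);
}
```

Calling `check_portable_test_bit()` from a test program exercises both the word selection and the bit selection.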
I understand that __builtin_constant_p is used to detect whether an expression is a compile time constant or not. My question is: is there any performance difference between these two functions depending on whether the argument is a compile time constant? Why use the C version when it is, and the assembly version when it is not?
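As a minimal sketch of the dispatch pattern, assuming GCC or Clang (which provide `__builtin_constant_p` as an extension): the names `constant_path`, `variable_path`, `which_path`, `demo_literal` and `demo_volatile` are hypothetical and only report which branch of the macro was taken. For ordinary non-volatile variables the result can depend on the optimization level, since the compiler may propagate a known value into the call.

```c
/* Hypothetical stand-ins for the two kernel helpers; each one just
 * reports which path the macro selected. */
static inline int constant_path(int nr) { (void)nr; return 1; }
static inline int variable_path(int nr) { (void)nr; return 2; }

/* Same shape as the kernel's test_bit macro (GCC/Clang extension). */
#define which_path(nr) \
    (__builtin_constant_p((nr)) ? constant_path((nr)) : variable_path((nr)))

static int demo_literal(void)
{
    return which_path(21);   /* an integer literal is a compile-time constant */
}

static int demo_volatile(void)
{
    volatile int nr = 21;    /* a volatile read is never treated as constant */
    return which_path(nr);
}
```

Here `demo_literal()` takes the constant path and `demo_volatile()` takes the variable path, which is the same selection the kernel macro makes between constant_test_bit and variable_test_bit.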
UPDATE: The following main function is used to test the performance:
constant, call constant_test_bit:
int main(void) {
unsigned long i, j = 21;
unsigned long cnt = 0;
srand(111);
//j = rand() % 31;
for (i = 1; i < (1 << 30); i++) {
j = (j + 1) % 28;
if (constant_test_bit(j, &i))
cnt++;
}
if (__builtin_constant_p(j))
printf("j is a compile time constant\n");
return 0;
}
This correctly outputs the sentence "j is a...".
For the other cases, just uncomment the line that assigns a "random" number to j and change the function name accordingly. When that line is uncommented the output will be empty, which is expected.
I compiled with gcc test.c -O1, and here are the results:
constant, constant_test_bit:
$ time ./a.out
j is compile time constant
real 0m0.454s
user 0m0.450s
sys 0m0.000s
constant, variable_test_bit (the "$ time ./a.out" line is omitted here and in the results below):
j is compile time constant
real 0m0.885s
user 0m0.883s
sys 0m0.000s
variable, constant_test_bit:
real 0m0.485s
user 0m0.477s
sys 0m0.007s
variable, variable_test_bit:
real 0m3.471s
user 0m3.467s
sys 0m0.000s
I ran each version several times, and the results above are typical. It seems the constant_test_bit function is always faster than the variable_test_bit function, whether or not the parameter is a compile time constant. In the last two results (when j is not constant), the variable version is even dramatically slower than the constant one.
Am I missing something here?
Answer
Using godbolt we can do an experiment using constant_test_bit (http://goo.gl/azELtZ). The following two test functions are compiled with gcc using the -O3 flag:
// Non constant expression test case
int func1(unsigned long i, unsigned long j)
{
int x = constant_test_bit(j, &i) ;
return x ;
}
// constant expression test case
int func2(unsigned long i)
{
int x = constant_test_bit(21, &i) ;
return x ;
}
We see the optimizer is able to optimize the constant expression case to the following:
shrq $21, %rax
andl $1, %eax
while the non-constant expression case ends up as follows:
sarl $5, %eax
andl $31, %ecx
cltq
leaq -8(%rsp,%rax,8), %rax
movq (%rax), %rax
shrq %cl, %rax
andl $1, %eax
So the optimizer is able to produce much better code for the constant expression case. We can also see that the code generated for the non-constant case of constant_test_bit is pretty bad compared to the hand-rolled assembly in variable_test_bit, so the implementer must have believed that the constant expression case for constant_test_bit ends up being better than:
btl %edi,8(%rsp)
sbbl %esi,%esi
for most cases.
As to why your test case seems to show a different conclusion: your test case is flawed. I have not been able to suss out all the issues, but if we look at the case using constant_test_bit with a non-constant expression, we can see the optimizer is able to move all the work outside the loop and reduce the work related to constant_test_bit inside the loop to:
movq (%rax), %rdi
even with an older gcc version. However, this case may not be representative of the cases test_bit is actually used in; there may be more specific cases where this kind of optimization is not possible.
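One possible way to keep a micro-benchmark from being simplified like this is to route the inputs through volatile objects, so the compiler has to reload them on every iteration. The sketch below is an assumption about how such a test might look, not what was measured above; `test_bit_c` and `count_set_bits` are hypothetical names, and the volatile accesses add their own cost to whatever is being timed.

```c
#include <stdint.h>

/* Plain C bit test on 32-bit words, same arithmetic as constant_test_bit. */
static inline int test_bit_c(unsigned int nr, const uint32_t *addr)
{
    return ((UINT32_C(1) << (nr & 31)) & addr[nr >> 5]) != 0;
}

/* Hypothetical benchmark kernel: the starting index and the data word are
 * read through volatile objects, so the compiler cannot treat them as
 * known values or hoist the data load out of the loop. */
static unsigned long count_set_bits(uint32_t word, unsigned long iterations,
                                    unsigned int seed)
{
    volatile unsigned int opaque_seed = seed;
    volatile uint32_t data = word;
    unsigned int nr = opaque_seed;
    unsigned long cnt = 0;

    for (unsigned long i = 0; i < iterations; i++) {
        nr = (nr + 1) % 32;
        uint32_t snapshot = data;   /* volatile read on every iteration */
        if (test_bit_c(nr, &snapshot))
            cnt++;
    }
    return cnt;
}
```

Returning the count and using it in the caller (for example by printing it) also keeps the loop from being discarded as dead code.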