Difference in function performance when passing a parameter as a compile-time constant or as a variable


Question

In the Linux kernel code (Linux version 2.6.2) there is a macro used to test a bit:

#define test_bit(nr, addr)                      \
        (__builtin_constant_p((nr))             \
         ? constant_test_bit((nr), (addr))      \
         : variable_test_bit((nr), (addr)))

where constant_test_bit and variable_test_bit are defined as:

static inline int constant_test_bit(int nr, const volatile unsigned long *addr  )
{       
        return ((1UL << (nr & 31)) & (addr[nr >> 5])) != 0;
}


static __inline__ int variable_test_bit(int nr, const volatile unsigned long *addr)
{       
        int oldbit;

        __asm__ __volatile__(
                "btl %2,%1\n\tsbbl %0,%0"
                :"=r" (oldbit)
                :"m" (ADDR),"Ir" (nr));
        return oldbit;
}

I understand that __builtin_constant_p is used to detect whether an expression is a compile-time constant or not. My question is: is there any performance difference between these two functions depending on whether the argument is a compile-time constant? Why use the C version when it is, and the assembly version when it is not?
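To make the dispatch concrete, here is a minimal sketch of the same pattern outside the kernel. It assumes GCC or Clang (for __builtin_constant_p). The names my_test_bit, test_bit_c, test_bit_generic and BITS_PER_WORD are hypothetical and exist only for this illustration, and the "variable" path is plain C standing in for the inline assembly, so it shows how the selection works rather than how fast either path is.

```c
#include <stddef.h>

/* Hypothetical: number of bits in one bitmap word on this platform. */
#define BITS_PER_WORD (8 * (int)sizeof(unsigned long))

/* Path intended for compile-time-constant bit numbers: plain C that the
 * optimizer can fold down to a single shift/mask. */
static inline int test_bit_c(int nr, const unsigned long *addr)
{
    return ((1UL << (nr % BITS_PER_WORD)) & addr[nr / BITS_PER_WORD]) != 0;
}

/* Stand-in for the hand-written assembly path (hypothetical, plain C here). */
static inline int test_bit_generic(int nr, const unsigned long *addr)
{
    return (int)((addr[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL);
}

/* __builtin_constant_p(nr) is evaluated by the compiler, so only one of the
 * two calls survives in the generated code for any given call site. */
#define my_test_bit(nr, addr)                   \
        (__builtin_constant_p((nr))             \
         ? test_bit_c((nr), (addr))             \
         : test_bit_generic((nr), (addr)))
```

Both paths return the same answer for the same inputs; the only thing the macro decides is which implementation the compiler gets to optimize.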

UPDATE: The following main function is used to test the performance:

constant, calling constant_test_bit:

int main(void) {
        unsigned long i, j = 21;
        unsigned long cnt = 0;
        srand(111);
        //j = rand() % 31;
        for (i = 1; i < (1 << 30); i++) {
                j = (j + 1) % 28;
                if (constant_test_bit(j, &i))
                        cnt++;
        }
        if (__builtin_constant_p(j))
                printf("j is a compile time constant\n");
        return 0;
}

This correctly outputs the sentence "j is a...".

For the other situations, just uncomment the line that assigns a "random" number to j and change the function name accordingly. When that line is uncommented the output is empty, which is expected.

I compiled with gcc test.c -O1, and here are the results:

constant, constant_test_bit:

$ time ./a.out 

j is compile time constant

real    0m0.454s
user    0m0.450s
sys     0m0.000s

constant, variable_test_bit (the time ./a.out line is omitted here and in the results below):

j is compile time constant

real    0m0.885s
user    0m0.883s
sys     0m0.000s

variable, constant_test_bit:

real    0m0.485s
user    0m0.477s
sys     0m0.007s

variable, variable_test_bit:

real    0m3.471s
user    0m3.467s
sys     0m0.000s

I ran each version several times, and the results above are typical. It seems that constant_test_bit is always faster than variable_test_bit, whether or not the parameter is a compile-time constant. For the last two results (when j is not constant), the variable version is even dramatically slower than the constant one. Am I missing something here?

Recommended answer

Using godbolt we can run an experiment with constant_test_bit (http://goo.gl/azELtZ). The following two test functions are compiled with gcc using the -O3 flag:

// Non constant expression test case
int func1(unsigned long i, unsigned long j)
{
  int x = constant_test_bit(j, &i) ;
  return x ;
}

// constant expression test case
int func2(unsigned long i)
{
  int x = constant_test_bit(21, &i) ;
  return x ;
}

We see that the optimizer is able to reduce the constant expression case to the following:

shrq    $21, %rax
andl    $1, %eax

while the non-constant expression case ends up as follows:

sarl    $5, %eax
andl    $31, %ecx
cltq
leaq    -8(%rsp,%rax,8), %rax
movq    (%rax), %rax
shrq    %cl, %rax
andl    $1, %eax

So the optimizer is able to produce much better code for the constant expression case. We can also see that the non-constant case for constant_test_bit is pretty bad compared to the hand-rolled assembly in variable_test_bit, and the implementer must believe that the constant expression case for constant_test_bit ends up being better than:

btl %edi,8(%rsp)
sbbl %esi,%esi 

for most cases.
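For readers who do not read x86 assembly, the two-instruction sequence can be modelled in C roughly as below. This is an illustrative model only: the name bt_sbb_model is hypothetical, it assumes a non-negative bit number, and it uses 32-bit words to match the "l" suffix of btl.

```c
#include <stdint.h>

/* Hypothetical C model of "btl nr, mem" followed by "sbbl reg, reg".
 * Assumes nr >= 0 and 32-bit words. */
static inline int bt_sbb_model(int nr, const uint32_t *addr)
{
    /* btl: copy the selected bit of the bit string into the carry flag. */
    uint32_t carry = (addr[nr / 32] >> (nr % 32)) & 1u;

    /* sbbl %0,%0: reg - reg - carry, which is 0 when the bit is clear
     * and -1 (all ones) when the bit is set. */
    return -(int)carry;
}
```

One consequence of this model is that the assembly path reports a set bit as -1 rather than 1, so callers should test the result for non-zero rather than compare it with 1.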

As to why your test case seems to show a different conclusion: your test case is flawed. I have not been able to suss out all the issues, but if we look at the case that uses constant_test_bit with a non-constant expression, we can see that the optimizer is able to move all the work outside the loop and reduce the work related to constant_test_bit inside the loop to:

movq    (%rax), %rdi

even with an older gcc version. However, this case may not be relevant to the situations in which test_bit is actually used; there may be more specific cases where this kind of optimization is not possible.
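One way to reduce this kind of hoisting in a micro-benchmark is sketched below. It is an assumption-laden illustration, not a definitive methodology: the function count_set and its parameters are hypothetical, the bit number is re-read through a volatile pointer on every iteration so the compiler cannot treat it as loop-invariant, and the count is returned so the caller can use it (in the question's main, cnt is never read after the loop, which leaves the optimizer free to discard work). The bit number is assumed to stay in the range 0..31 so that only word 0 of each element is indexed.

```c
#include <stddef.h>

/* Same C implementation as in the question. */
static inline int constant_test_bit(int nr, const volatile unsigned long *addr)
{
    return ((1UL << (nr & 31)) & (addr[nr >> 5])) != 0;
}

/* Hypothetical benchmark kernel: counts how many words have bit *nr_src set.
 * Assumes 0 <= *nr_src <= 31. */
unsigned long count_set(const unsigned long *words, size_t nwords,
                        const volatile int *nr_src)
{
    unsigned long cnt = 0;

    for (size_t k = 0; k < nwords; k++) {
        int nr = *nr_src;               /* volatile read, repeated each pass */
        if (constant_test_bit(nr, &words[k]))
            cnt++;
    }
    return cnt;                         /* caller should print or store this */
}
```

Timing a call to count_set over a large array, once with each bit-test implementation, should give a fairer comparison than the original loop, though the result will still depend on the compiler version and flags.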
