How to generate an SSE4.2 popcnt machine instruction
Question
With the C program:
int main(int argc , char** argv)
{
return __builtin_popcountll(0xf0f0f0f0f0f0f0f0);
}
and the compiler line (gcc 4.4 - Intel Xeon L3426):
gcc -msse4.2 poptest.c -o poptest
I do NOT get the builtin popcnt instruction; rather, the compiler generates a lookup table and computes the popcount that way. The resulting binary is over 8000 bytes. (Yuk!)
Thanks very much for any help.
Recommended answer
You have to tell GCC to generate code for an architecture that supports the popcnt instruction:
gcc -march=corei7 popcnt.c
Or just enable support for popcnt:
gcc -mpopcnt popcnt.c
In your example program the parameter to __builtin_popcountll is a constant, so the compiler will probably do the calculation at compile time and never emit the popcnt instruction. GCC does this even if not asked to optimize the program.
So try passing it something that it can't know at compile time:
int main (int argc, char** argv)
{
return __builtin_popcountll ((long long) argv);
}
$ gcc -march=corei7 -O popcnt.c && objdump -d a.out | grep '<main>' -A 2
0000000000400454 <main>:
400454: f3 48 0f b8 c6 popcnt %rsi,%rax
400459: c3 retq