Changing endianness: is a union more efficient than bit shifts?
Question
As a challenge, I was asked to change the endianness of an int. My idea was to use bit shifts:
int swap_endianess(int color)
{
    int a;
    int r;
    int g;
    int b;

    a = (color & (255 << 24)) >> 24;
    r = (color & (255 << 16)) >> 16;
    g = (color & (255 << 8)) >> 8;
    b = (color & 255);
    return (b << 24 | g << 16 | r << 8 | a);
}
But someone told me that it would be easier to use a union containing an int and an array of four chars (assuming an int is stored in 4 chars), fill in the int, and then reverse the array.
union u_color
{
    int color;
    char c[4];
};

int swap_endianess(int color)
{
    union u_color ucol;
    char tmp;

    ucol.color = color;
    tmp = ucol.c[0];
    ucol.c[0] = ucol.c[3];
    ucol.c[3] = tmp;
    tmp = ucol.c[1];
    ucol.c[1] = ucol.c[2];
    ucol.c[2] = tmp;
    return (ucol.color);
}
Which of these two ways of swapping bytes is more efficient? Are there more efficient ways of doing this?
After testing on an i7, the union version takes about 24 seconds (measured with the time command), while the bit-shift version takes about 15 seconds, both for 2,000,000,000 iterations. However, if I compile with -O1, both methods take only about 1 second, and about 0.001 seconds with -O2 or -O3.
With -O2 and -O3, the bit-shift version compiles to a bswap instruction, but the union version does not. gcc seems to recognize the naive pattern but not the more complicated union approach. To conclude, read the bottom line of @user3386109's answer.
Recommended answer
Here is the correct code for a byte swap function:
uint32_t changeEndianess( uint32_t value )
{
    uint32_t r, g, b, a;

    r = (value >> 24) & 0xff;
    g = (value >> 16) & 0xff;
    b = (value >> 8) & 0xff;
    a = value & 0xff;
    return (a << 24) | (b << 16) | (g << 8) | r;
}
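The unsigned type matters here: with uint32_t, right shifts never sign-extend, so inputs with the top bit set are handled the same way as any other value. A minimal sketch of a sanity check follows. The helper name checkChangeEndianess is hypothetical and not part of the original answer, and the function body is repeated only so the block compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Same shift-and-mask swap as in the answer, repeated for self-containment. */
uint32_t changeEndianess(uint32_t value)
{
    uint32_t r = (value >> 24) & 0xff;
    uint32_t g = (value >> 16) & 0xff;
    uint32_t b = (value >> 8) & 0xff;
    uint32_t a = value & 0xff;

    return (a << 24) | (b << 16) | (g << 8) | r;
}

/* Hypothetical self-check, added for illustration only. */
void checkChangeEndianess(void)
{
    /* Known pair: the bytes 11 22 33 44 come back in reverse order. */
    assert(changeEndianess(0x11223344u) == 0x44332211u);
    /* Top bit set: no sign extension because the type is unsigned. */
    assert(changeEndianess(0x80000000u) == 0x00000080u);
    /* Swapping twice restores the original value. */
    assert(changeEndianess(changeEndianess(0xdeadbeefu)) == 0xdeadbeefu);
}
```

Calling checkChangeEndianess() once at startup is enough; it returns silently if all three checks pass.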
Here's a function that tests the byte swap function:
void testEndianess( void )
{
    uint32_t value = arc4random();
    uint32_t result = changeEndianess( value );
    printf( "%08x %08x\n", value, result );
}
Using the LLVM compiler with full optimization, the resulting assembly code for the testEndianess function is
0x93d0: calll 0xc82e ; call `arc4random`
0x93d5: movl %eax, %ecx ; copy `value` into register CX
0x93d7: bswapl %ecx ; <--- this is the `changeEndianess` function
0x93d9: movl %ecx, 0x8(%esp) ; put 'result' on the stack
0x93dd: movl %eax, 0x4(%esp) ; put 'value' on the stack
0x93e1: leal 0x6536(%esi), %eax ; compute address of the format string
0x93e7: movl %eax, (%esp) ; put the format string on the stack
0x93ea: calll 0xc864 ; call 'printf'
In other words, the LLVM compiler recognizes the entire changeEndianess function and implements it as a single bswapl instruction.
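For readers who want to request the byte swap explicitly rather than rely on pattern recognition, GCC and Clang both provide the __builtin_bswap32 builtin. The sketch below is an assumption-laden illustration, not something from the original answer: the names swap_shift, swap_builtin, and check_swaps_agree are hypothetical, and the builtin is compiler-specific, so other compilers may not accept it.

```c
#include <assert.h>
#include <stdint.h>

/* Portable shift-and-mask version, equivalent to changeEndianess. */
uint32_t swap_shift(uint32_t value)
{
    return ((value & 0x000000ffu) << 24) |
           ((value & 0x0000ff00u) << 8)  |
           ((value & 0x00ff0000u) >> 8)  |
           ((value & 0xff000000u) >> 24);
}

/* GCC/Clang only: asks the compiler for a 32-bit byte swap directly. */
uint32_t swap_builtin(uint32_t value)
{
    return __builtin_bswap32(value);
}

/* Hypothetical check that both versions agree on a few inputs. */
void check_swaps_agree(void)
{
    assert(swap_shift(0x11223344u) == swap_builtin(0x11223344u));
    assert(swap_shift(0xdeadbeefu) == swap_builtin(0xdeadbeefu));
    assert(swap_shift(0x00000000u) == swap_builtin(0x00000000u));
}
```

Since an optimizing compiler may already turn the portable version into the same instruction, the builtin mostly trades portability for explicitness.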
Side note for those wondering why the call to arc4random is necessary. Given this code
void testEndianess( void )
{
    uint32_t value = 0x11223344;
    uint32_t result = changeEndianess( value );
    printf( "%08x %08x\n", value, result );
}
the compiler generates this assembly
0x93dc: leal 0x6524(%eax), %eax ; compute address of format string
0x93e2: movl %eax, (%esp) ; put the format string on the stack
0x93e5: movl $0x44332211, 0x8(%esp) ; put 'result' on the stack
0x93ed: movl $0x11223344, 0x4(%esp) ; put 'value' on the stack
0x93f5: calll 0xc868 ; call 'printf'
In other words, given a hardcoded value as input, the compiler precomputes the result of the changeEndianess function and puts it directly into the assembly code, bypassing the function entirely.
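arc4random is not available on every platform. One possible portable way to keep the input opaque to the optimizer is to read it from a volatile object, which forces a real load at run time. The sketch below assumes that approach; opaque_input and testEndianessOpaque are hypothetical names, and the generated assembly is not guaranteed to match the listing above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Same byte swap as in the answer, repeated for self-containment. */
uint32_t changeEndianess(uint32_t value)
{
    uint32_t r = (value >> 24) & 0xff;
    uint32_t g = (value >> 16) & 0xff;
    uint32_t b = (value >> 8) & 0xff;
    uint32_t a = value & 0xff;

    return (a << 24) | (b << 16) | (g << 8) | r;
}

/* volatile: the compiler must load this at run time and cannot assume
   its value, so the swap cannot be precomputed at compile time. */
volatile uint32_t opaque_input = 0x11223344u;

/* Hypothetical variant of testEndianess that returns the swapped value. */
uint32_t testEndianessOpaque(void)
{
    uint32_t value = opaque_input;            /* single run-time load */
    uint32_t result = changeEndianess(value);

    /* Round trip: swapping the result gives back the input. */
    assert(changeEndianess(result) == value);
    printf("%08" PRIx32 " %08" PRIx32 "\n", value, result);
    return result;
}
```

With the initial value shown, a call prints 11223344 44332211, but the swap is performed on a value the compiler could not see when it compiled the function.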
The bottom line: write your code the way it makes sense to write it, and let the compiler do the optimizing. Compilers these days are amazing. Using tricky optimizations in source code (e.g. unions) may defeat the optimizations built into the compiler, actually resulting in slower code.