Changing endianness: is a union more efficient than bitshifts?

Question

As a challenge, I was asked to change the endianness of an int. My idea was to use bitshifts:

int    swap_endianess(int color)
{
    int a;
    int r;
    int g;
    int b;

    a = (color & (255 << 24)) >> 24;
    r = (color & (255 << 16)) >> 16;
    g = (color & (255 << 8)) >> 8;
    b = (color & 255);
    return (b << 24 | g << 16 | r << 8 | a);
}

But someone told me that it would be easier to use a union containing an int and an array of four chars (assuming an int is stored in 4 chars), fill in the int, and then reverse the array:

union   u_color
{
  int   color;
  char  c[4];
};

int             swap_endianess(int color)
{
  union u_color ucol;
  char          tmp;

  ucol.color = color;
  tmp = ucol.c[0];
  ucol.c[0] = ucol.c[3];
  ucol.c[3] = tmp;
  tmp = ucol.c[1];
  ucol.c[1] = ucol.c[2];
  ucol.c[2] = tmp;
  return (ucol.color);
}

Which of these two is the more efficient way of swapping bytes? Are there more efficient ways of doing this?

After testing on an i7, the union version takes about 24 seconds (measured with the time command), while the bitshift version takes about 15 seconds, for 2,000,000,000 iterations. The thing is that if I compile with -O1, both methods take only about 1 second, and 0.001 second with -O2 or -O3.

With -O2 and -O3, the bitshift version compiles to bswap in the assembly, but the union version does not. gcc seems to recognize the naive pattern but not the more complicated union approach. To conclude, read the bottom line of @user3386109's answer.

Recommended answer

Here is the correct code for a byte swap function:

uint32_t changeEndianess( uint32_t value )
{
    uint32_t r, g, b, a;

    r = (value >> 24) & 0xff;
    g = (value >> 16) & 0xff;
    b = (value >>  8) & 0xff;
    a =  value        & 0xff;

    return (a << 24) | (b << 16) | (g << 8) | r;
}
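One difference from the question's version is that this one works on uint32_t rather than int. Right-shifting a negative signed int is implementation-defined and commonly sign-extends, so inputs with the top bit set are the ones worth checking. The following is a minimal sketch of such a check. The helper name checkChangeEndianess is hypothetical and not part of the original answer, and the function is repeated only so the snippet compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Same function as in the answer, repeated so this sketch is self-contained. */
uint32_t changeEndianess( uint32_t value )
{
    uint32_t r, g, b, a;

    r = (value >> 24) & 0xff;
    g = (value >> 16) & 0xff;
    b = (value >>  8) & 0xff;
    a =  value        & 0xff;

    return (a << 24) | (b << 16) | (g << 8) | r;
}

/* Hypothetical helper (an assumption, not from the answer):
   checks a few inputs, including ones with the high bit set. */
void checkChangeEndianess( void )
{
    /* plain byte reversal */
    assert( changeEndianess( 0x11223344u ) == 0x44332211u );
    /* top byte 0xff and low byte 0x80: both have their high bit set */
    assert( changeEndianess( 0xff000080u ) == 0x800000ffu );
    /* swapping twice gives back the original value */
    assert( changeEndianess( changeEndianess( 0xdeadbeefu ) ) == 0xdeadbeefu );
}
```

Calling checkChangeEndianess() once at startup, or from a unit test, should be enough to catch a masking or shifting mistake.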

Here's a function that tests the byte swap function:

void testEndianess( void )
{
    uint32_t value = arc4random();
    uint32_t result = changeEndianess( value );
    printf( "%08x %08x\n", value, result );
}

Using the LLVM compiler with full optimization, the resulting assembly code for the testEndianess function is:

0x93d0:  calll  0xc82e                    ; call `arc4random`
0x93d5:  movl   %eax, %ecx                ; copy 'value' into register ECX
0x93d7:  bswapl %ecx                      ; <--- this is the 'changeEndianess' function
0x93d9:  movl   %ecx, 0x8(%esp)           ; put 'result' on the stack
0x93dd:  movl   %eax, 0x4(%esp)           ; put 'value' on the stack
0x93e1:  leal   0x6536(%esi), %eax        ; compute address of the format string
0x93e7:  movl   %eax, (%esp)              ; put the format string on the stack
0x93ea:  calll  0xc864                    ; call 'printf'

In other words, the LLVM compiler recognizes the entire changeEndianess function and implements it as a single bswapl instruction.
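For readers who want to ask for the byte swap explicitly rather than rely on pattern recognition, GCC and Clang provide the __builtin_bswap32 builtin. The sketch below is an addition to the answer, not part of it. The wrapper name swapWithBuiltin is hypothetical, and it falls back to the portable shift version on other compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Portable version from the answer, repeated so this sketch is self-contained. */
uint32_t changeEndianess( uint32_t value )
{
    uint32_t r, g, b, a;

    r = (value >> 24) & 0xff;
    g = (value >> 16) & 0xff;
    b = (value >>  8) & 0xff;
    a =  value        & 0xff;

    return (a << 24) | (b << 16) | (g << 8) | r;
}

/* Hypothetical wrapper (an assumption, not from the answer). */
uint32_t swapWithBuiltin( uint32_t value )
{
#if defined(__GNUC__) || defined(__clang__)
    /* compiler-specific builtin for a 32-bit byte swap */
    return __builtin_bswap32( value );
#else
    /* other compilers: use the portable shift-and-mask version */
    return changeEndianess( value );
#endif
}
```

Since the optimizer already turns the portable version into a single instruction in the listing above, the builtin mostly matters where that pattern recognition cannot be relied on, at the cost of tying the code to specific compilers.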

A side note for those wondering why the call to arc4random is necessary. Given this code:

void testEndianess( void )
{
    uint32_t value = 0x11223344;
    uint32_t result = changeEndianess( value );
    printf( "%08x %08x\n", value, result );
}

the compiler generates this assembly:

0x93dc:  leal   0x6524(%eax), %eax        ; compute address of format string 
0x93e2:  movl   %eax, (%esp)              ; put the format string on the stack
0x93e5:  movl   $0x44332211, 0x8(%esp)    ; put 'result' on the stack
0x93ed:  movl   $0x11223344, 0x4(%esp)    ; put 'value' on the stack
0x93f5:  calll  0xc868                    ; call 'printf'

In other words, given a hardcoded value as input, the compiler precomputes the result of the changeEndianess function and puts it directly into the assembly code, bypassing the function entirely.
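On platforms without arc4random, one possible way to get the same effect in a test or benchmark is to read the input through a volatile object. The compiler may not assume what a volatile read returns, so it cannot precompute the result. This is only a sketch under that assumption. The function name swapRuntimeValue is hypothetical and not from the answer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Same function as in the answer, repeated so this sketch is self-contained. */
uint32_t changeEndianess( uint32_t value )
{
    uint32_t r, g, b, a;

    r = (value >> 24) & 0xff;
    g = (value >> 16) & 0xff;
    b = (value >>  8) & 0xff;
    a =  value        & 0xff;

    return (a << 24) | (b << 16) | (g << 8) | r;
}

/* Hypothetical variant of testEndianess (an assumption, not from the answer). */
uint32_t swapRuntimeValue( void )
{
    volatile uint32_t input = 0x11223344u;
    uint32_t value = input;   /* volatile read: must be performed at run time */
    uint32_t result = changeEndianess( value );

    /* casts keep the %x conversions matched to unsigned int */
    printf( "%08x %08x\n", (unsigned)value, (unsigned)result );
    return result;
}
```

The swap itself still has to be computed from the value that was read, which is what a timing loop or an assembly inspection needs.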

The bottom line: write your code the way it makes sense to write it, and let the compiler do the optimizing. Compilers these days are amazing. Using tricky optimizations in source code (e.g. unions) may defeat the optimizations built into the compiler, actually resulting in slower code.
