gcc intrinsic for extended division/multiplication

Question

Modern CPUs can perform extended multiplication between two native-size words and store the low and high halves of the result in separate registers. Similarly, when performing division, they store the quotient and the remainder in two different registers instead of discarding the unwanted part.

Is there some sort of portable gcc intrinsic with the following signature:

void extmul(size_t a, size_t b, size_t *lo, size_t *hi);

Or something like that, and for division:

void extdiv(size_t a, size_t b, size_t *q, size_t *r);

I know I could do it myself with inline assembly and shoehorn portability into it by throwing #ifdefs into the code, or I could emulate the multiplication using partial sums (which would be significantly slower), but I would like to avoid both for the sake of readability. Surely there is some built-in function that does this?

Recommended answer

For gcc since version 4.6 you can use __int128. This works on most 64-bit hardware. For instance, to get the 128-bit result of a 64x64-bit multiplication, just use

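#include <stddef.h>

/* Assumes a 64-bit size_t; unsigned __int128 avoids signed overflow for large inputs. */
void extmul(size_t a, size_t b, size_t *lo, size_t *hi) {
    unsigned __int128 result = (unsigned __int128)a * (unsigned __int128)b;
    *lo = (size_t)result;          /* low 64 bits */
    *hi = (size_t)(result >> 64);  /* high 64 bits */
}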

On x86_64, gcc is smart enough to compile this to

   0:   48 89 f8                mov    %rdi,%rax
   3:   49 89 d0                mov    %rdx,%r8
   6:   48 f7 e6                mul    %rsi
   9:   49 89 00                mov    %rax,(%r8)
   c:   48 89 11                mov    %rdx,(%rcx)
   f:   c3                      retq   

No native 128-bit support or anything similar is required, and after inlining only the mul instruction remains.
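The answer covers only multiplication. For the division signature from the question, a minimal sketch is shown below, assuming a word-sized dividend and a nonzero divisor. It relies on the compiler noticing that / and % share the same operands; with optimization enabled, gcc on x86_64 typically emits a single div instruction for both, but that is an optimization and not a language guarantee.

```c
#include <stddef.h>

/* Sketch of the extdiv signature from the question.
   Assumption: b != 0 (division by zero is undefined behavior in C). */
void extdiv(size_t a, size_t b, size_t *q, size_t *r) {
    *q = a / b;  /* quotient */
    *r = a % b;  /* remainder; same operands, so one div can serve both */
}
```

Note that this divides a single word by a single word. Dividing a full 128-bit value with __int128 is a different case, and gcc generally handles it through a runtime library call instead of a single instruction.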

Edit: On a 32-bit architecture this works in a similar way: replace __int128 with uint64_t and the shift width with 32. The optimization also works on even older gcc versions.
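The 32-bit substitution described above might look like the following sketch. The name extmul32 and the fixed-width parameter types are illustrative choices, not part of the original answer.

```c
#include <stdint.h>

/* Hypothetical 32-bit counterpart of extmul:
   widen to uint64_t, multiply, then split the product at bit 32. */
void extmul32(uint32_t a, uint32_t b, uint32_t *lo, uint32_t *hi) {
    uint64_t result = (uint64_t)a * (uint64_t)b;
    *lo = (uint32_t)result;          /* low 32 bits */
    *hi = (uint32_t)(result >> 32);  /* high 32 bits */
}
```

Using fixed-width types here keeps the shift width correct regardless of how wide size_t happens to be on the target.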
