Efficient computation of the high order bits of a 32 bit integer multiplication

Problem description

Many CPUs have a single assembly opcode for returning the high-order bits of a 32-bit integer multiplication. Normally, multiplying two 32-bit integers produces a 64-bit result, but this is truncated to the low 32 bits if you store it in a 32-bit integer.
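
To make the truncation concrete, here is a minimal sketch in C. The helper names mul_lo32 and mul_hi32 are made up for illustration, and the fixed-width types from <stdint.h> are an assumption (the code in this question uses plain unsigned int).

```c
#include <stdint.h>

/* Low half of the 64-bit product: what survives when the result
   is stored in a 32-bit integer. */
uint32_t mul_lo32(uint32_t x, uint32_t y)
{
    uint64_t p = (uint64_t)x * y;   /* full 64-bit product */
    return (uint32_t)p;             /* keeps bits 0..31 only */
}

/* High half of the same product: the part that truncation discards. */
uint32_t mul_hi32(uint32_t x, uint32_t y)
{
    uint64_t p = (uint64_t)x * y;
    return (uint32_t)(p >> 32);     /* keeps bits 32..63 */
}
```

For x = y = 0xFFFFFFFF the full product is 0xFFFFFFFE00000001, so mul_lo32 yields 0x00000001 and mul_hi32 yields 0xFFFFFFFE.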

For example, on PowerPC, the mulhw opcode returns the high 32 bits of the 64-bit result of a 32x32-bit multiply in one clock (mulhw is the signed form; mulhwu is the unsigned one). This is exactly what I'm looking for, but more portably. There's a similar intrinsic, __umulhi(), in NVIDIA CUDA.
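
One detail worth keeping in mind: PowerPC's mulhw is a signed multiply-high, and mulhwu is the unsigned one, so the two can return different high words for the same bit patterns. A rough sketch of both flavours in portable C follows. The function names are hypothetical, and the signed version assumes the compiler implements >> on negative values as an arithmetic shift (implementation-defined in C, though mainstream compilers do this).

```c
#include <stdint.h>

/* Unsigned multiply-high: both operands treated as 0..2^32-1. */
uint32_t umulhi32_unsigned(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x * y) >> 32);
}

/* Signed multiply-high: operands are sign-extended to 64 bits first.
   ASSUMPTION: >> on a negative int64_t is an arithmetic shift. */
int32_t mulhi32_signed(int32_t x, int32_t y)
{
    return (int32_t)(((int64_t)x * y) >> 32);
}
```

With the bit pattern 0xFFFFFFFF times 1, the unsigned version gives 0 (the product is 0x00000000FFFFFFFF), while the signed version sees -1 times 1 and gives -1.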

In C/C++, is there an efficient way to return the high order bits of the 32x32 multiply? Currently I compute it by casting to 64 bits, something like:

unsigned int umulhi32(unsigned int x, unsigned int y)
{
  unsigned long long xx=x;
  xx*=y;
  return (unsigned int)(xx>>32);
}

but this is over 11 times slower than a regular 32 by 32 multiply because I'm using overkill 64 bit math even for the multiply.

Is there a faster way to compute the high order bits?

This is clearly not best solved with a BigInteger library (which is overkill and will have huge overhead).

SSE seems to have PMULHUW, a 16x16 -> top 16 bit version of this, but not a 32x32 -> top 32 version like I'm looking for.

Recommended answer

gcc 4.3.2, with -O1 optimisation or higher, translated your function exactly as you showed it to IA32 assembly like this:

umulhi32:
        pushl   %ebp
        movl    %esp, %ebp
        movl    12(%ebp), %eax
        mull    8(%ebp)
        movl    %edx, %eax
        popl    %ebp
        ret

Which is just doing a single 32 bit mull and putting the high 32 bits of the result (from %edx) into the return value.

That's what you wanted, right? Sounds like you just need to turn up the optimisation on your compiler ;) It's possible you could push the compiler in the right direction by eliminating the intermediate variable:

unsigned int umulhi32(unsigned int x, unsigned int y)
{
  return (unsigned int)(((unsigned long long)x * y)>>32);
}
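
If the compiler still falls back to a slow 64-bit multiply, or no 64-bit type is available, the high word can also be assembled from four 16x16 partial products using only 32-bit arithmetic. This is the ordinary schoolbook decomposition, not something taken from the answer above; the name umulhi32_split is hypothetical, and whether it beats the 64-bit cast depends entirely on the target and compiler.

```c
#include <stdint.h>

/* High 32 bits of x*y using only 32-bit operations. */
uint32_t umulhi32_split(uint32_t x, uint32_t y)
{
    uint32_t xl = x & 0xFFFFu, xh = x >> 16;   /* 16-bit halves of x */
    uint32_t yl = y & 0xFFFFu, yh = y >> 16;   /* 16-bit halves of y */

    uint32_t ll = xl * yl;   /* contributes to bits  0..31 */
    uint32_t lh = xl * yh;   /* contributes to bits 16..47 */
    uint32_t hl = xh * yl;   /* contributes to bits 16..47 */
    uint32_t hh = xh * yh;   /* contributes to bits 32..63 */

    /* Sum everything that lands in bits 16..31; the sum is at most
       3 * 0xFFFF, so it cannot overflow 32 bits. Its upper half is
       the carry into bit 32. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);

    return hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
}
```

A quick sanity check is to compare it against the 64-bit cast version for a handful of inputs, including the extremes 0 and 0xFFFFFFFF.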
