Efficient computation of the high order bits of a multiplication

Question

Many CPUs have single assembly opcodes that return the high-order bits of a 32-bit integer multiplication. Normally, multiplying two 32-bit integers produces a 64-bit result, but this is truncated to the low 32 bits if you store it in a 32-bit integer.
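To make the truncation concrete, here is a small sketch that computes the full 64-bit product of two 32-bit operands and hands back both halves. The function name mul_32x32_split is hypothetical, chosen only for illustration; it assumes a C99 compiler with <stdint.h>.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: returns both 32-bit halves
   of the full 64-bit product of two unsigned 32-bit operands. Assigning
   the product to a plain 32-bit variable would keep only *lo. */
static void mul_32x32_split(uint32_t x, uint32_t y, uint32_t *hi, uint32_t *lo)
{
    uint64_t p = (uint64_t)x * y;  /* widen first so no bits are lost */
    *hi = (uint32_t)(p >> 32);     /* bits 32..63: what an unsigned "multiply high" opcode returns */
    *lo = (uint32_t)p;             /* bits 0..31: what a 32-bit store keeps */
}
```

For example, 0xFFFFFFFF × 0xFFFFFFFF = 0xFFFFFFFE00000001, so hi is 0xFFFFFFFE and lo is 0x00000001.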

For example, on PowerPC, the mulhw opcode (http://publib.boulder.ibm.com/infocenter/systems/index.jsp?topic=/com.ibm.aix.aixassem/doc/alangref/mulhw.htm) returns the high 32 bits of the 64-bit result of a 32x32-bit multiply in one clock. This is exactly what I'm looking for, but I want it more portably. There's a similar intrinsic, __umulhi(), in NVIDIA CUDA.

In C/C++, is there an efficient way to return the high order bits of the 32x32 multiply? Currently I compute it by casting to 64 bits, something like:

unsigned int umulhi32(unsigned int x, unsigned int y)
{
  unsigned long long xx=x;
  xx*=y;
  return (unsigned int)(xx>>32);
}

But this is over 11 times slower than a regular 32-by-32 multiply, because I'm using overkill 64-bit math even for the multiply itself.

Is there a faster way to compute the high order bits?

This is clearly not best solved with a BigInteger library (which is overkill and will have huge overhead).

SSE seems to have PMULHUW (http://www.sesp.cse.clrc.ac.uk/html/SoftwareTools/vtune/users%5Fguide/mergedProjects/analyzer%5Fec/mergedProjects/reference%5Folh/mergedProjects/instructions/instruct32%5Fhh/vc241.htm), a 16x16 -> top-16-bit version of this, but not a 32x32 -> top-32-bit version like the one I'm looking for.
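PMULHUW only covers 16x16 products, but the same idea extends to 32 bits: when no 64-bit type is available, the high word can be assembled from four 16x16 partial products using nothing wider than 32-bit arithmetic (schoolbook multiplication). The sketch below is a portable illustration under that assumption; the name umulhi32_halves is hypothetical, and it is not claimed to be faster. On targets that have a widening multiply instruction it will most likely be slower than letting the compiler emit that instruction.

```c
#include <stdint.h>

/* Hypothetical helper: high 32 bits of the unsigned 32x32 product,
   computed by splitting each operand into 16-bit halves.
   Every intermediate value fits in 32 bits. */
static uint32_t umulhi32_halves(uint32_t u, uint32_t v)
{
    uint32_t u0 = u & 0xFFFFu, u1 = u >> 16;   /* u = u1*2^16 + u0 */
    uint32_t v0 = v & 0xFFFFu, v1 = v >> 16;   /* v = v1*2^16 + v0 */

    uint32_t w0 = u0 * v0;                     /* low x low */
    uint32_t t  = u1 * v0 + (w0 >> 16);        /* first cross term plus carry from w0 */
    uint32_t w1 = (t & 0xFFFFu) + u0 * v1;     /* second cross term, middle 16 bits */
    uint32_t w2 = t >> 16;                     /* carry of the first cross term into the high word */

    return u1 * v1 + w2 + (w1 >> 16);          /* high x high plus both carries */
}
```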

Answer

With -O1 optimisation or higher, gcc 4.3.2 translated your function, exactly as you showed it, into IA32 assembly like this:

umulhi32:
        pushl   %ebp
        movl    %esp, %ebp
        movl    12(%ebp), %eax
        mull    8(%ebp)
        movl    %edx, %eax
        popl    %ebp
        ret

That is just a single 32-bit mull, which leaves the 64-bit product in %edx:%eax, followed by a move of the high 32 bits (from %edx) into the return value.

That's what you wanted, right? It sounds like you just need to turn up the optimisation on your compiler ;) You may also be able to push the compiler in the right direction by eliminating the intermediate variable:

unsigned int umulhi32(unsigned int x, unsigned int y)
{
  return (unsigned int)(((unsigned long long)x * y)>>32);
}
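The widths of unsigned int and unsigned long long are only guaranteed minimums, so a variant using the fixed-width types from <stdint.h> pins the operation to exactly 32x32 -> 64 bits. This is a sketch with a hypothetical name, umulhi32_fixed; it is the same cast-and-shift technique as above, and whether it compiles to a single widening multiply still depends on your compiler and optimisation level.

```c
#include <stdint.h>

/* Hypothetical name; same technique as umulhi32, with exact widths.
   Widening one operand makes the usual arithmetic conversions widen the
   other, so the multiply is 32x32 -> 64 and the shift keeps bits 32..63. */
static inline uint32_t umulhi32_fixed(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x * y) >> 32);
}
```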
