Efficient computation of the high order bits of a multiplication

Views: 200 | Published: 2016/8/18 14:35:44 | Tags: c++, c, optimization

Question

Many CPUs have a single assembly opcode that returns the high-order bits of a 32-bit integer multiplication. Multiplying two 32-bit integers normally produces a 64-bit result, but that result is truncated to the low 32 bits if you store it in a 32-bit integer.
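To make the truncation concrete, here is a minimal sketch in C. The function names mul_lo32 and mul_full64 are made up for illustration, and the first function assumes a platform where int is no wider than 32 bits (so the unsigned multiply wraps modulo 2^32 rather than being promoted).

```c
#include <stdint.h>

/* Low 32 bits only: what remains when the product is stored in a
 * 32-bit integer. Assumes int is at most 32 bits wide, so the
 * multiplication is done in unsigned 32-bit arithmetic and wraps. */
uint32_t mul_lo32(uint32_t x, uint32_t y)
{
    return x * y;
}

/* Full 64-bit product: widen one operand before multiplying so the
 * multiplication itself is carried out in 64 bits. */
uint64_t mul_full64(uint32_t x, uint32_t y)
{
    return (uint64_t)x * y;
}
```

For instance, 0x10000 * 0x10000 is exactly 2^32, so mul_lo32 returns 0 while mul_full64 returns 0x100000000; the bit that was lost is the "high order" part this question is about.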

For example, on PowerPC, the mulhw opcode (http://publib.boulder.ibm.com/infocenter/systems/index.jsp?topic=/com.ibm.aix.aixassem/doc/alangref/mulhw.htm) returns the high 32 bits of the 64-bit result of a 32x32-bit multiply in one clock. This is exactly what I'm looking for, but more portably. There's a similar opcode, umulhi(), in NVIDIA CUDA.

In C/C++, is there an efficient way to return the high-order bits of a 32x32 multiply? Currently I compute them by casting to 64 bits, something like:

unsigned int umulhi32(unsigned int x, unsigned int y)
{
  unsigned long long xx=x;
  xx*=y;
  return (unsigned int)(xx>>32);
}

But this is over 11 times slower than a regular 32x32 multiply, because I'm using overkill 64-bit math even for the multiply.

Is there a faster way to compute the high-order bits?

This is clearly not best solved with a BigInteger library (which is overkill and would have huge overhead).

SSE seems to have PMULHUW (http://www.sesp.cse.clrc.ac.uk/html/SoftwareTools/vtune/users%5Fguide/mergedProjects/analyzer%5Fec/mergedProjects/reference%5Folh/mergedProjects/instructions/instruct32%5Fhh/vc241.htm), a 16x16 -> top 16 bits version of this, but not a 32x32 -> top 32 bits version like the one I'm looking for.

Recommended answer

gcc 4.3.2, with -O1 optimisation or higher, translated your function exactly as you showed it into IA32 assembly like this:

umulhi32:
        pushl   %ebp
        movl    %esp, %ebp
        movl    12(%ebp), %eax
        mull    8(%ebp)
        movl    %edx, %eax
        popl    %ebp
        ret

This just does a single 32-bit mull and puts the high 32 bits of the result (from %edx) into the return value.

That's what you wanted, right? It sounds like you just need to turn up the optimisation on your compiler ;) You may also be able to push the compiler in the right direction by eliminating the intermediate variable:

unsigned int umulhi32(unsigned int x, unsigned int y)
{
  return (unsigned int)(((unsigned long long)x * y)>>32);
}
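If a compiler does not recognise the widening-multiply idiom, or a target has no fast 64-bit type, one possible fallback is the schoolbook approach of splitting each operand into 16-bit halves. This is not part of the answer above; it is a sketch of a well-known alternative technique, and the name umulhi32_split is hypothetical. It assumes uint32_t arithmetic is not promoted to a wider signed type.

```c
#include <stdint.h>

/* High 32 bits of the unsigned 32x32 product, computed from four
 * 16x16 partial products using only 32-bit arithmetic. */
uint32_t umulhi32_split(uint32_t x, uint32_t y)
{
    uint32_t x0 = x & 0xFFFFu, x1 = x >> 16;   /* low and high halves of x */
    uint32_t y0 = y & 0xFFFFu, y1 = y >> 16;   /* low and high halves of y */

    uint32_t lo  = x0 * y0;                    /* contributes only a carry */
    uint32_t mid = x1 * y0 + (lo >> 16);       /* cannot overflow 32 bits  */
    uint32_t mid_lo = mid & 0xFFFFu;
    uint32_t mid_hi = mid >> 16;

    mid_lo += x0 * y1;                         /* cannot overflow 32 bits  */

    return x1 * y1 + mid_hi + (mid_lo >> 16);
}
```

On compilers that already emit a single multiply for the cast-and-shift version, this will most likely be slower, so it is worth measuring before adopting it.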
