Compute (a*b)%n FAST for 64-bit unsigned arguments in C(++) on x86-64 platforms?
Question
I'm looking for a fast way to compute (a⋅b) modulo n (in the mathematical sense) for `a`, `b`, `n` of type `uint64_t`. I could live with preconditions such as `n != 0`, or even `a < n && b < n`.
Notice that the C expression `(a*b)%n` won't cut it, because the product is truncated to 64 bits. I'm looking for `(uint64_t)(((uint128_t)a*b)%n)`, except that, as far as I know, Visual C++ has no `uint128_t`.
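For comparison, the wished-for expression compiles more or less as written on GCC and Clang targeting 64-bit platforms, because both compilers provide `unsigned __int128` as an extension (MSVC does not). A minimal sketch, assuming `n != 0`; the name `mulmod_u128` is hypothetical:

```c
#include <stdint.h>

/* Hypothetical helper, GCC/Clang on 64-bit targets only:
   unsigned __int128 is a compiler extension, not standard C or C++.
   Assumes n != 0. The full 128-bit product is formed before reducing,
   so nothing is truncated. */
static inline uint64_t mulmod_u128(uint64_t a, uint64_t b, uint64_t n) {
    return (uint64_t)(((unsigned __int128)a * b) % n);
}
```

This is correct but not necessarily the fastest: compilers typically lower a 128-by-64 `%` to a call to a runtime helper (such as libgcc's `__umodti3`), because they cannot prove that the quotient fits in 64 bits, which a single x86-64 `DIV` instruction requires.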
I'm after a Visual C++ (preferably) or GCC/Clang intrinsic that makes best use of the underlying hardware available on x86-64 platforms; or, if that can't be done, a portable `inline` function.
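For the portable fallback, one standard technique (not taken from the answer) is double-and-add ("Russian peasant") multiplication, reducing modulo n after every addition and doubling so that no intermediate value needs more than 64 bits. A sketch, assuming `n != 0`; `mulmod_portable` is a hypothetical name:

```c
#include <stdint.h>

/* Hypothetical portable fallback: double-and-add multiplication mod n.
   Assumes n != 0. Invariant: a < n and r < n, so the comparisons
   against n - a below never overflow. */
static inline uint64_t mulmod_portable(uint64_t a, uint64_t b, uint64_t n) {
    uint64_t r = 0;
    a %= n;                        /* establish a < n */
    while (b != 0) {
        if (b & 1) {
            /* r = (r + a) mod n, without computing the possibly overflowing r + a */
            r = (r >= n - a) ? r - (n - a) : r + a;
        }
        /* a = (2 * a) mod n, again without overflow */
        a = (a >= n - a) ? a - (n - a) : a + a;
        b >>= 1;
    }
    return r;
}
```

This runs up to 64 loop iterations, so it is much slower than a single hardware multiply and divide; it only makes sense where no 128-bit type or intrinsic is available.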
Answer
Seven years later, I got a solution working in Visual Studio 2019:
```c
#include <stdint.h>
#include <intrin.h>   // _umul128, _udiv128 (MSVC, x64 only)

#pragma intrinsic(_umul128)
#pragma intrinsic(_udiv128)

// compute (a*b)%n with 128-bit intermediary result
// assumes n>0 and a*b < n * 2**64 (always the case when a<=n || b<=n);
// otherwise the quotient does not fit in 64 bits and the DIV instruction faults
static inline uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n) {
    uint64_t r, s = _umul128(a, b, &r);  // s = low 64 bits, r = high 64 bits of a*b
    (void)_udiv128(r, s, n, &r);         // divide r:s by n; the remainder is stored in r
    return r;
}

// compute (a*b)%n with 128-bit intermediary result
// assumes n>0, works including if a*b >= n * 2**64
static inline uint64_t mulmod1(uint64_t a, uint64_t b, uint64_t n) {
    uint64_t r, s = _umul128(a % n, b, &r);  // a % n < n, so the quotient always fits
    (void)_udiv128(r, s, n, &r);
    return r;
}
```
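The `_umul128`/`_udiv128` intrinsics are MSVC-specific. As a sketch that is not part of the original answer, GCC and Clang on x86-64 can reach the same `MUL`/`DIV` instruction pair through extended inline assembly; `mulmod_asm` is a hypothetical name. The precondition is the same as in the answer: `DIV` faults unless the 128-by-64 quotient fits in 64 bits, i.e. unless a·b < n·2^64. Reducing `a` first, as `mulmod1` does, guarantees this, since a < n implies a·b ≤ (n − 1)(2^64 − 1) < n·2^64.

```c
#include <stdint.h>

/* Hypothetical GCC/Clang x86-64 counterpart of the answer's mulmod1.
   Assumes n != 0. After a %= n we have a < n, so the high half of a*b
   is below n and DIVQ cannot overflow. */
static inline uint64_t mulmod_asm(uint64_t a, uint64_t b, uint64_t n) {
    uint64_t lo, hi, quot, rem;
    a %= n;
    /* MULQ: RDX:RAX = RAX * b (unsigned 64x64 -> 128 bits) */
    __asm__("mulq %3" : "=a"(lo), "=d"(hi) : "a"(a), "rm"(b) : "cc");
    /* DIVQ: RAX = RDX:RAX / n, RDX = RDX:RAX % n */
    __asm__("divq %4" : "=a"(quot), "=d"(rem) : "a"(lo), "d"(hi), "rm"(n) : "cc");
    (void)quot; /* only the remainder is needed */
    return rem;
}
```

If callers already guarantee `a < n`, the `a %= n` line can be dropped, which mirrors the answer's `mulmod` and saves one division.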