Is there any way to write "mod 31" without modulus/division operators?

Question

Getting the modulus of a number can easily be done without the modulus operator or division if the divisor is a power of 2. In that case, the following formula holds: x % y = (x & (y - 1)). This is often much more performant on many architectures. Can the same be done for mod 31?

int mod31(int a) { return a % 31; }
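For reference, the power-of-two identity mentioned above might be sketched as follows. The helper name `mod_pow2` is made up for illustration, and the sketch assumes unsigned operands, since for negative signed values `x % y` and `x & (y - 1)` generally disagree in C.

```c
#include <assert.h>

/* Hypothetical helper: x % y for unsigned x, valid only when y is a
 * nonzero power of two, in which case y - 1 is a mask of the low bits. */
static unsigned mod_pow2(unsigned x, unsigned y) {
    assert(y != 0 && (y & (y - 1)) == 0);  /* y must be a power of two */
    return x & (y - 1);
}
```

For example, `mod_pow2(100u, 32u)` gives the same result as `100u % 32u`.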

Solution

Here are two ways to approach this problem. The first uses a common bit-twiddling technique and, if carefully optimized, can beat hardware division. The other substitutes a multiply for the divide, similar to the optimization performed by gcc, and is far and away the fastest. The bottom line is that there's not much point trying to avoid the % operator if the second argument is constant, because gcc's got it covered. (And probably other compilers, too.)

The following function is based on the fact that x is the same (mod 31) as the sum of the base-32 digits of x. That's true because 32 is 1 mod 31, and consequently any power of 32 is 1 mod 31. So each "digit" position in a base-32 number contributes the digit * 1 to the mod 31 sum. And it's easy to get the base-32 representation: we just take the bits five at a time.

(Like the rest of the functions in this answer, it will only work for non-negative x).

unsigned mod31(unsigned x) {
  unsigned tmp;
  for (tmp = 0; x; x >>= 5) {
    tmp += x & 31;
  }
  // Here we assume that there are at most 160 bits in x
  tmp = (tmp >> 5) + (tmp & 31);
  return tmp >= 31 ? tmp - 31 : tmp;
}
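As a small worked example of the digit-sum property, here is a sketch that computes only the base-32 digit sum. The helper name `base32_digit_sum` is made up for illustration and is not part of the original answer. For x = 1000, the base-32 digits are 8 and 31 (1000 = 31 * 32 + 8), so the sum is 39. Folding once gives (39 >> 5) + (39 & 31) = 1 + 7 = 8, which equals 1000 % 31.

```c
/* Hypothetical helper: sum of the base-32 digits of x.
 * Because 32 is congruent to 1 (mod 31), this sum is congruent to x (mod 31). */
static unsigned base32_digit_sum(unsigned x) {
    unsigned sum = 0;
    for (; x; x >>= 5) {
        sum += x & 31;  /* lowest base-32 digit */
    }
    return sum;
}
```

The sum itself is usually not yet in the range 0..30, which is why the function above still needs the final fold and the conditional subtraction.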

For a specific integer size, you could unroll the loop and quite possibly beat division. (And see @chux's answer for a way to convert the loop into O(log bits) operations instead of O(bits).) It's more difficult to beat gcc, which avoids division when the divisor is a constant known at compile-time.

In a very quick benchmark using unsigned 32-bit integers, the naive unrolled loop took 19 seconds and a version based on @chux's answer took only 13 seconds, but gcc's x%31 took 9.7 seconds. Forcing gcc to use a hardware divide (by making the division non-constant) took 23.4 seconds, and the code as shown above took 25.6 seconds. Those figures should be taken with several grains of salt. The times are for computing i%31 for all possible values of i, on my laptop using -O3 -march=native.
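The benchmark code itself is not shown in the answer. A harness along the following lines would measure the same kind of thing, but it is only a guess at the setup: the names `run_all`, `time_all` and `mod31_ref` are invented here, and calling through a function pointer may prevent inlining, so absolute timings would likely differ from the figures quoted above.

```c
#include <stdint.h>
#include <time.h>

/* Reference implementation using the % operator. */
static unsigned mod31_ref(unsigned x) { return x % 31; }

/* Applies f to every i in [0, limit] and returns the sum of the results,
 * so the compiler cannot discard the loop. Pass UINT32_MAX as limit to
 * cover all 32-bit values. */
static uint64_t run_all(unsigned (*f)(unsigned), uint32_t limit) {
    uint64_t sum = 0;
    uint32_t i = 0;
    do {
        sum += f(i);
    } while (i++ != limit);
    return sum;
}

/* Returns the CPU time in seconds spent in run_all and stores the checksum. */
static double time_all(unsigned (*f)(unsigned), uint32_t limit,
                       uint64_t *checksum) {
    clock_t start = clock();
    *checksum = run_all(f, limit);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Comparing the checksum of each candidate against the one from `mod31_ref` also doubles as a correctness check.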

gcc avoids 32-bit division by a constant by replacing it with what is essentially a 64-bit multiplication by the inverse of the constant followed by a right shift. (The actual algorithm does a bit more work to avoid overflows.) The procedure was implemented more than 20 years ago in gcc v2.6, and the paper which describes the algorithm is available on the gmp site. (GMP also uses this trick.)

Here's a simplified version: Say we want to compute n // 31 for some unsigned 32-bit integer n (using the pythonic // to indicate truncated integer division). We use the "magic constant" m = 2^32 // 31, which is 138547332. Now it's clear that for any n:

m * n <= 2^32 * n/31 < m * n + n ⇒ m * n // 2^32 <= n // 31 <= (m * n + n) // 2^32

(Here we make use of the fact that if a < b then floor(a) <= floor(b).)

Furthermore, since n < 2^32, m * n // 2^32 and (m * n + n) // 2^32 are either the same integer or two consecutive integers. Consequently, one (or both) of those two is the actual value of n // 31.
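The two candidate quotients can be computed directly to see the bound in action. This is just an illustrative sketch: the function names `quot31_low` and `quot31_high` are made up, and 64-bit arithmetic is used so that neither product can overflow.

```c
#include <stdint.h>

/* Lower candidate: floor(m * n / 2^32), with m = 2^32 // 31. */
static uint32_t quot31_low(uint32_t n) {
    const uint64_t m = 138547332u;  /* magic constant from the text */
    return (uint32_t)((m * n) >> 32);
}

/* Upper candidate: floor((m * n + n) / 2^32). */
static uint32_t quot31_high(uint32_t n) {
    const uint64_t m = 138547332u;
    return (uint32_t)((m * n + n) >> 32);
}
```

For n = 31, for instance, the lower candidate is 0 and the upper one is 1, and the true quotient 1 is one of the two.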

Now, we really want to compute n%31. So we need to multiply the (presumed) quotient by 31, and subtract that from n. If we use the smaller of the two possible quotients, it may turn out that the computed modulo value is too big, but it can only be too big by 31.

Or, to put it in code:

static const unsigned long long magic = 138547332;  // floor(2^32 / 31)
unsigned mod31g(unsigned x) {
  unsigned q = (x * magic) >> 32;
  // To multiply by 31, we multiply by 32 and subtract
  unsigned mod = x - ((q << 5) - q);
  return mod < 31 ? mod : mod - 31;
}

The actual algorithm used by gcc avoids the test at the end by using a slightly more accurate computation based on multiplying by 2^37 // 31 + 1. That always produces the correct quotient, but at the cost of some extra shifts and adds to avoid integer overflow. As it turns out, the version above is slightly faster -- in the same benchmark as above, it took only 6.3 seconds.
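A sketch of that more accurate computation might look like the following. It is not gcc's actual output; the names `div31_exact` and `mod31_exact` are invented, and `unsigned` is assumed to be 32 bits wide. The multiplier 2^37 // 31 + 1 = 4433514629 needs 33 bits, so only its low 32 bits (0x08421085) are used in the multiply, and the implicit top bit is restored with a subtract, a shift and an add that cannot overflow.

```c
#include <stdint.h>

/* Exact n / 31 for 32-bit n: floor(n * (2^32 + 0x08421085) / 2^37). */
static uint32_t div31_exact(uint32_t n) {
    /* High half of n times the low 32 bits of the 33-bit multiplier. */
    uint32_t hi = (uint32_t)(((uint64_t)n * 0x08421085u) >> 32);
    /* floor((n + hi) / 2), computed without overflowing 32 bits (hi <= n). */
    uint32_t t = ((n - hi) >> 1) + hi;
    /* One halving is already done, so shift by 4 more for a total of 5. */
    return t >> 4;
}

/* n % 31 from the exact quotient; no final range test is needed. */
static uint32_t mod31_exact(uint32_t n) {
    uint32_t q = div31_exact(n);
    return n - ((q << 5) - q);  /* n - 31 * q */
}
```

The extra subtract, shift and add are the "extra shifts and adds" mentioned above.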


Other benchmarked functions, for completeness:

Naive unrolled loop

unsigned mod31b(unsigned x) {
  unsigned tmp = x & 31; x >>= 5;
  tmp += x & 31; x >>= 5;
  tmp += x & 31; x >>= 5;
  tmp += x & 31; x >>= 5;
  tmp += x & 31; x >>= 5;
  tmp += x & 31; x >>= 5;
  tmp += x & 31;

  tmp = (tmp >> 5) + (tmp & 31);
  return tmp >= 31 ? tmp - 31 : tmp;
}

@chux's improvement, slightly optimized

static const unsigned mask1 = (31U << 0) | (31U << 10) | (31U << 20) | (31U << 30);
static const unsigned mask2 = (31U << 5) | (31U << 15) | (31U << 25);
unsigned mod31c(unsigned x) {
  x = (x & mask1) + ((x & mask2) >> 5);
  x += x >> 20;
  x += x >> 10;

  x = (x & 31) + ((x >> 5) & 31);
  return x >= 31 ? x - 31 : x;
}
