What algorithm should I use for high-performance large integer division?

Question

I am encoding large integers into an array of size_t. I already have the other operations working (add, subtract, multiply), as well as division by a single digit. But I would like to match the time complexity of my multiplication algorithm if possible (currently Toom-Cook).

I gather there are linear time algorithms for taking various notions of multiplicative inverse of my dividend. This means I could theoretically achieve division in the same time complexity as my multiplication, because the linear-time operation is "insignificant" by comparison anyway.

My question is, how do I actually do that? What type of multiplicative inverse is best in practice? Modulo 64^digitcount? When I multiply the multiplicative inverse by my divisor, can I shirk computing the part of the data that would be thrown away due to integer truncation? Can anyone provide C or C++ pseudocode or give a precise explanation of how this should be done?

Or is there a dedicated division algorithm that is even better than the inverse-based approach?

Edit: I dug up where I was getting the "inverse" approach mentioned above. On page 312 of "The Art of Computer Programming, Volume 2: Seminumerical Algorithms", Knuth provides "Algorithm R", which is a high-precision reciprocal. He says its time complexity is less than that of multiplication. It is, however, nontrivial to convert it to C and test it out, and it is unclear how much overhead memory, etc., will be consumed until I code this up, which would take a while. I'll post it if no one beats me to it.

Answer

I do not know the multiplicative inverse algorithm, but it sounds like a modification of Montgomery reduction or Barrett reduction.

I do bigint division a bit differently.

See bignum division. Especially take a look at the approximation divider and the two links there. One is my fixed-point divider and the others are fast multiplication algorithms (like Karatsuba and Schönhage-Strassen via NTT) with measurements, and a link to my very fast NTT implementation for a 32-bit base.

I'm not sure the multiplicative-inverse approach is the way to go.

It is mostly used for modulo operations where the divisor is constant. I'm afraid that for arbitrary divisions, the time and operations needed to acquire a bigint inverse can be greater than the standard division itself, but as I am not familiar with it, I could be wrong.
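
For a constant divisor, the structure behind Barrett reduction can be shown on plain machine words. The sketch below is only an illustration of that structure (the word-sized setting and names are mine, not from the linked bignum code): precompute m = floor(2^64/d) once, then each reduction costs one wide multiplication, a shift and at most one correction. unsigned __int128 is a GCC/Clang extension used for the wide product.

#include <cstdint>
#include <cstdio>

// Minimal Barrett-style sketch (not the answer's code) for a constant
// machine-word divisor d >= 2: precompute a scaled reciprocal once, then
// reduce with multiplies, shifts and a single correction step.
struct barrett
    {
    uint64_t d;                     // constant divisor, d >= 2
    uint64_t m;                     // m = floor(2^64 / d), scaled reciprocal
    explicit barrett(uint64_t div):d(div)
        {
        m=(uint64_t)((((unsigned __int128)1)<<64)/d);
        }
    uint64_t mod(uint64_t x) const  // x % d without a hardware divide
        {
        uint64_t q=(uint64_t)(((unsigned __int128)x*m)>>64); // q is x/d or x/d - 1
        uint64_t r=x-q*d;
        if (r>=d) r-=d;             // at most one correction is ever needed
        return r;
        }
    };

int main()
    {
    barrett b(1000000007u);
    printf("%llu\n",(unsigned long long)b.mod(123456789123456789ull)); // same result as % 1000000007
    return 0;
    }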

The most common divider I have seen used in implementations is Newton-Raphson division, which is very similar to the approximation divider in the link above.
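
The core of Newton-Raphson division is the reciprocal iteration x_{k+1} = x_k*(2 - d*x_k), which roughly doubles the number of correct digits per step. Below is a minimal sketch (my own illustration, not the answer's code) in plain double precision, only to show the recurrence and its quadratic convergence; a bignum version runs the same recurrence with bignum multiplications at growing precision and finishes a/d with an integer correction.

#include <cstdio>

// Newton-Raphson reciprocal iteration, shown on doubles for illustration.
int main()
    {
    double d=7.0;                       // divisor
    double x=0.1;                       // rough initial estimate of 1/d
    for (int i=0;i<6;i++)
        {
        x=x*(2.0-d*x);                  // one Newton step, error roughly squares
        printf("step %i: 1/d ~ %.17g\n",i,x);
        }
    printf("100/7 ~ %.17g\n",100.0*x);  // a/d is then a * (1/d)
    return 0;
    }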

Approximation/iterative dividers usually use multiplication, which defines their speed.

For small enough numbers, long binary division and 32/64-bit digit-base division are usually fast enough, if not the fastest: they usually have small overhead. Let n be the max value processed (not the number of digits!).

Binary division, for example, is O(log32(n)·log2(n)) = O(log^2(n)).
It loops through all significant bits. In each iteration you need to compare, subtract, add and bit-shift. Each of those operations can be done in O(log32(n)), and log2(n) is the number of bits.

Here is an example of binary division from one of my bigint templates (C++):

template <DWORD N> void uint<N>::div(uint &c,uint &d,uint a,uint b)
    {
    int j,sh;
    c=DWORD(0); d=1;                    // c = quotient accumulator, d = current quotient bit
    sh=a.bits()-b.bits();               // align divisor to the dividend's MSB
    if (sh<0) sh=0; else { b<<=sh; d<<=sh; }
    for (;;)
        {
        j=geq(a,b);                     // compare a and b: 0: a<b, 1: a>b, 2: a==b
        if (j)
            {
            c+=d;                       // set this quotient bit
            sub(a,a,b);                 // a -= b
            if (j==2) break;            // a became zero, remainder is zero
            }
        if (!sh) break;                 // lowest bit position processed
        b>>=1; d>>=1; sh--;             // move to the next lower bit
        }
    d=a;                                // what is left of a is the remainder
    }

N is the number of 32-bit DWORDs used to store a bigint number.

  • c = a / b
  • d = a % b
  • geq(a,b) is a comparison: a >= b, greater or equal (done in log32(n) = N limb operations).
    It returns 0 for a < b, 1 for a > b, 2 for a == b.
  • sub(c,a,b) is c = a - b.

The speed boost comes from the fact that this does not use multiplication (if you do not count the bit shifts).
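
For comparison, here is a self-contained sketch of the same shift-and-subtract idea on a plain uint64_t (a hypothetical stand-alone helper, not the template above), so it can be compiled and tested directly:

#include <cstdint>
#include <cstdio>

// Shift-and-subtract (binary long) division on 64-bit words, mirroring the
// structure of the bigint routine above: only compare, subtract and shift,
// no multiplication.
void div_binary(uint64_t &q, uint64_t &r, uint64_t a, uint64_t b)
    {
    q=0; r=a;
    if (b==0) return;                               // caller must handle division by zero
    int sh=0;
    while ((b<r)&&!(b>>63)) { b<<=1; sh++; }        // align divisor just below/at the dividend
    for (;;)
        {
        if (r>=b) { r-=b; q|=(uint64_t)1<<sh; }     // set this quotient bit
        if (!sh) break;
        b>>=1; sh--;                                // move to the next lower bit
        }
    }

int main()
    {
    uint64_t q,r;
    div_binary(q,r,1000003,97);
    printf("%llu %llu\n",(unsigned long long)q,(unsigned long long)r); // prints 10309 30
    return 0;
    }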

If you use digits with a big base like 2^32 (ALU blocks), then you can rewrite the whole thing in a polynomial-like style using the CPU's built-in 32-bit ALU operations.
This is usually even faster than binary long division; the idea is to process each DWORD as a single digit, or to recursively halve the arithmetic width used until you hit the CPU's native capabilities.
See division by half-bitwidth arithmetics.
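
As a small illustration of the "each DWORD is one digit in base 2^32" view (a hypothetical helper of my own, not the code from the link, and only for a single-digit divisor; the general multi-digit divisor case is more involved): each digit step uses the double-width 64-bit divide of the ALU.

#include <cstdint>
#include <cstdio>
#include <vector>

// Digit-at-a-time division of a bignum (little-endian 32-bit limbs) by a
// single nonzero 32-bit "digit", using the 64-bit ALU divide per digit.
uint32_t div_small(std::vector<uint32_t> &a, uint32_t d)   // a /= d, returns remainder
    {
    uint64_t rem=0;
    for (size_t i=a.size(); i-->0; )                       // from most significant limb down
        {
        uint64_t cur=(rem<<32)|a[i];                       // bring down the next digit
        a[i]=(uint32_t)(cur/d);
        rem=cur%d;
        }
    return (uint32_t)rem;
    }

int main()
    {
    // 0x0000000100000000 (= 2^32) divided by 3 -> 0x55555555, remainder 1
    std::vector<uint32_t> a={ 0x00000000u, 0x00000001u };
    uint32_t r=div_small(a,3);
    printf("%08x%08x rem %u\n",a[1],a[0],r);               // prints 0000000055555555 rem 1
    return 0;
    }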

On top of all that, while computing with bignums:

If you have optimized basic operations, then the complexity can drop even further, as sub-results get smaller with each iteration (changing the complexity of the basic operations). A nice example of this are NTT-based multiplications.

The overhead can mess things up.

Because of this, the runtime sometimes does not follow the big-O complexity, so you should always measure the thresholds and use the faster approach for the bit counts actually in use, to get the maximum performance, and optimize what you can.
