BigInteger的数字实现和性能 [英] BigInteger numbers implementation and performance

查看:284
本文介绍了BigInteger的数字实现和性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我写了一个BigInteger类在C ++中,应该能够做到操作上的任何大小的所有号码。目前,我想通过比较现有的算法和测试,其中数字量,他们工作最好,实现了非常快的乘方法,我遇到了很意外results.I试图做20乘法500位的,我计时他们。这是结果:

  Karatsuba的:
  14.178秒

长乘法:
  0.879秒
 

维基百科告诉我

  

由此可见,对于足够大的n,Karatsuba的算法将执行较少的转变和个位数的加法比手写乘法,尽管其基本步骤使用更多的加法和移位比简单的公式。为n个,然而,额外的移位小的值,并添加操作可以使其运行比普通写法方法慢。的正收益点取决于计算机平台和环境上。作为一个经验法则,Karatsuba的通常是当被乘数长度超过320-640位快。

由于我的数字是至少1500位长,这是一个相当意外的,因为维基百科说Karatsuba的应该跑得更快。我相信,我的问题可能是在我的另外的算法,但我不明白我怎么可以让它更快一点,因为它已经在O(n)的运行。生病后下方,这样你可以检查我实现我的code。我将离开这个不相关的部分出来。
我也在想,也许我所用的结构是不是最好的。我重新presented在小尾数每个数据段。因此,举例来说,如果我有一些123456789101112存入长度为3的数据段就应该是这样的:

  

{112,101,789,456,123}

所以这就是为什么我现在想问什么是最好的结构和最佳的方式来实现一个BigInteger类是?为什么比乘一个长的Karatsuba的算法慢?

这是我的code:(我很抱歉的长度)

 使用名字空间std;

布尔__longmult = TRUE;
布尔__karatsuba = FALSE;

结构BigInt有{
上市:
    矢量< int的>数字;

    BigInt有(为const char *号){
        //构造是不相关
    }
    BigInt有(){}

    无效BigInt有::运算符=(BigInt有一){
        位数= a.digits;
    }

    朋友BigInt有运营商+(BigInt有,BigInt有);
    朋友BigInt有运营商*(BigInt有,BigInt有);

    朋友的ostream&放大器;运营商的LT;< (ostream的&放大器;,BigInt有);
};

BigInt有运营商+(BigInt有一个,BigInt有B){
    如果(b.digits.size()> a.digits.size()){
        a.digits.swap(b.digits); //确保一个有更多的或等量的数字比乙
    }
    INT携带= 0;

    为(unsigned int类型I = 0; I< a.digits.size();我++){
        INT总和;
        如果(ⅰ&其中; b.digits.size()){
            总和= b.digits [I] + a.digits [I] +随身携带;
        }否则,如果(携带== 1){
            总和= a.digits [I] +随身携带;
        } 其他 {
            打破; //如果进位为0,b中没有更多的数字是左,那么我们已经完成
        }

        如果(总和> = 10亿){
            a.digits [i] = SUM-10亿;
            随身携带= 1;
        } 其他 {
            a.digits [我] =总和;
            随身携带= 0;
        }
    }

    如果(进){
        a.digits.push_back(1);
    }

    返回;
}

BigInt有符*(BigInt有一个,BigInt有B){
    如果(__longmult){
        BigInt有资源;
        为(unsigned int类型I = 0; I< b.digits.size();我++){
            BigInt有温度;
            temp.digits.insert(temp.digits.end(),我,0); //转向离开我'数字'

            INT携带= 0;
            为(unsigned int类型J = 0; J< a.digits.size(); J ++){
                长长的督促= b.digits [I]
                督促* = a.digits [J]。
                督促+ =携带;
                INT T =督促%10亿;
                temp.digits.push_back(T);
                携带=(PROD-T)/ 10亿;
            }
            如果(进大于0){
                temp.digits.push_back(进);
            }
            RES + =温度;
        }
        返回水库;
    }否则,如果(__karatsuba){
        BigInt有资源;
        BigInt有A1,A0,B1,B0;
        断言(a.digits.size()大于0&安培;&安培; b.digits.size()大于0);
        而(a.digits.size()!= b.digits.size()){//添加零同等大小
            如果(a.digits.size()> b.digits.size()){
                b.digits.push_back(0);
            } 其他 {
                a.digits.push_back(0);
            }
        }

        如果(a.digits.size()== 1){
            长长的督促= a.digits [0];
            刺* = b.digits [0];

            RES = PROD; //从长长转换为bigint在固定时间内运行
            返回水库;

        } 其他 {
            为(unsigned int类型I = 0; I< a.digits.size();我++){
                如果(ⅰ≤(a.digits.size()+(a.digits.size()及1))/ 2){//分成2等份的数量
                    a0.digits.push_back(a.digits [I]);
                    b0.digits.push_back(b.digits [I]);
                } 其他 {
                    a1.digits.push_back(a.digits [I]);
                    b1.digits.push_back(b.digits [I]);
                }
            }
        }

        BigInt有Z2 = A1 * B1;
        BigInt有Z0 = A0 * B0;
        BigInt有Z1 =(A1 + A0)*(B1 + B0) -  Z2  -  Z0;

        如果(Z2 == 0安培;&安培; Z1 == 0){
            RES = Z0;
        }否则,如果(Z2 == 0){
            z1.digits.insert(z1.digits.begin(),a0.digits.size(),0);
            RES = Z1 + Z0;
        } 其他 {
            z1.digits.insert(z1.digits.begin(),a0.digits.size(),0);
            z2.digits.insert(z2.digits.begin(),2 * a0.digits.size(),0);
            RES = Z2 + Z1 + Z0;
        }

        返回水库;
    }
}

诠释的main(){
    clock_t表示起点,终点;

    BigInt有a("984561231354629875468546546534125215534125215634987498548489456125215421563498749854848945612385663498749854848945612521542156349874985484894561238561698774565123165221393856169877456512316552156349874985484894561238561698774565123165221392213935215634987498548489456123856169877456512316522139521563498749854848945612385616987745651231652213949651465123151354686324848945612385616987745651231652213949651465123151354684132319321005482265341252156349874985484894561252154215634987498548489456123856264596162131");
    BigInt有b("453412521563498749853412521563498749854848945612521542156349874985484894561238565484894561252154215634987498548489456123856848945612385616935462987546854521563498749854848945653412521563498749854848945612521542156349874985484894561238561238754579785616987745651231652213965465341235215634987495215634987498548489456123856169877456512316522139854848774565123165223546298754685465465341235215634987498548354629875468546546534123521563498749854844139496514651231513546298754685465465341235215634987498548435468");

    __longmult = FALSE;
    __karatsuba = TRUE;

    启动=时钟();
    的for(int i = 0;我小于20;我++){
        A * B;
    }
    结束=时钟();
    的printf(\ nTook%F秒的\ n,(双)(完启动)/ CLOCKS_PER_SEC);

    __longmult = TRUE;
    __karatsuba = FALSE;

    启动=时钟();
    的for(int i = 0;我小于20;我++){
        A * B;
    }
    结束=时钟();
    的printf(\ nTook%F秒的\ n,(双)(完启动)/ CLOCKS_PER_SEC);

    返回0;
}
 

解决方案

1 您使用std ::矢量

  • 为您的数字
  • 确保没有任何不必要的重新分配它
  • 在操作之前,所以分配空间,以避免它...
  • 也是我不使用它,所以我不知道数组范围检查放缓...
  • ,并检查,如果你不移动它!
  • 这是O(N)...即插入到第一的位置...

2 优化实施

  • 这里你可以找到我的执行
  • 而优化的未优化的比较

      X = 0.98765588997654321000000009876 ... | 98 * 32位...
    MUL1 [363.472毫秒] ... O(N ^ 2)经典乘
    MUL2 [349.384毫秒] ... O(3 *(N ^ LOG2(3)))优化Karatsuba的乘
    MUL3 [9345.127毫秒] ... O(3 *(N ^ LOG2(3)))未经优化的Karatsuba的乘
     

  • 矿山实施treshold的Karatsuba的是围绕3100bits ...〜944位!

  • c中的情人treshold更优化的$ C $是。
  • 试图从功能操作数删除不需要的数据

      // BigInt有运营商+(BigInt有一个,BigInt有二)
    BigInt有运营商+(常量BigInt有&功放;一,常量BigInt有和b)
     

  • 这是你会不会在每一个+呼叫建立的A,B堆另一个副本的方式

  • 也更快是这样的:

      MUL(BigInt有&安培; AB,常量BigInt有&功放;一,常量BigInt有和b)// AB = A * B
     

3 Schönhage-Strassen的乘法

  • 在这一个FFT或NTT基于
  • 矿山treshold因为这是大...〜49700bits ...〜15000digits
  • 所以,如果你不打算使用这么大的数字,然后忘掉它
  • 的实施也在上面
  • 链接
  • 这里是我的NTT实现(优化,就像我可以)

4 摘要

  • 如果您使用很少或大端无所谓
    • 但你应该$ C C的操作$
    • 在一种方式,他们不使用的插入操作
  • 您使用脉冲基地位数

    • 这是缓慢的,因为你需要使用除法和模运算
    • 如果您选择基地的功率2 ,然后刚咬了操作是足够
    • 也将删除不少,如果从code这是放缓了所有最
    • 语句
    • 如果您需要在基地的10电源,然后用最大的您可以
    • 在某些情况下,这降低了DIV,MOD到几substractions

        2 ^ 32 = 4 294 967 296 ... INT = +/- 2147483648
      基地= 1 000 000 000
      
      // X%=基地
      而(X> =基地)x =基地;
       

    • 周期
    • 最大数量为2 ^ 32 /碱或2 ^ 31 /基

    • 在某些平台上是这样的速度,然后模
  • 也越大基础,你需要少操作
    • 但要注意溢出!

I have written a BigInteger class in C++ that should be able to do operations on all numbers with any size. Currently I am trying to achieve a very fast multiplication method by comparing the existing algorithms and test for which amount of digits they work best and I ran into very unexpected results.I tried to do 20 multiplications of 500-digit and I timed them. This was the result:

karatsuba:
  14.178 seconds

long multiplication:
  0.879 seconds

Wikipedia told me

It follows that, for sufficiently large n, Karatsuba's algorithm will perform fewer shifts and single-digit additions than longhand multiplication, even though its basic step uses more additions and shifts than the straightforward formula. For small values of n, however, the extra shift and add operations may make it run slower than the longhand method. The point of positive return depends on the computer platform and context. As a rule of thumb, Karatsuba is usually faster when the multiplicands are longer than 320–640 bits.

Since my numbers are at least 1500 bits long this is quite unexpected because wikipedia said karatsuba should run faster. I believe that my problem might be in my addition algorithm but I don't see how I could make it faster because it's already running in O(n). Ill post my code below so that you can check my implementations. I'll leave the irrelevant parts out of it.
I was also thinking that maybe the structure I used was not the best possible. I represented each data segment in little endian. So for example if I have the number 123456789101112 stored into data segments of length 3 it would look like this:

{112,101,789,456,123}

so this is why I am asking now what the best structure and best way to implement a BigInteger class is? And why is the karatsuba algorithm slower than the long multiplication one ?

This is my code: (I'm sorry for the length)

using namespace std;

bool __longmult=true;
bool __karatsuba=false;

struct BigInt {
public:
    vector <int> digits;

    BigInt(const char * number) {
        //constructor is not relevant   
    }
    BigInt() {}

    void BigInt::operator = (BigInt a) {
        digits=a.digits;
    }

    friend BigInt operator + (BigInt,BigInt);
    friend BigInt operator * (BigInt,BigInt);

    friend ostream& operator << (ostream&,BigInt);
};

BigInt operator + (BigInt a,BigInt b) {
    if (b.digits.size()>a.digits.size()) {
        a.digits.swap(b.digits); //make sure a has more or equal amount of digits than b
    }
    int carry=0;

    for (unsigned int i=0;i<a.digits.size();i++) {
        int sum;
        if (i<b.digits.size()) {
            sum=b.digits[i]+a.digits[i]+carry;
        } else if (carry==1) {
            sum=a.digits[i]+carry;
        } else {
            break; // if carry is 0 and no more digits in b are left then we are done already
        }

        if (sum>=1000000000) {
            a.digits[i]=sum-1000000000;
            carry=1;
        } else {
            a.digits[i]=sum;
            carry=0;
        }
    }

    if (carry) {
        a.digits.push_back(1);
    }

    return a;
}

BigInt operator * (BigInt a,BigInt b) {
    if (__longmult) {
        BigInt res;
        for (unsigned int i=0;i<b.digits.size();i++) {
            BigInt temp;
            temp.digits.insert(temp.digits.end(),i,0); //shift to left for i 'digits'

            int carry=0;
            for (unsigned int j=0;j<a.digits.size();j++) {
                long long prod=b.digits[i];
                prod*=a.digits[j];
                prod+=carry;
                int t=prod%1000000000;
                temp.digits.push_back(t);
                carry=(prod-t)/1000000000;
            }
            if (carry>0) {
                temp.digits.push_back(carry);
            }
            res+=temp;
        }
        return res;
    } else if (__karatsuba) {
        BigInt res;
        BigInt a1,a0,b1,b0;
        assert(a.digits.size()>0 && b.digits.size()>0);
        while (a.digits.size()!=b.digits.size()) { //add zeroes for equal size
            if (a.digits.size()>b.digits.size()) {
                b.digits.push_back(0);
            } else {
                a.digits.push_back(0);
            }
        }

        if (a.digits.size()==1) {
            long long prod=a.digits[0];
            prod*=b.digits[0];

            res=prod;//conversion from long long to BigInt runs in constant time
            return res;

        } else {
            for (unsigned int i=0;i<a.digits.size();i++) {
                if (i<(a.digits.size()+(a.digits.size()&1))/2) { //split the number in 2 equal parts
                    a0.digits.push_back(a.digits[i]);
                    b0.digits.push_back(b.digits[i]);
                } else {
                    a1.digits.push_back(a.digits[i]);
                    b1.digits.push_back(b.digits[i]);
                }
            }
        }

        BigInt z2=a1*b1;
        BigInt z0=a0*b0;
        BigInt z1 = (a1 + a0)*(b1 + b0) - z2 - z0;

        if (z2==0 && z1==0) {
            res=z0;
        } else if (z2==0) {
            z1.digits.insert(z1.digits.begin(),a0.digits.size(),0);
            res=z1+z0;
        } else {
            z1.digits.insert(z1.digits.begin(),a0.digits.size(),0);
            z2.digits.insert(z2.digits.begin(),2*a0.digits.size(),0);
            res=z2+z1+z0;
        }

        return res;
    }
}

int main() {
    clock_t start, end;

    BigInt a("984561231354629875468546546534125215534125215634987498548489456125215421563498749854848945612385663498749854848945612521542156349874985484894561238561698774565123165221393856169877456512316552156349874985484894561238561698774565123165221392213935215634987498548489456123856169877456512316522139521563498749854848945612385616987745651231652213949651465123151354686324848945612385616987745651231652213949651465123151354684132319321005482265341252156349874985484894561252154215634987498548489456123856264596162131");
    BigInt b("453412521563498749853412521563498749854848945612521542156349874985484894561238565484894561252154215634987498548489456123856848945612385616935462987546854521563498749854848945653412521563498749854848945612521542156349874985484894561238561238754579785616987745651231652213965465341235215634987495215634987498548489456123856169877456512316522139854848774565123165223546298754685465465341235215634987498548354629875468546546534123521563498749854844139496514651231513546298754685465465341235215634987498548435468");

    __longmult=false;
    __karatsuba=true;

    start=clock();
    for (int i=0;i<20;i++) {
        a*b;
    }
    end=clock();
    printf("\nTook %f seconds\n", (double)(end-start)/CLOCKS_PER_SEC);

    __longmult=true;
    __karatsuba=false;

    start=clock();
    for (int i=0;i<20;i++) {
        a*b;
    }
    end=clock();
    printf("\nTook %f seconds\n", (double)(end-start)/CLOCKS_PER_SEC);

    return 0;
}

解决方案

1.You use std::vector

  • for your digits
  • make sure there are no unnecessary reallocations in it
  • so allocate space before operation to avoid it ...
  • also I do not use it so I do not know the array range checking slowdowns ...
  • and also check if you do not shift it !!!
  • which is O(N) ... ie insert to first position ...

2.Optimize your implementation

  • here you can find mine implementation
  • optimized an unoptimized for comparison

    x=0.98765588997654321000000009876... | 98*32 bits ...
    mul1[ 363.472 ms ] ... O(N^2) classic multiplication
    mul2[ 349.384 ms ] ... O(3*(N^log2(3))) optimized karatsuba multiplication
    mul3[ 9345.127 ms] ... O(3*(N^log2(3))) unoptimized karatsuba multiplication 
    

  • mine implementation treshold for karatsuba is around 3100bits ... ~ 944 digits!!!

  • the more optimized the code the lover treshold is.
  • try to remove unnecessary data from function operands

    //BigInt operator + (BigInt a,BigInt b)
    BigInt operator + (const BigInt &a,const BigInt &b)
    

  • this is way you will not create another copy of a,b on heap in every + call

  • also even faster is this:

    mul(BigInt &ab,const BigInt &a,const BigInt &b) // ab = a*b
    

3.Schönhage-Strassen multiplication

  • this one is FFT or NTT based
  • mine treshold for it is big ... ~ 49700bits ... ~ 15000digits
  • so if you do not plan to use such big numbers then forget about it
  • implementation is also in the link above
  • here is mine NTT implementation (optimized as much as I could)

4.Summary

  • does not matter if you use little or big endian
    • but you should code your operations
    • in a way that they do not use insert operations
  • you use decadic base for digits

    • this is slow because you need to use division and modulo operations
    • if you choose base as power of 2 then just bit operations are enough
    • and also it removes many if statements from code which are slowing all the most
    • if you need the base as power of 10 then use biggest you can
    • in some cases this reduce the div,mod to few substractions

      2^32 = 4 294 967 296 ... int = +/- 2147483648
      base = 1 000 000 000
      
      //x%=base
      while (x>=base) x-=base;
      

    • max number of cycles is 2^32/base or 2^31/base

    • on some platform is this faster then modulo
  • also the bigger the base the less operations you need
    • but beware the overflows !!!

这篇关于BigInteger的数字实现和性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆