BigInteger数字实现和性能 [英] BigInteger numbers implementation and performance

查看:237
本文介绍了BigInteger数字实现和性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在C ++中编写了一个BigInteger类,应该能够对任何大小的所有数字执行操作。目前,我试图实现一个非常快速的乘法方法,通过比较现有的算法和测试的数量,他们最好的工作,我遇到了非常意想不到的结果。我试图做20次乘法500位数,我定时他们。这是结果:

  karatsuba:
14.178秒

长乘法:
0.879秒

维基百科告诉我



< >

接下来,对于足够大的n,Karatsuba的算法将执行较少的移位和单数加法,即使其基本步骤使用比直接公式更多的加法和移位。对于小的n值,然而,额外的移位和添加操作可能使它运行速度慢于longhand方法。正回报的点取决于计算机平台和上下文。根据经验,当被乘数长于320-640位时,Karatsuba通常更快。


1500位长这是相当意外,因为维基百科说,karatsuba应该运行得更快。我相信我的问题可能在我的添加算法,但我不明白我如何可以使它更快,因为它已经在O(n)中运行。我发布我的代码下面,让你可以检查我的实现。我会离开不相关的部分。

我也在想,也许我使用的结构不是最好的。我用小端表示每个数据段。例如,如果我将数字123456789101112存储到长度为3的数据段中,它将如下所示:


{112,101,789,456,123}


所以这就是为什么我现在问现在最好的结构和最好的方式来实现一个BigInteger类是?为什么karatsuba算法比长乘法运算慢?



这是我的代码:(对不起长度)

  using namespace std; 

bool __longmult = true;
bool __karatsuba = false;

struct BigInt {
public:
vector< int>数字

BigInt(const char * number){
//构造函数不相关
}
BigInt(){}

void BigInt :: operator =(BigInt a){
digits = a.digits;
}

friend BigInt operator +(BigInt,BigInt);
friend BigInt operator *(BigInt,BigInt);

friend ostream&运算符<< (ostream&,BigInt);
};

BigInt运算符+(BigInt a,BigInt b){
if(b.digits.size()> a.digits.size()){
a.digits .swap(b.digits); //确保a具有多于或等于数字的数量b
}
int carry = 0;

for(unsigned int i = 0; i< a.digits.size(); i ++){
int sum;
if(i< b.digits.size()){
sum = b.digits [i] + a.digits [i] + carry;
} else if(carry == 1){
sum = a.digits [i] + carry;
} else {
break; //如果carry为0,并且b中没有更多的数字,则我们已经完成了
}

if(sum> = 1000000000){
a.digits [i] = sum-1000000000;
carry = 1;
} else {
a.digits [i] = sum;
carry = 0;
}
}

if(carry){
a.digits.push_back(1);
}

return a;
}

BigInt运算符*(BigInt a,BigInt b){
if(__longmult){
BigInt res;
for(unsigned int i = 0; i< b.digits.size(); i ++){
BigInt temp;
temp.digits.insert(temp.digits.end(),i,0); // shift to left for i'digits'

int carry = 0;
for(unsigned int j = 0; j< a.digits.size(); j ++){
long long prod = b.digits [i];
prod * = a.digits [j];
prod + = carry;
int t = prod%1000000000;
temp.digits.push_back(t);
carry =(prod-t)/ 1000000000;
}
if(carry> 0){
temp.digits.push_back(carry);
}
res + = temp;
}
return res;
} else if(__karatsuba){
BigInt res;
BigInt a1,a0,b1,b0;
assert(a.digits.size()> 0&& b.digits.size()> 0);
while(a.digits.size()!= b.digits.size()){//为等大小添加零值
if(a.digits.size()> b.digits。 size()){
b.digits.push_back(0);
} else {
a.digits.push_back(0);
}
}

if(a.digits.size()== 1){
long long prod = a.digits [0];
prod * = b.digits [0];

res = prod; //从long long到BigInt的转换在常量时间运行
return res;

} else {
for(unsigned int i = 0; i if(i<(a.digits.size ()+(a.digits.size()& 1))/ 2){//以2个相等的部分拆分数字
a0.digits.push_back(a.digits [i]);
b0.digits.push_back(b.digits [i]);
} else {
a1.digits.push_back(a.digits [i]);
b1.digits.push_back(b.digits [i]);
}
}
}

BigInt z2 = a1 * b1;
BigInt z0 = a0 * b0;
BigInt z1 =(a1 + a0)*(b1 + b0)-z2-z0;

if(z2 == 0&& z1 == 0){
res = z0;
} else if(z2 == 0){
z1.digits.insert(z1.digits.begin(),a0.digits.size(),0);
res = z1 + z0;
} else {
z1.digits.insert(z1.digits.begin(),a0.digits.size(),0);
z2.digits.insert(z2.digits.begin(),2 * a0.digits.size(),0);
res = z2 + z1 + z0;
}

return res;
}
}

int main(){
clock_t start,end;

BIGINT一个(984561231354629875468546546534125215534125215634987498548489456125215421563498749854848945612385663498749854848945612521542156349874985484894561238561698774565123165221393856169877456512316552156349874985484894561238561698774565123165221392213935215634987498548489456123856169877456512316522139521563498749854848945612385616987745651231652213949651465123151354686324848945612385616987745651231652213949651465123151354684132319321005482265341252156349874985484894561252154215634987498548489456123856264596162131);
BigInt有B(453412521563498749853412521563498749854848945612521542156349874985484894561238565484894561252154215634987498548489456123856848945612385616935462987546854521563498749854848945653412521563498749854848945612521542156349874985484894561238561238754579785616987745651231652213965465341235215634987495215634987498548489456123856169877456512316522139854848774565123165223546298754685465465341235215634987498548354629875468546546534123521563498749854844139496514651231513546298754685465465341235215634987498548435468);

__longmult = false;
__karatsuba = true;

start = clock();
for(int i = 0; i <20; i ++){
a * b;
}
end = clock();
printf(\\\
Took%f seconds \\\
,(double)(end-start)/ CLOCKS_PER_SEC);

__longmult = true;
__karatsuba = false;

start = clock();
for(int i = 0; i <20; i ++){
a * b;
}
end = clock();
printf(\\\
Took%f seconds \\\
,(double)(end-start)/ CLOCKS_PER_SEC);

return 0;
}


解决方案


  1. 您使用std :: vector



    为您的数字,请确保没有不必要的重新分配。所以在操作之前分配空间以避免它。此外,我不使用它,所以我不知道数组范围检查减速。



    检查你是否不移动它! O(N) ...例如插入第一个位置...


  2. 优化您的实施



    这里找到未实现优化的未实施优化的比较

      x = 0.98765588997654321000000009876 ... | 98 * 32位... 
    mul1 [363.472 ms] ... O(N ^ 2)经典乘法
    mul2 [349.384 ms] ... O(3 * ))优化的karatsuba乘法
    mul3 [9345.127 ms] ... O(3 *(N ^ log2(3)))未优化的karatsuba乘法

    矿山执行阈值为Karatsuba约3100位...〜944位数!


    $ b


    尝试从函数操作数中删除不必要的数据 b

      // BigInt运算符+(BigInt a,BigInt b)
    BigInt运算符+(const BigInt& a,const BigInt& b)

    这是你不会再创建 a,b 在堆中每个+调用也更快是这样:

      mul(BigInt& ab,const BigInt& ; a,const BigInt& b)// ab = a * b 


  3. Schönhage-Strassen乘法



    这是 FFT NTT 。我的阈值是大...〜49700bits ...〜15000digits所以如果你不打算使用这么大的数字,然后忘了它。实施也在上面的链接中。




    此处



    如果你使用little或big endian没关系,但你应该以不使用插入操作的方式编码你的操作。




    由于需要使用除法和模运算,因此对于慢的数字使用十进制基数。如果你选择基数作为2的幂,那么只有比特操作足够,并且它从代码中删除了许多if语句,这些都减慢了所有。如果你需要基数作为10的权力,那么使用最大你可以在某些情况下,这减少了div,mod几个减法

      2 ^ 32 = 4 294 967 296 ... int = +/- 2147483648 
    base = 1 000 000 000

    // x%= base
    while(x&碱)x- =碱;

    最大循环数为2 ^ 32 / base或2 ^ 31 /



I have written a BigInteger class in C++ that should be able to do operations on all numbers with any size. Currently I am trying to achieve a very fast multiplication method by comparing the existing algorithms and test for which amount of digits they work best and I ran into very unexpected results.I tried to do 20 multiplications of 500-digit and I timed them. This was the result:

karatsuba:
  14.178 seconds

long multiplication:
  0.879 seconds

Wikipedia told me

It follows that, for sufficiently large n, Karatsuba's algorithm will perform fewer shifts and single-digit additions than longhand multiplication, even though its basic step uses more additions and shifts than the straightforward formula. For small values of n, however, the extra shift and add operations may make it run slower than the longhand method. The point of positive return depends on the computer platform and context. As a rule of thumb, Karatsuba is usually faster when the multiplicands are longer than 320–640 bits.

Since my numbers are at least 1500 bits long this is quite unexpected because wikipedia said karatsuba should run faster. I believe that my problem might be in my addition algorithm but I don't see how I could make it faster because it's already running in O(n). Ill post my code below so that you can check my implementations. I'll leave the irrelevant parts out of it.
I was also thinking that maybe the structure I used was not the best possible. I represented each data segment in little endian. So for example if I have the number 123456789101112 stored into data segments of length 3 it would look like this:

{112,101,789,456,123}

so this is why I am asking now what the best structure and best way to implement a BigInteger class is? And why is the karatsuba algorithm slower than the long multiplication one ?

This is my code: (I'm sorry for the length)

using namespace std;

bool __longmult=true;
bool __karatsuba=false;

struct BigInt {
public:
    vector <int> digits;

    BigInt(const char * number) {
        //constructor is not relevant   
    }
    BigInt() {}

    void BigInt::operator = (BigInt a) {
        digits=a.digits;
    }

    friend BigInt operator + (BigInt,BigInt);
    friend BigInt operator * (BigInt,BigInt);

    friend ostream& operator << (ostream&,BigInt);
};

BigInt operator + (BigInt a,BigInt b) {
    if (b.digits.size()>a.digits.size()) {
        a.digits.swap(b.digits); //make sure a has more or equal amount of digits than b
    }
    int carry=0;

    for (unsigned int i=0;i<a.digits.size();i++) {
        int sum;
        if (i<b.digits.size()) {
            sum=b.digits[i]+a.digits[i]+carry;
        } else if (carry==1) {
            sum=a.digits[i]+carry;
        } else {
            break; // if carry is 0 and no more digits in b are left then we are done already
        }

        if (sum>=1000000000) {
            a.digits[i]=sum-1000000000;
            carry=1;
        } else {
            a.digits[i]=sum;
            carry=0;
        }
    }

    if (carry) {
        a.digits.push_back(1);
    }

    return a;
}

BigInt operator * (BigInt a,BigInt b) {
    if (__longmult) {
        BigInt res;
        for (unsigned int i=0;i<b.digits.size();i++) {
            BigInt temp;
            temp.digits.insert(temp.digits.end(),i,0); //shift to left for i 'digits'

            int carry=0;
            for (unsigned int j=0;j<a.digits.size();j++) {
                long long prod=b.digits[i];
                prod*=a.digits[j];
                prod+=carry;
                int t=prod%1000000000;
                temp.digits.push_back(t);
                carry=(prod-t)/1000000000;
            }
            if (carry>0) {
                temp.digits.push_back(carry);
            }
            res+=temp;
        }
        return res;
    } else if (__karatsuba) {
        BigInt res;
        BigInt a1,a0,b1,b0;
        assert(a.digits.size()>0 && b.digits.size()>0);
        while (a.digits.size()!=b.digits.size()) { //add zeroes for equal size
            if (a.digits.size()>b.digits.size()) {
                b.digits.push_back(0);
            } else {
                a.digits.push_back(0);
            }
        }

        if (a.digits.size()==1) {
            long long prod=a.digits[0];
            prod*=b.digits[0];

            res=prod;//conversion from long long to BigInt runs in constant time
            return res;

        } else {
            for (unsigned int i=0;i<a.digits.size();i++) {
                if (i<(a.digits.size()+(a.digits.size()&1))/2) { //split the number in 2 equal parts
                    a0.digits.push_back(a.digits[i]);
                    b0.digits.push_back(b.digits[i]);
                } else {
                    a1.digits.push_back(a.digits[i]);
                    b1.digits.push_back(b.digits[i]);
                }
            }
        }

        BigInt z2=a1*b1;
        BigInt z0=a0*b0;
        BigInt z1 = (a1 + a0)*(b1 + b0) - z2 - z0;

        if (z2==0 && z1==0) {
            res=z0;
        } else if (z2==0) {
            z1.digits.insert(z1.digits.begin(),a0.digits.size(),0);
            res=z1+z0;
        } else {
            z1.digits.insert(z1.digits.begin(),a0.digits.size(),0);
            z2.digits.insert(z2.digits.begin(),2*a0.digits.size(),0);
            res=z2+z1+z0;
        }

        return res;
    }
}

int main() {
    clock_t start, end;

    BigInt a("984561231354629875468546546534125215534125215634987498548489456125215421563498749854848945612385663498749854848945612521542156349874985484894561238561698774565123165221393856169877456512316552156349874985484894561238561698774565123165221392213935215634987498548489456123856169877456512316522139521563498749854848945612385616987745651231652213949651465123151354686324848945612385616987745651231652213949651465123151354684132319321005482265341252156349874985484894561252154215634987498548489456123856264596162131");
    BigInt b("453412521563498749853412521563498749854848945612521542156349874985484894561238565484894561252154215634987498548489456123856848945612385616935462987546854521563498749854848945653412521563498749854848945612521542156349874985484894561238561238754579785616987745651231652213965465341235215634987495215634987498548489456123856169877456512316522139854848774565123165223546298754685465465341235215634987498548354629875468546546534123521563498749854844139496514651231513546298754685465465341235215634987498548435468");

    __longmult=false;
    __karatsuba=true;

    start=clock();
    for (int i=0;i<20;i++) {
        a*b;
    }
    end=clock();
    printf("\nTook %f seconds\n", (double)(end-start)/CLOCKS_PER_SEC);

    __longmult=true;
    __karatsuba=false;

    start=clock();
    for (int i=0;i<20;i++) {
        a*b;
    }
    end=clock();
    printf("\nTook %f seconds\n", (double)(end-start)/CLOCKS_PER_SEC);

    return 0;
}

解决方案

  1. You use std::vector

    for your digits make sure there are no unnecessary reallocations in it. So allocate space before operation to avoid it. Also I do not use it so I do not know the array range checking slowdowns.

    Check if you do not shift it !!! which is O(N) ... i.e. insert to first position...

  2. Optimize your implementation

    here you can find mine implementation optimized an unoptimized for comparison

    x=0.98765588997654321000000009876... | 98*32 bits...
    mul1[ 363.472 ms ]... O(N^2) classic multiplication
    mul2[ 349.384 ms ]... O(3*(N^log2(3))) optimized karatsuba multiplication
    mul3[ 9345.127 ms]... O(3*(N^log2(3))) unoptimized karatsuba multiplication 
    

    mine implementation threshold for Karatsuba is around 3100bits ... ~ 944 digits!!! The more optimized the code the lover threshold is.


    Try to remove unnecessary data from function operands

    //BigInt operator + (BigInt a,BigInt b)
    BigInt operator + (const BigInt &a,const BigInt &b)
    

    this is way you will not create another copy of a,b on heap in every + call also even faster is this:

    mul(BigInt &ab,const BigInt &a,const BigInt &b) // ab = a*b
    

  3. Schönhage-Strassen multiplication

    this one is FFT or NTT based. Mine threshold for it is big ... ~ 49700bits ... ~ 15000digits so if you do not plan to use such big numbers then forget about it. Implementation is also in the link above.


    here is mine NTT implementation (optimized as much as I could)

  4. Summary

    Does not matter if you use little or big endian but you should code your operations in a way that they do not use insert operations.


    You use decadic base for digits that is slow because you need to use division and modulo operations. If you choose base as power of 2 then just bit operations are enough and also it removes many if statements from code which are slowing all the most. If you need the base as power of 10 then use biggest you can in some cases this reduce the div,mod to few subtractions

    2^32 = 4 294 967 296 ... int = +/- 2147483648
    base = 1 000 000 000
    
    //x%=base
    while (x>=base) x-=base;
    

    max number of cycles is 2^32/base or 2^31/base on some platform is this faster then modulo and also the bigger the base the less operations you need but beware the overflows !!!

这篇关于BigInteger数字实现和性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆