整数乘法是否真的以与现代CPU上加法相同的速度完成? [英] Is integer multiplication really done at the same speed as addition on a modern CPU?

查看:137
本文介绍了整数乘法是否真的以与现代CPU上加法相同的速度完成?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我经常听到这种说法,即现代硬件上的乘法如此优化,以至于实际上与加法的速度相同.是真的吗?

我永远无法获得任何权威性的确认.我自己的研究只增加了问题.速度测试通常显示的数据使我感到困惑.这是一个示例:

#include <stdio.h>
#include <sys/time.h>

unsigned int time1000() {
    timeval val;
    gettimeofday(&val, 0);
    val.tv_sec &= 0xffff;
    return val.tv_sec * 1000 + val.tv_usec / 1000;
}

int main() {
    unsigned int sum = 1, T = time1000();
    for (int i = 1; i < 100000000; i++) {
        sum += i + (i+1); sum++;
    }
    printf("%u %u\n", time1000() - T, sum);
    sum = 1;
    T = time1000();
    for (int i = 1; i < 100000000; i++) {
        sum += i * (i+1); sum++;
    }
    printf("%u %u\n", time1000() - T, sum);
}

上面的代码可以显示乘法更快:

clang++ benchmark.cpp -o benchmark
./benchmark
746 1974919423
708 3830355456

但是对于其他编译器,其他编译器参数,不同编写的内部循环,结果可能会有所不同,我什至无法得出近似值.

解决方案

实际上可以在O(log n)电路深度中乘以两个 n 位数字 >,就像加法一样.

O(log n)中的加法是将数字分成两半,然后(递归)将两个部分相加在 parallel 中,其中上半部分都为 "0进位"和"1进位"情况.添加下半部分后,将检查进位,并使用其值在0进位和1进位的情况之间进行选择.

O(log n)深度的乘积也通过 parallelization 进行 ,其中,每个3个数字的和减少为并行的2个数字的和,并且总和是通过类似上述方式完成的.
我在这里不做解释,但是您可以通过查找"cry-lookahead" "carry-save" 加法来找到有关快速加法和乘法的阅读材料. /p>

因此,从理论的角度来看,由于电路显然是固有并行的(与软件不同),所以乘法渐近变慢的唯一原因是前端的常数而不是渐进复杂度.

I hear this statement quite often, that multiplication on modern hardware is so optimized that it actually is at the same speed as addition. Is that true?

I never can get any authoritative confirmation. My own research only adds questions. The speed tests usually show data that confuses me. Here is an example:

#include <stdio.h>
#include <sys/time.h>

unsigned int time1000() {
    timeval val;
    gettimeofday(&val, 0);
    val.tv_sec &= 0xffff;
    return val.tv_sec * 1000 + val.tv_usec / 1000;
}

int main() {
    unsigned int sum = 1, T = time1000();
    for (int i = 1; i < 100000000; i++) {
        sum += i + (i+1); sum++;
    }
    printf("%u %u\n", time1000() - T, sum);
    sum = 1;
    T = time1000();
    for (int i = 1; i < 100000000; i++) {
        sum += i * (i+1); sum++;
    }
    printf("%u %u\n", time1000() - T, sum);
}

The code above can show that multiplication is faster:

clang++ benchmark.cpp -o benchmark
./benchmark
746 1974919423
708 3830355456

But with other compilers, other compiler arguments, differently written inner loops, the results can vary and I cannot even get an approximation.

解决方案

Multiplication of two n-bit numbers can in fact be done in O(log n) circuit depth, just like addition.

Addition in O(log n) is done by splitting the number in half and (recursively) adding the two parts in parallel, where the upper half is solved for both the "0-carry" and "1-carry" case. Once the lower half is added, the carry is examined, and its value is used to choose between the 0-carry and 1-carry case.

Multiplication in O(log n) depth is also done through parallelization, where every sum of 3 numbers is reduced to a sum of just 2 numbers in parallel, and the sums are done in some manner like the above.
I won't explain it here, but you can find reading material on fast addition and multiplication by looking up "carry-lookahead" and "carry-save" addition.

So from a theoretical standpoint, since circuits are obviously inherently parallel (unlike software), the only reason multiplication would be asymptotically slower is the constant factor in the front, not the asymptotic complexity.

这篇关于整数乘法是否真的以与现代CPU上加法相同的速度完成?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆