为什么使用带有 int64_t 操作数的 mod 会使这个函数慢 150%? [英] Why does using mod with int64_t operand makes this function 150% slower?

查看:34
本文介绍了为什么使用带有 int64_t 操作数的 mod 会使这个函数慢 150%?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

max_rem 函数计算 (a+1)^n + (a-1)^n 除以 留下的最大余数n = 1, 2, 3... 的代码>.main 对从 3999 的每个 a 调用 max_rem.完整代码:

#include #include int max_rem(int a) {int max_r = 0;int m = a * a;//<-------- 违规行int r1 = a+1, r2 = a-1;for(int n = 1; n <= a*a; n++) {r1 = (r1 * (a + 1)) % m;r2 = (r2 * (a - 1)) % m;int r = (r1 + r2) % m;如果(max_r 

如果我将第 6 行更改为:

int m = a * a;

int64_t m = a * a;

整个计算速度大约慢了 150%.我尝试了 gcc 5.3clang 3.6.

使用int:

$ gcc -std=c99 -O3 -Wall -o 120 120.c$时间(./120)真正的 0m3.823s用户 0m3.816s系统 0m0.000s

使用 int64_t:

$ time(./120)真正的 0m9.861s用户 0m9.836s系统 0m0.000s

是的,我使用的是 64 位系统.为什么会发生这种情况?

我一直认为使用 int64_t 更安全、更便携,并且是编写 C 的现代方法"®,并且不会损害 64 位系统上数字代码的性能.这个假设是错误的吗?

编辑:要明确一点:即使您将每个变量更改为int64_t,减速仍然存在.所以这不是混合 intint64_t 的问题.

解决方案

我一直认为使用 int64_t 更安全、更便携,并且是编写 C 的现代方法"®,并且不会损害 64 位系统上数字代码的性能.这个假设是错误的吗?

在我看来是这样.您可以在 英特尔的软件优化参考手册(附录 C,第 645 页上的表 C-17 通用说明):

<前>IDIV r64 吞吐量 每条指令 85-100 个周期IDIV r32 吞吐量 每条指令 20-26 个周期

The max_rem function computes the maximum remainder that (a+1)^n + (a-1)^n leaves when divided by for n = 1, 2, 3.... The main calls max_rem on every a from 3 to 999. Complete code:

#include <inttypes.h>
#include <stdio.h>

int max_rem(int a) {
    int max_r = 0;
    int m = a * a; // <-------- offending line
    int r1 = a+1, r2 = a-1;
    for(int n = 1; n <= a*a; n++) {
        r1 = (r1 * (a + 1)) % m;
        r2 = (r2 * (a - 1)) % m;
        int r = (r1 + r2) % m;
        if(max_r < r) 
            max_r = r;
    }
    return max_r;
}

int main() {
    int64_t sum = 0;
    for(int a = 3; a < 1000; a++)
        sum += max_rem(a);

    printf("%ld\n", sum);
}

If I change line 6 from:

int m = a * a;

to

int64_t m = a * a;

the whole computation becames about 150% slower. I tried both with gcc 5.3 and clang 3.6.

With int:

$ gcc -std=c99 -O3 -Wall -o 120 120.c
$ time(./120)

real    0m3.823s
user    0m3.816s
sys     0m0.000s

with int64_t:

$ time(./120)

real    0m9.861s
user    0m9.836s
sys     0m0.000s

and yes, I'm on a 64-bit system. Why does this happen?

I've always assumed that using int64_t is safer and more portable and "the modern way to write C"® and wouldn't harm performances on 64bits systems for numeric code. Is this assumption erroneous?

EDIT: just to be clear: the slowdown persists even if you change every variable to int64_t. So this is not a problem with mixing int and int64_t.

解决方案

I've always assumed that using int64_t is safer and more portable and "the modern way to write C"® and wouldn't harm performances on 64bits systems for numeric code. Is this assumption erroneous?

It seems so to me. You can find the instruction timings in Intel's Software Optimization Reference manual (appendix C, table C-17 General Purpose Instructions on page 645):

    IDIV r64   Throughput 85-100 cycles per instruction
    IDIV r32   Throughput 20-26 cycles per instruction

这篇关于为什么使用带有 int64_t 操作数的 mod 会使这个函数慢 150%?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆