Strange performance behaviour for 64 bit modulo operation


Question

The last three of these method calls take approximately twice as long as the first four.

The only difference is that their arguments no longer fit in an int. But should this matter? The parameter is declared as long, so the calculation should use long anyway. Does the modulo operation use a different algorithm for numbers greater than int.MaxValue?

I am using an AMD Athlon64 3200+, Windows XP SP3 and VS2008.

       Stopwatch sw = new Stopwatch();
       // Uppercase L suffix: a lowercase l is easily misread as the digit 1.
       TestLong(sw, int.MaxValue - 3L);
       TestLong(sw, int.MaxValue - 2L);
       TestLong(sw, int.MaxValue - 1L);
       TestLong(sw, int.MaxValue);
       TestLong(sw, int.MaxValue + 1L);
       TestLong(sw, int.MaxValue + 2L);
       TestLong(sw, int.MaxValue + 3L);
       Console.ReadLine();

    static void TestLong(Stopwatch sw, long num)
    {
        long n = 0;
        sw.Reset();
        sw.Start();
        for (long i = 3; i < 20000000; i++)
        {
            n += num % i;
        }
        sw.Stop();
        Console.WriteLine(sw.Elapsed);            
    }

EDIT: I have now tried the same in C, and the issue does not occur there: all modulo operations take the same time, in release and in debug mode, with and without optimizations turned on:

#include "stdafx.h"
#include <stdio.h>   /* printf, getchar */
#include <time.h>    /* clock, clock_t */
#include <limits.h>  /* INT_MAX, UINT_MAX, LLONG_MAX */

static void TestLong(long long num)
{
    long long n = 0;

    clock_t t = clock();
    for (long long i = 3; i < 20000000LL*100; i++)
    {
        n += num % i;
    }

    /* clock_t is not guaranteed to be int, so cast to match the format. */
    printf("%ld - %lld\n", (long)(clock() - t), n);
}

int main()
{
    /* sizeof yields size_t; cast to int to match the %i format. */
    printf("%i %i %i %i\n\n", (int)sizeof(int), (int)sizeof(long), (int)sizeof(long long), (int)sizeof(void*));

    TestLong(3);
    TestLong(10);
    TestLong(131);
    TestLong(INT_MAX - 1L);
    TestLong(UINT_MAX +1LL);
    TestLong(INT_MAX + 1LL);
    TestLong(LLONG_MAX-1LL);

    getchar();
    return 0;
}

EDIT2:

Thanks for the great suggestions. I found that both .NET and C (in debug as well as in release mode) do not use a single CPU instruction to calculate the remainder; instead they call a function that does.

In the C program I could get the name of this function, which is "_allrem". The debugger also displayed the fully commented source for this file, where I found that the algorithm special-cases 32-bit divisors, rather than 32-bit dividends as seems to be the case in the .NET application.
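To illustrate what special-casing a 32-bit divisor can look like, here is a minimal sketch. It is not the actual "_allrem" source: the function name rem64_sketch is hypothetical, it handles only unsigned values, and it assumes the caller never passes a zero divisor. When the upper 32 bits of the divisor are zero, the remainder can be built from two narrow division steps instead of a general 64-bit algorithm.

```c
#include <stdint.h>

/* Hypothetical helper, NOT the real _allrem: remainder of an unsigned
   64-bit dividend. Assumes divisor != 0. */
uint64_t rem64_sketch(uint64_t dividend, uint64_t divisor)
{
    if ((divisor >> 32) == 0) {
        /* Fast path: the divisor fits in 32 bits. */
        uint32_t d  = (uint32_t)divisor;
        uint32_t hi = (uint32_t)(dividend >> 32);
        uint32_t lo = (uint32_t)dividend;

        /* Step 1: reduce the high half; the result is smaller than d. */
        uint32_t r = hi % d;

        /* Step 2: divide (r:lo) by d. Because r < d, the quotient fits in
           32 bits, which is the precondition of the 32-bit x86 div
           instruction (64-bit dividend in edx:eax, 32-bit divisor). */
        uint64_t partial = ((uint64_t)r << 32) | lo;
        return partial % d;
    }

    /* Slow path stand-in: a real 32-bit runtime would need a longer
       shift-and-subtract style algorithm here. */
    return dividend % divisor;
}
```

For example, rem64_sketch(2147483650ULL, 7) takes the fast path, while a divisor such as 0x100000001ULL falls through to the slow path. A routine shaped like this would run at a speed that depends on the size of the divisor and not on the size of the dividend, which matches the behaviour observed in the C program.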

I also found out that the performance of the C program is affected only by the value of the divisor, not by the dividend. Another test showed that the performance of the remainder function in the .NET program depends on both the dividend and the divisor.

BTW: Even simple additions of long long values are carried out as consecutive add and adc instructions. So even if my processor calls itself 64-bit, it really isn't :(
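As a rough illustration of the add/adc pair, the sketch below adds two 64-bit values using only 32-bit halves, the way a 32-bit build has to. The type u64_halves and the function names are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Hypothetical representation of a 64-bit value as two 32-bit halves. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_halves;

/* Split a 64-bit value into its halves. */
u64_halves split_u64(uint64_t v)
{
    u64_halves r;
    r.lo = (uint32_t)v;
    r.hi = (uint32_t)(v >> 32);
    return r;
}

/* Join two halves back into a 64-bit value. */
uint64_t join_halves(u64_halves v)
{
    return ((uint64_t)v.hi << 32) | v.lo;
}

/* 64-bit addition from 32-bit operations. */
u64_halves add64_sketch(u64_halves a, u64_halves b)
{
    u64_halves r;
    uint32_t carry;

    r.lo  = a.lo + b.lo;          /* "add": wraps modulo 2^32 */
    carry = (r.lo < a.lo);        /* 1 if the low add overflowed */
    r.hi  = a.hi + b.hi + carry;  /* "adc": add with carry */
    return r;
}
```

Adding 1 to 0xFFFFFFFF this way overflows the low half and carries into the high half, giving 0x100000000. Seeing this pattern in the disassembly indicates that the program was compiled as 32-bit code, whatever the processor is capable of.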

EDIT3:

I have now run the C app on Windows 7 x64 edition, compiled with Visual Studio 2010. The funny thing is that the performance behaviour stays the same, although true 64-bit instructions are now used (I checked the assembly source).

Recommended answer

What a curious observation. Here's something you can do to investigate this further: add a "pause" early in the program, like a Console.ReadLine, but AFTER the first call to your method. Then build the program in "release" mode and start it outside the debugger. At the pause, attach the debugger. Step through the program and take a look at the code jitted for the method in question. It should be pretty easy to find the loop body.

It would be interesting to know how the generated loop body differs from that in your C program.

The reason for jumping through all those hoops is that the jitter changes the code it generates when jitting a "debug" assembly or when jitting a program that already has a debugger attached; in those cases it jits code that is easier to understand in a debugger. It would be more interesting to see what the jitter thinks is the "best" code for this case, so you have to attach the debugger late, after the jitter has run.
