Floating point vs integer calculations on modern hardware


Question

I am doing some performance-critical work in C++, and we are currently using integer calculations for problems that are inherently floating point because "it's faster". This causes a whole lot of annoying problems and adds a lot of annoying code.

Now, I remember reading about how floating point calculations were so slow back around the 386 days, when I believe (IIRC) there was an optional co-processor. But surely nowadays, with exponentially more complex and powerful CPUs, it makes no difference in "speed" whether you do floating point or integer calculations? Especially since the actual calculation time is tiny compared to something like causing a pipeline stall or fetching something from main memory?

I know the correct answer is to benchmark on the target hardware; what would be a good way to test this? I wrote two tiny C++ programs and compared their run times with "time" on Linux, but the actual run time is too variable (it doesn't help that I am running on a virtual server). Short of spending my entire day running hundreds of benchmarks, making graphs, etc., is there something I can do to get a reasonable test of the relative speed? Any ideas or thoughts? Am I completely wrong?

The programs I used are as follows, and they are by no means identical:

Program 1:

#include <iostream>
#include <cmath>
#include <cstdlib>
#include <time.h>

int main( int argc, char** argv )
{
    int accum = 0;

    srand( time( NULL ) );

    for( unsigned int i = 0; i < 100000000; ++i )
    {
        accum += rand( ) % 365;
    }
    std::cout << accum << std::endl;

    return 0;
}

Program 2:

#include <iostream>
#include <cmath>
#include <cstdlib>
#include <time.h>

int main( int argc, char** argv )
{

    float accum = 0;
    srand( time( NULL ) );

    for( unsigned int i = 0; i < 100000000; ++i )
    {
        accum += (float)( rand( ) % 365 );
    }
    std::cout << accum << std::endl;

    return 0;
}

Thanks in advance!

Edit: The platform I care about is regular x86 or x86-64 running on desktop Linux and Windows machines.

Edit 2 (pasted from a comment below): We have an extensive code base currently. Really, I have come up against the generalization that we "must not use float since integer calculation is faster" - and I am looking for a way (if this is even true) to disprove this generalized assumption. I realize that it would be impossible to predict the exact outcome for us short of doing all the work and profiling it afterwards.

Anyway, thanks for all your excellent answers and help. Feel free to add anything else :).

Answer

Alas, I can only give you an "it depends" answer.

From my experience, there are many, many variables that affect performance, especially between integer and floating point math. It varies strongly from processor to processor (even within the same family, such as x86) because different processors have different pipeline lengths. Also, some operations are generally very simple (such as addition) and have an accelerated route through the processor, while others (such as division) take much, much longer.

The other big variable is where the data reside. If you only have a few values to add, then all of the data can reside in cache, where they can be quickly sent to the CPU. A very, very slow floating point operation that already has its data in cache will be many times faster than an integer operation where the integer needs to be copied from system memory.

I assume you are asking this question because you are working on a performance-critical application. If you are developing for the x86 architecture and you need extra performance, you might want to look into using the SSE extensions. These can greatly speed up single-precision floating point arithmetic, as the same operation can be performed on multiple data at once, and there is a separate* bank of registers for the SSE operations. (I noticed in your second example you used "float" instead of "double", making me think you are using single-precision math.)

*Note: Using the old MMX instructions would actually slow down programs, because those old instructions used the same registers as the FPU does, making it impossible to use both the FPU and MMX at the same time.

