Why is floor() so slow?

Question

I wrote some code recently (ISO/ANSI C) and was surprised at the poor performance it achieved. Long story short, it turned out that the culprit was the floor() function. Not only was it slow, it also did not vectorize (with the Intel compiler, aka ICL).

Here are some benchmarks for performing floor for all cells in a 2D matrix:

VC:  0.10
ICL: 0.20

Compare that to a simple cast:

VC:  0.04
ICL: 0.04

How can floor() be that much slower than a simple cast?! It does essentially the same thing (apart from negative numbers). Second question: does anyone know of a super-fast floor() implementation?

PS: Here is the loop that I was benchmarking:

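/* Converts every cell of a height x width float matrix to int.
   Consecutive rows are width_aligned elements apart (padded row stride). */
void Floor(float *matA, int *intA, const int height, const int width, const int width_aligned)
{
    float *rowA = NULL;
    int   *intRowA = NULL;
    int   row, col;

    for (row = 0; row < height; ++row) {
        rowA = matA + row * width_aligned;
        intRowA = intA + row * width_aligned;
        /* Intel compiler hint: ignore assumed loop-carried dependencies. */
#pragma ivdep
        for (col = 0; col < width; ++col) {
            /* Slow variant (needs #include <math.h>): */
            /* intRowA[col] = floor(rowA[col]); */
            intRowA[col] = (int)(rowA[col]);  /* fast variant: truncating cast */
        }
    }
}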
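
The question does not show how the timings were taken, so the following is only a rough sketch of how such a measurement might be reproduced. The names cast_rows and time_cast, the test data, and the use of clock() are assumptions of this sketch, not part of the original post. It assumes C99.

```c
#include <stdlib.h>
#include <time.h>

/* Same inner loop as the benchmark above, using the truncating cast. */
static void cast_rows(const float *mat, int *out, int height, int width, int stride)
{
    for (int row = 0; row < height; ++row) {
        const float *src = mat + (size_t)row * stride;
        int *dst = out + (size_t)row * stride;
        for (int col = 0; col < width; ++col)
            dst[col] = (int)src[col];
    }
}

/* Hypothetical harness: returns the CPU seconds spent on `reps` passes,
   or a negative value if the arguments are invalid or allocation fails. */
double time_cast(int height, int width, int stride, int reps)
{
    if (height <= 0 || width <= 0 || stride < width)
        return -1.0;

    size_t n = (size_t)height * (size_t)stride;
    float *mat = malloc(n * sizeof *mat);
    int *out = calloc(n, sizeof *out);
    if (mat == NULL || out == NULL) {
        free(mat);
        free(out);
        return -1.0;
    }

    /* Small positive and negative values, so every cast is well defined. */
    for (size_t i = 0; i < n; ++i)
        mat[i] = (float)(i % 1000) * 0.25f - 100.0f;

    clock_t start = clock();
    for (int r = 0; r < reps; ++r)
        cast_rows(mat, out, height, width, stride);
    clock_t stop = clock();

    /* Read one result so the compiler cannot discard the loops entirely. */
    volatile int sink = out[0];
    (void)sink;

    free(mat);
    free(out);
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Swapping the cast for a floor() call in cast_rows would give the other half of the comparison. Absolute numbers depend heavily on compiler, flags, and matrix size.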

Answer

A couple of things make floor slower than a cast and prevent vectorization.

The most important one:

floor can modify global state. If you pass a value that is too large to be represented as an integer in float format, the errno variable may get set to EDOM. NaNs get special handling as well. All this behaviour is for applications that want to detect the overflow case and handle the situation somehow (don't ask me how).

Detecting these problematic conditions is not simple and makes up more than 90% of the execution time of floor. The actual rounding is cheap and could be inlined/vectorized. Also, it's a lot of code, so inlining the whole floor function would make your program run slower.

Some compilers have special flags that allow them to optimize away some of the rarely used C standard rules. For example, GCC can be told that you're not interested in errno at all. To do so, pass -fno-math-errno or -ffast-math. ICC and VC may have similar compiler flags.
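
As a rough illustration of where those flags go, here is a small translation unit with example GCC command lines in its header comment. The file name floor_loop.c and the function name floor_to_int are made up for this sketch. Whether the loop actually gets vectorized depends on the GCC version and the target CPU, so treat that as something to check rather than a guarantee.

```c
/* floor_loop.c (illustrative file name)
 *
 * Example GCC invocations:
 *   gcc -std=c99 -O2 -fno-math-errno -c floor_loop.c
 *   gcc -std=c99 -O2 -ffast-math     -c floor_loop.c
 *
 * -fno-math-errno tells GCC that the program never inspects errno after
 * math calls; -ffast-math implies it and relaxes further floating-point
 * rules, so use it only if the program tolerates that.
 */
#include <math.h>
#include <stddef.h>

/* Floors each input and stores the result as int.
   Assumes every floored value fits in an int. */
void floor_to_int(const float *src, int *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = (int)floorf(src[i]);
}
```

When linking a full program, you may also need -lm on some systems.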

By the way, you can roll your own floor function using simple casts. You just have to handle the negative and positive cases differently. That may be a lot faster if you don't need the special handling of overflows and NaNs.
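
A minimal sketch of such a hand-rolled floor, assuming C99 and assuming the input is finite and its floor fits in an int. The names fast_floor_to_int and floor_array are invented here. There is deliberately no errno, NaN, or overflow handling, which is exactly the trade-off described above.

```c
#include <stddef.h>

/* Hypothetical helper: floor via a truncating cast.
   Only valid when x is finite and floor(x) fits in an int. */
static inline int fast_floor_to_int(float x)
{
    int i = (int)x;             /* truncates toward zero */
    return i - (x < (float)i);  /* negative non-integers need one step down */
}

/* Applies the helper to n values; the loop body has no branches or
   library calls, which should make it easier for a compiler to vectorize. */
void floor_array(const float *src, int *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = fast_floor_to_int(src[i]);
}
```

For positive inputs the cast alone already gives the floor. The comparison only subtracts 1 when truncation moved a negative value up, for example -1.5 truncates to -1 and is corrected to -2, while -2.0 stays at -2.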
