Floating Point Math Execution Time


Question

What accounts for the added execution time of the first data set? The assembly instructions are the same.

With the DN_FLUSH flag off, the first data set takes 63 milliseconds and the second takes 15 milliseconds.
With the DN_FLUSH flag on, the first data set takes 15 milliseconds and the second takes ~0 milliseconds.

Therefore, in both cases the execution time of the first data set is much greater.

Is there any way to decrease the execution time to be closer in line with the second data set?

I am using Visual Studio 2005 C++ with /arch:SSE2 /fp:fast, running on an Intel Core 2 Duo T7700 @ 2.4 GHz under Windows XP Pro.

#include <windows.h>  // GetTickCount, DWORD
#include <float.h>    // _controlfp, _DN_FLUSH, _MCW_DN
#include <stdio.h>    // printf

#define NUMLOOPS 1000000

int main()
{
    DWORD tickStart, duration;
    int loops;

    // Denormal values flushed to zero by hardware on ALPHA and x86
    // processors with SSE2 support. Ignored on other x86 platforms.
    // Setting this decreases execution time from 63 milliseconds to 16 milliseconds.
    // _controlfp(_DN_FLUSH, _MCW_DN);

    // Both operands are below the smallest normal float (about 1.18e-38),
    // so they are stored as denormals.
    float denormal = 1.0e-38f;
    float denormalTwo = 1.0e-39f;
    float denormalThree = 1;

    tickStart = GetTickCount();

    // Run First Calculation Loop
    for (loops = 0; loops < NUMLOOPS; loops++)
    {
        denormalThree = denormal - denormalTwo;
    }

    // Get execution time
    duration = GetTickCount() - tickStart;
    printf("Duration = %lums\n", duration);

    float normal = 1.0e-10f;
    float normalTwo = 1.0e-2f;
    float normalThree = 1;

    tickStart = GetTickCount();

    // Run Second Calculation Loop
    for (loops = 0; loops < NUMLOOPS; loops++)
    {
        normalThree = normal - normalTwo;
    }

    // Get execution time
    duration = GetTickCount() - tickStart;
    printf("Duration = %lums\n", duration);

    return 0;
}

Solution

Quoting from Intel's optimization manual:

When an input operand for a SIMD floating-point instruction [here this includes scalar arithmetic done using SSE] contains values that are less than the representable range of the data type, a denormal exception occurs. This causes a significant performance penalty. An SIMD floating-point operation has a flush-to-zero mode in which the results will not underflow. Therefore subsequent computation will not face the performance penalty of handling denormal input operands.
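For compilers other than Visual C++, the flush-to-zero mode described in the quote can usually be reached through the SSE control-register intrinsics. The following is a minimal sketch, assuming an x86 target where float arithmetic is done with SSE; the helper names `enable_flush_to_zero` and `subtract` are made up for illustration, and on other targets the helper simply does nothing.

```cpp
#include <cmath>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE
#include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE
#define HAS_SSE_FLUSH 1
#else
#define HAS_SSE_FLUSH 0
#endif

// Hypothetical helper: turns on flush-to-zero (denormal results become 0)
// and denormals-are-zero (denormal inputs are read as 0) for the calling
// thread. Returns false when the build target has no SSE control register.
bool enable_flush_to_zero() {
#if HAS_SSE_FLUSH
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: subtracts at run time. The volatile copies keep the
// compiler from folding the subtraction at compile time.
float subtract(float a, float b) {
    volatile float x = a;
    volatile float y = b;
    return x - y;
}
```

Calling `enable_flush_to_zero()` once at the start of each worker thread should be enough, since the setting lives in per-thread register state. With the modes on, `subtract(1.0e-38f, 1.0e-39f)` yields zero instead of a denormal, which trades a small loss of accuracy near zero for speed.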

As for how to avoid this, if you can't flush denormals: do what you can to make sure your data is scaled appropriately and you don't encounter denormals in the first place. Usually this means delaying applying some scale factor until you've finished all of your other computation.
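To make the point about delaying a scale factor concrete, here is a small sketch with made-up values and hypothetical helper names (`scale_early`, `scale_late`). Both functions compute the same product of three factors, but only the first one passes through a denormal intermediate. It assumes the default floating-point environment, with no flush-to-zero mode enabled.

```cpp
#include <cmath>

// True if v is a subnormal (denormal) float.
bool is_subnormal(float v) {
    return std::fpclassify(v) == FP_SUBNORMAL;
}

struct Scaled {
    float result;
    bool hit_subnormal;  // whether the intermediate or the result was subnormal
};

// Applies the small gain first: gain * x can underflow into the subnormal range.
Scaled scale_early(float x, float y, float gain) {
    volatile float t = gain * x;  // volatile forces the value to be rounded to float
    float r = t * y;
    return {r, is_subnormal(t) || is_subnormal(r)};
}

// Applies the small gain last: x * y is formed first, so the intermediate
// stays comfortably inside the normal range.
Scaled scale_late(float x, float y, float gain) {
    volatile float t = x * y;
    float r = t * gain;
    return {r, is_subnormal(t) || is_subnormal(r)};
}
```

With `x = 1.0e-20f`, `y = 1.0e30f` and `gain = 1.0e-20f`, `scale_early` forms the intermediate 1e-40, which is below the smallest normal float, while `scale_late` forms 1e10 and then 1e-10, both normal. The early version also loses precision, because a denormal carries fewer significant bits.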

Alternatively, do your computations in double, which has a much larger exponent range and therefore makes it much less likely that you will encounter denormals in the first place.
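As a rough illustration of the difference in range, the sketch below repeats the subtraction from the question in both types and classifies the result. The function names are made up for this example, and it assumes the default floating-point environment (no flush-to-zero mode).

```cpp
#include <cmath>
#include <limits>

// The question's subtraction in float: 1.0e-38 is below the smallest normal
// float (std::numeric_limits<float>::min(), about 1.18e-38), so the operands
// and the result are all subnormal.
bool float_diff_is_subnormal() {
    volatile float a = 1.0e-38f;
    volatile float b = 1.0e-39f;
    float d = a - b;
    return std::fpclassify(d) == FP_SUBNORMAL;
}

// The same subtraction in double: the smallest normal double is about
// 2.2e-308, so these values are ordinary normal numbers.
bool double_diff_is_normal() {
    volatile double a = 1.0e-38;
    volatile double b = 1.0e-39;
    double d = a - b;
    return std::fpclassify(d) == FP_NORMAL;
}
```

Note that converting such a double result back to float would bring the denormal back, so the benefit holds only while the values stay in double.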
