Inconsistent multiplication performance with floats

Question

While testing the performance of floats in .NET, I stumbled upon a weird case: for certain values, multiplication seems way slower than normal. Here is the test case:

using System;
using System.Diagnostics;

namespace NumericPerfTestCSharp {
    class Program {
        static void Main() {
            Benchmark(() => float32Multiply(0.1f), "\nfloat32Multiply(0.1f)");
            Benchmark(() => float32Multiply(0.9f), "\nfloat32Multiply(0.9f)");
            Benchmark(() => float32Multiply(0.99f), "\nfloat32Multiply(0.99f)");
            Benchmark(() => float32Multiply(0.999f), "\nfloat32Multiply(0.999f)");
            Benchmark(() => float32Multiply(1f), "\nfloat32Multiply(1f)");
        }

        static void float32Multiply(float param) {
            float n = 1000f;
            for (int i = 0; i < 1000000; ++i) {
                n = n * param;
            }
            // Write result to prevent the compiler from optimizing the entire method away
            Console.Write(n);
        }

        static void Benchmark(Action func, string message) {
            // warm-up call
            func();

            var sw = Stopwatch.StartNew();
            for (int i = 0; i < 5; ++i) {
                func();
            }
            Console.WriteLine(message + " : {0} ms", sw.ElapsedMilliseconds);
        }
    }
}

Results:

float32Multiply(0.1f) : 7 ms
float32Multiply(0.9f) : 946 ms
float32Multiply(0.99f) : 8 ms
float32Multiply(0.999f) : 7 ms
float32Multiply(1f) : 7 ms

Why are the results so different for param = 0.9f?

Test parameters: .NET 4.5, Release build, code optimizations ON, x86, no debugger attached.

Recommended answer

As others have mentioned, various processors do not support normal-speed calculations when subnormal floating-point values are involved. This is either a design defect (if the behavior impairs your application or is otherwise troublesome) or a feature (if you prefer the cheaper processor or alternative use of silicon that was enabled by not using gates for this work).

It is illuminating to understand why there is a transition at .5:

Suppose you are multiplying by p. Eventually, the value becomes so small that the result is some subnormal value (below 2^-126 in 32-bit IEEE binary floating point). Then multiplication becomes slow. As you continue multiplying, the value continues decreasing, and it reaches 2^-149, which is the smallest positive number that can be represented. Now, when you multiply by p, the exact result is of course 2^-149 * p, which is between 0 and 2^-149, which are the two nearest representable values. The machine must round the result and return one of these two values.
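To see how early the slow range is entered, here is a minimal sketch that counts the multiplications needed before the value first drops below 2^-126. The class and method names are made up for illustration, and the explicit (float) cast is an assumption on my part to keep the x86 JIT from carrying extra precision between iterations.

```csharp
using System;

static class SubnormalOnset {
    // Smallest positive normal float, 2^-126.
    public const float SmallestNormal = 1.17549435E-38f;

    // Hypothetical helper: number of multiplications by param before n
    // first drops below 2^-126, or -1 if that never happens.
    public static int IterationsUntilSubnormal(float start, float param, int maxIterations) {
        float n = start;
        for (int i = 1; i <= maxIterations; ++i) {
            n = (float)(n * param); // explicit cast forces single precision
            if (n < SmallestNormal) {
                return i;
            }
        }
        return -1;
    }

    static void Main() {
        // Same starting value and iteration count as the test case in the question.
        Console.WriteLine(IterationsUntilSubnormal(1000f, 0.9f, 1000000));
        Console.WriteLine(IterationsUntilSubnormal(1000f, 1f, 1000000));
    }
}
```

For 1000f and 0.9f the count should come out at roughly 900, so nearly all of the 1,000,000 iterations in the question run on subnormal operands. With 1f the value never shrinks and the helper returns -1.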

Which one? If p is less than ½, then 2^-149 * p is closer to 0 than to 2^-149, so the machine returns 0. Then you are not working with subnormal values anymore, and multiplication is fast again. If p is greater than ½, then 2^-149 * p is closer to 2^-149 than to 0, so the machine returns 2^-149, and you continue working with subnormal values, and multiplication remains slow. If p is exactly ½, the rounding rules say to use the value that has zero in the low bit of its significand (fraction portion), which is zero (2^-149 has 1 in its low bit).
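The three cases can be checked with a single multiplication each. This is only a sketch: RoundingAtEpsilon and Step are names invented here, and it assumes the default round-to-nearest mode with subnormal support enabled, which as far as I know is what .NET runs with.

```csharp
using System;

static class RoundingAtEpsilon {
    // Hypothetical helper: one multiplication, rounded to single precision.
    // Taking the operands as parameters keeps the product a run-time computation.
    public static float Step(float n, float p) {
        return (float)(n * p);
    }

    static void Main() {
        float tiny = float.Epsilon; // 2^-149, the smallest positive float

        // p below one half: the product rounds down to zero.
        Console.WriteLine(Step(tiny, 0.4f) == 0f);
        // p exactly one half: the tie goes to the even significand, which is zero.
        Console.WriteLine(Step(tiny, 0.5f) == 0f);
        // p above one half: the product rounds back up to 2^-149.
        Console.WriteLine(Step(tiny, 0.9f) == tiny);
    }
}
```

Under those assumptions each comparison should print True. For p close to 1 the value can also stall at a slightly larger subnormal before it ever gets down to 2^-149 (for instance, 4 * 2^-149 times 0.9f rounds back to 4 * 2^-149), but the consequence is the same: it never reaches zero, so every later multiplication stays slow.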

You report that .99f appears fast. By the reasoning above, it should end up with the slow behavior. Perhaps the code you posted is not exactly the code for which you measured fast performance with .99f? Perhaps the starting value or the number of iterations was changed?

There are ways to work around this problem. One is that the hardware has mode settings that specify to change any subnormal values used or obtained to zero, called "denormals as zero" or "flush to zero" modes. I do not use .NET and cannot advise you about how to set these modes in .NET.
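Without touching the hardware mode, a similar effect can be approximated in software by clamping subnormal results to zero by hand. To be clear, this is a substitute technique and not the hardware "flush to zero" mode itself; SoftwareFlush, FlushSubnormal and MultiplyFlushed are hypothetical names, and the extra comparison per iteration has a cost of its own.

```csharp
using System;

static class SoftwareFlush {
    // Smallest positive normal float, 2^-126.
    public const float SmallestNormal = 1.17549435E-38f;

    // Hypothetical helper: replaces any subnormal value with zero.
    // NaN and normal values pass through unchanged.
    public static float FlushSubnormal(float x) {
        return Math.Abs(x) < SmallestNormal ? 0f : x;
    }

    // Same loop as in the question, with each result clamped by hand.
    public static float MultiplyFlushed(float start, float param, int iterations) {
        float n = start;
        for (int i = 0; i < iterations; ++i) {
            n = FlushSubnormal((float)(n * param));
        }
        return n;
    }

    static void Main() {
        Console.WriteLine(MultiplyFlushed(1000f, 0.9f, 1000000));
    }
}
```

Once the product falls below 2^-126 it is replaced by zero, and from then on the loop multiplies zero by param, which does not involve subnormal operands.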

Another approach is to add a tiny value each time, such as

n = (n+e) * param;

where e is at least 2^-126/param. Note that 2^-126/param should be calculated rounded upward, unless you can guarantee that n is large enough that (n+e) * param does not produce a subnormal value. This also presumes n is not negative. The effect of this is to make sure the calculated value is always large enough to be in the normal range, never subnormal.
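One way this might look in C#, as a sketch only: TinyOffset, ComputeOffset and MultiplyWithOffset are illustrative names, and instead of a true round-upward division the quotient is computed in double and scaled by 1.000001, which is more than one float rounding step and so should leave e at or above the exact 2^-126/param. It assumes param is positive and n starts non-negative.

```csharp
using System;

static class TinyOffset {
    // Smallest positive normal float, 2^-126.
    public const float SmallestNormal = 1.17549435E-38f;

    // Hypothetical helper: an offset e that is at least 2^-126 / param.
    // The small upward nudge stands in for a round-upward division.
    public static float ComputeOffset(float param) {
        double exact = (double)SmallestNormal / param;
        return (float)(exact * 1.000001);
    }

    // The loop from the question with the offset added before each multiply.
    public static float MultiplyWithOffset(float start, float param, int iterations) {
        float e = ComputeOffset(param);
        float n = start;
        for (int i = 0; i < iterations; ++i) {
            n = (float)((n + e) * param);
        }
        return n;
    }

    static void Main() {
        float result = MultiplyWithOffset(1000f, 0.9f, 1000000);
        // The result settles at a tiny value that is still in the normal range.
        Console.WriteLine(result >= SmallestNormal);
    }
}
```

Because (n + e) * param is never smaller than e * param, and e * param is at least 2^-126, the value settles at a tiny normal number instead of decaying into the subnormal range.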

Adding e in this way of course changes the results. However, if you are, for example, processing audio with some echo effect (or other filter), then the value of e is too small to cause any effects observable by humans listening to the audio. It is likely too small to cause any change in the hardware behavior when producing the audio.
