Floating point rounding when truncating

Question

This is probably a question for an x86 FPU expert:

I am trying to write a function which generates a random floating point value in the range [min,max]. The problem is that my generator algorithm (the floating-point Mersenne Twister, if you're curious) only returns values in the range [1,2), i.e., I want an inclusive upper bound, but my "source" value comes from a range with an exclusive upper bound. The catch here is that the underlying generator returns an 8-byte double, but I only want a 4-byte float, and I am using the default FPU rounding mode of Nearest.

What I want to know is whether the truncation itself in this case will result in my return value being inclusive of max when the FPU internal 80-bit value is sufficiently close, or whether I should increment the significand of my max value before multiplying it by the intermediary random in [1,2), or whether I should change FPU modes. Or any other ideas, of course.

Here's the code I am currently using, and I did verify that 1.0f resolves to 0x3f800000:

float MersenneFloat( float min, float max )
{
    //genrand returns a double in [1,2)
    const float random = (float)genrand_close1_open2(); 
    //return in desired range
    return min + ( random - 1.0f ) * (max - min);
}

If it makes a difference, this needs to work on both Win32 MSVC++ and Linux gcc. Also, will using any versions of the SSE optimizations change the answer to this?

Edit: The answer is yes, truncation in this case from double to float is sufficient to cause the result to be inclusive of max. See Crashworks' answer for more.
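To see the effect on the function from the question, here is a minimal sketch that swaps the real generator for a stand-in. The `stub_value` variable, the stubbed `genrand_close1_open2`, and the `at_top_of_range` helper are hypothetical names introduced only for illustration; the body of `MersenneFloat` is copied from the question. It assumes IEEE-754 floats and the default round-to-nearest mode.

```c
#include <float.h>

/* Hypothetical stand-in for the real generator: it returns whatever the
   caller stored in stub_value, so edge cases can be forced on demand. */
static double stub_value = 1.0;
static double genrand_close1_open2(void) { return stub_value; }

/* Same body as the function in the question. */
static float MersenneFloat(float min, float max)
{
    /* genrand returns a double in [1,2); the cast rounds it to float */
    const float random = (float)genrand_close1_open2();
    /* return in desired range */
    return min + (random - 1.0f) * (max - min);
}

/* Force the generator to its largest possible output, 2 - 2^-52,
   and run it through MersenneFloat. */
static float at_top_of_range(float min, float max)
{
    stub_value = 2.0 - DBL_EPSILON;
    return MersenneFloat(min, max);
}
```

With the generator pinned at its largest output, the cast produces 2.0f, so `random - 1.0f` is exactly 1.0f and the result is `min + (max - min)`. For simple bounds such as 0 and 10 that is exactly `max`; for arbitrary bounds the float subtraction and addition can round again, so the upper end is reached only approximately.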

Recommended answer

The SSE ops will subtly change the behavior of this algorithm because they don't have the intermediate 80-bit representation: the math truly is done in 32 or 64 bits. The good news is that you can easily test it and see if it changes your results by simply specifying the /arch:SSE2 command line option to MSVC, which will cause it to use the SSE scalar ops instead of x87 FPU instructions for ordinary floating point math.
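One way to check which evaluation model a given build uses is the C99 macro `FLT_EVAL_METHOD` from `<float.h>`. The sketch below is only an illustration; the function name `eval_method_description` is made up here, and the `#ifdef` guard is an assumption for toolchains that might not define the macro. GCC targeting 32-bit x86 with x87 math typically reports 2, while SSE2 and x86-64 builds typically report 0.

```c
#include <float.h>

/* Describe how the compiler evaluates floating-point expressions,
   based on the C99 macro FLT_EVAL_METHOD. */
static const char *eval_method_description(void)
{
#ifdef FLT_EVAL_METHOD
    switch (FLT_EVAL_METHOD) {
    case 0:
        return "each operation is evaluated in its own type (SSE-style)";
    case 1:
        return "float and double operations are evaluated as double";
    case 2:
        return "all operations are evaluated as long double (x87-style)";
    default:
        return "evaluation method is indeterminable or implementation-defined";
    }
#else
    return "FLT_EVAL_METHOD is not defined by this toolchain";
#endif
}
```

Printing the returned string from builds with and without the SSE option shows whether intermediates carry extra precision. The explicit cast to float in the question's code is still a rounding step in either model.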

I'm not sure offhand what the exact rounding behavior is around the integer boundaries, but you can test what happens when 1.999... gets rounded from 64 to 32 bits with, e.g.:

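// Largest double below 2.0: biased exponent 0x3FF (unbiased 0), all 52 mantissa bits set
static const uint64_t OnePointNineRepeating = 0x3FFFFFFFFFFFFFFFULL;
double asDouble;
// memcpy (needs <stdint.h> and <string.h>) reinterprets the bits without a pointer-cast aliasing violation
memcpy(&asDouble, &OnePointNineRepeating, sizeof asDouble);
float asFloat = (float)asDouble; // the narrowing conversion rounds per the current rounding mode
return asFloat;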

Edit, result: the original poster ran this test and found that, when truncated to float, 1.99999... rounds up to 2 both with and without /arch:SSE2.
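The boundary can be pinned down a little further. The following sketch assumes IEEE-754 formats and the default round-to-nearest-even mode; the helper names `double_from_bits`, `float_bits`, and `narrow` are invented for this example. Floats in [1,2) are spaced 2^-23 apart, so the midpoint between the largest float below 2 and 2.0f itself is 2 - 2^-24, and any double at or above that midpoint should narrow to 2.0f.

```c
#include <stdint.h>
#include <string.h>

/* Build a double from its raw IEEE-754 bit pattern. */
static double double_from_bits(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Return the raw IEEE-754 bit pattern of a float. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Narrow through a volatile so the value is really stored at float width. */
static float narrow(double d)
{
    volatile float f = (float)d;
    return f;
}

/* Bit patterns of interest (doubles):
   0x3FFFFFFFFFFFFFFF  largest double below 2.0
   0x3FFFFFFFF0000000  2 - 2^-24, the exact midpoint (tie)
   0x3FFFFFFFEFFFFFFF  one step below the midpoint */
```

Under those assumptions, `float_bits(narrow(double_from_bits(0x3FFFFFFFFFFFFFFFULL)))` is 0x40000000, the pattern for 2.0f. The midpoint also goes to 2.0f because ties round to the even significand, and the double one step below the midpoint narrows to 0x3FFFFFFF, the largest float below 2. If the generator's doubles are uniform over [1,2), roughly one output in 2^24 lands in the range that narrows to 2.0f.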
