C++ floating point precision loss: 3015/0.00025298219406977296

The problem.

Environment: Microsoft Visual C++ 2005 compiler, 32-bit Windows XP SP3, AMD 64 X2 CPU.

Code:

double a = 3015.0;
double b = 0.00025298219406977296;
// *((unsigned __int64*)(&a)) == 0x40a78e0000000000
// *((unsigned __int64*)(&b)) == 0x3f30945640000000
double f = a / b; // 3015/0.00025298219406977296

The result of the calculation (i.e. "f") is 11917835.000000000 (*((unsigned __int64*)(&f)) == 0x4166bb4160000000), although it should be 11917834.814763514 (i.e. *((unsigned __int64*)(&f)) == 0x4166bb415a128aef).
In other words, the fractional part is lost.
Unfortunately, I need the fractional part to be correct.
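
The bit patterns quoted above can be reproduced without the pointer cast. The following is a minimal sketch, not code from the original post: `bits_of` and `check_bits_of` are hypothetical helper names, and the sketch assumes a 64-bit IEEE 754 double and a C++11 compiler (newer than the Visual C++ 2005 used in the question).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of the original post): returns the raw
// IEEE 754 bit pattern of a double. memcpy avoids the aliasing problems of
// reading the value through a casted pointer.
std::uint64_t bits_of(double value) {
    static_assert(sizeof(std::uint64_t) == sizeof(double),
                  "this sketch assumes a 64-bit double");
    std::uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Sanity check against the pattern quoted in the question for a = 3015.0.
void check_bits_of() {
    assert(bits_of(3015.0) == 0x40a78e0000000000ULL);
}
```

Calling `bits_of(f)` after the division gives the same kind of hex dump as the watch-window expression, so the two results can be compared bit for bit.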

Questions:
1) Why does this happen?
2) How can I fix the problem?

Additional info:
0) The result is taken directly from the "watch" window (it wasn't printed, and I didn't forget to set printing precision). I also provided a hex dump of the floating-point variables, so I'm absolutely sure about the calculation result.
1) The disassembly of f = a/b is:

fld         qword ptr [a]  
fdiv        qword ptr [b]  
fstp        qword ptr [f]  

2) f = 3015/0.00025298219406977296; yields the correct result (f == 11917834.814763514, *((unsigned __int64*)(&f)) == 0x4166bb415a128aef), but it looks like in this case the result is simply calculated at compile time:

fld         qword ptr [__real@4166bb415a128aef (828EA0h)]  
fstp        qword ptr [f]  

So, how can I fix this problem?

P.S. I've found a temporary workaround (I need only the fractional part of the division, so I simply use f = fmod(a, b)/b at the moment), but I would still like to know how to fix this problem properly. Double precision is supposed to give about 16 decimal digits, so a calculation like this shouldn't cause problems.
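
The workaround can be sketched as a small function. This assumes the intended call is the two-argument `fmod(a, b)`, since `fmod` takes a dividend and a divisor. The helper name `fractional_part_of_quotient` is made up for illustration, and the sketch only covers positive `a` and `b`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: fractional part of a / b for positive a and b.
// std::fmod returns the remainder a - n*b (n = truncated quotient) without
// rounding error, so the only rounding happens in the final division.
double fractional_part_of_quotient(double a, double b) {
    assert(a > 0.0 && b > 0.0);
    return std::fmod(a, b) / b;
}
```

For the values in the question this should give a fraction close to 0.814763514. The integer part of the quotient never enters the final division, which is why the fraction survives even when the division itself is rounded coarsely.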

Solution

Are you using DirectX anywhere in your program? It switches the floating-point unit to single-precision mode unless you specifically tell it not to when you create the device, and that would cause exactly this.
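
The numbers in the question fit this explanation: the observed pattern 0x4166bb4160000000 has its low 29 mantissa bits cleared, which is what a result rounded to a 24-bit significand looks like. The sketch below is an illustration under stated assumptions, not the poster's code. `divide_rounded_to_single` imitates the reduced-precision division by rounding the double quotient to float, and `restore_double_precision` uses the Microsoft CRT function `_controlfp_s` to ask for 53-bit precision again. Both function names are hypothetical, and the restore path is only compiled for 32-bit x86 MSVC builds. In Direct3D 9 the opt-out at device creation is, as far as I know, the `D3DCREATE_FPU_PRESERVE` behavior flag.

```cpp
#include <cassert>
#include <float.h>  // on MSVC: _controlfp_s, _PC_53, _MCW_PC

// Imitates a division whose result is rounded to a 24-bit significand by
// rounding the double quotient to float. Assumption: for these inputs this
// matches what the x87 unit does in single-precision mode; in rare halfway
// cases the two-step rounding could differ from a true 24-bit division.
double divide_rounded_to_single(double a, double b) {
    return static_cast<double>(static_cast<float>(a / b));
}

// Asks the x87 FPU for 53-bit (double) precision again.
// Only meaningful on 32-bit x86 MSVC builds; elsewhere this sketch does
// nothing and reports success.
bool restore_double_precision() {
#if defined(_MSC_VER) && defined(_M_IX86)
    unsigned int current = 0;
    return _controlfp_s(&current, _PC_53, _MCW_PC) == 0;
#else
    return true;
#endif
}
```

With the inputs from the question, `divide_rounded_to_single(3015.0, 0.00025298219406977296)` yields 11917835.0, the same value the watch window showed. Calling `restore_double_precision()` before the division is a local fix. Creating the device so that it leaves the FPU state alone avoids the problem for the whole program.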
