double to float rounding error in 8th digit


Problem description

I have a 32-bit Intel machine and a 64-bit AMD machine. There is a rounding
error in the 8th digit. Unfortunately, because of the algorithm we use,
the error percolates into higher digits.

The C++ code is
------------------
b[2][12] += (float)(mode * val);

On 32-bit (Intel, VS 2003, C++), some watch variables are
----------------------------------------------------------------------------------
b[2][12]                               7.5312500             float
mode*val                               -4.7720763683319092   double
(float)(mode*val)                      -4.7720766            float
(float)(b[2][12]+mode*val)             2.7591736             float
b[2][12]+(float)(mode*val)             2.7591733932495117    double
(float)(b[2][12]+(float)(mode*val))    2.7591734             float
After the increment it becomes
b[2][12]                               2.7591736             float  <-------------- This is the different value

On 64-bit (AMD, VS 2005, C++), some watch variables are
----------------------------------------------------------------------------------
b[2][12]                               7.5312500             float
mode*val                               -4.7720763683319092   double
(float)(mode*val)                      -4.7720766            float
(float)(b[2][12]+mode*val)             2.7591736             float
b[2][12]+(float)(mode*val)             2.7591733932495117    double
(float)(b[2][12]+(float)(mode*val))    2.7591734             float
After the increment it becomes
b[2][12]                               2.7591734             float  <-------------- This is the different value

Recommended answer

"Shirsoft" <sh******@gmail.com> wrote in message
news:11*********************@k78g2000cwa.googlegroups.com...

> I have a 32 bit intel and 64 bit AMD machine. There is a rounding
> error in the 8th digit. Unfortunately because of the algorithm we use,
> the errors percolate into higher digits.
> [...]

This is a statement, not a question.

Yes, floating point numbers are not accurate to all decimal places. If you
need 100% accuracy you'll need to use something else.


I am sorry for the confusion, but my question is why the 32-bit
machine rounds it off to x.xxxxx36 instead of x.xxxxx34. The 64-bit machine
does it right. Is there some way to fix it?

On Feb 8, 3:42 pm, "Jim Langston" <tazmas...@rocketmail.com> wrote:
> This is a statement, not a question.
>
> Yes, floating point numbers are not accurate to all decimal places. If you
> need 100% accuracy you'll need to use something else.



On Feb 8, 3:56 am, "Shirsoft" <shirs...@gmail.com> wrote:
> I am sorry for the confusion, but my question is why the 32-bit
> machine rounds it off to x.xxxxx36 instead of x.xxxxx34. The 64-bit machine
> does it right. Is there some way to fix it?




Most machines use IEEE 754 floating point numbers and represent a float
in the 32-bit (single precision) version of that format. As such, you only
have about seven significant decimal digits of accuracy, so anything beyond
that will be subject to errors.

