What's the difference between a single precision and double precision floating point operation?

Problem description
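What is the difference between a single precision floating point operation and a double precision floating point operation?

I'm especially interested in practical terms in relation to video game consoles. For example, does the Nintendo 64 have a 64-bit processor, and if it does, would that mean it was capable of double precision floating point operations? Can the PS3 and Xbox 360 pull off double precision floating point operations, or only single precision? And in general use, are the double precision capabilities made use of (if they exist)?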
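Answer

Note: the Nintendo 64 does have a 64-bit processor, however:

Many games took advantage of the chip's 32-bit processing mode. The greater data precision available with 64-bit data types is not typically required by 3D games, and processing 64-bit data uses twice as much RAM, cache, and bandwidth, which reduces overall system performance.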
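The following sketch roughly illustrates the storage side of that claim. It does not model the N64's actual cache or bus behaviour. It uses Python's standard `array` module to compare a buffer of C `float` values with a buffer of C `double` values. The vertex count is an arbitrary, hypothetical figure. The 4-byte and 8-byte element sizes are what mainstream platforms use; Python only documents them as minimum sizes.

```python
from array import array

# Hypothetical workload: 10,000 vertices, each with x, y and z coordinates.
N_VALUES = 10_000 * 3

single = array("f", [0.0]) * N_VALUES  # C float elements (4 bytes on mainstream platforms)
double = array("d", [0.0]) * N_VALUES  # C double elements (8 bytes on mainstream platforms)


def footprint(buf: array) -> int:
    """Bytes occupied by the raw element storage of an array."""
    return buf.itemsize * len(buf)


print("single:", footprint(single), "bytes")
print("double:", footprint(double), "bytes")
print("ratio:", footprint(double) / footprint(single))
```

For any element count, the double buffer is twice the size of the single one. That doubling is where the extra RAM, cache, and bandwidth cost comes from.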
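From Webopedia:

The term double precision is something of a misnomer because the precision is not really double. The word double derives from the fact that a double-precision number uses twice as many bits as a regular floating-point number. For example, if a single-precision number requires 32 bits, its double-precision counterpart will be 64 bits long.

The extra bits increase not only the precision but also the range of magnitudes that can be represented. The exact amount by which the precision and range of magnitudes are increased depends on what format the program is using to represent floating-point values. Most computers use a standard format known as the IEEE floating-point format.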
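Here is a small sketch of the "precision and range" point. It assumes Python's `float` is an IEEE double, which is true on mainstream platforms. It uses the standard `struct` module's `"f"` format to round a value to the nearest IEEE single. The epsilon and maximum constants are computed from the IEEE formats' 23-bit and 52-bit fraction fields and their largest normal exponents (127 and 1023). `to_float32` is a helper name made up for this example.

```python
import struct
import sys


def to_float32(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest IEEE single and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Machine epsilon: the gap between 1.0 and the next representable value.
EPS_SINGLE = 2.0 ** -23  # 23 fraction bits
EPS_DOUBLE = 2.0 ** -52  # 52 fraction bits

# Largest finite values: (2 - 2**-fraction_bits) * 2**max_exponent.
MAX_SINGLE = (2.0 - 2.0 ** -23) * 2.0 ** 127
MAX_DOUBLE = (2.0 - 2.0 ** -52) * 2.0 ** 1023

print("0.1 as single:", to_float32(0.1))  # 0.10000000149011612
print("0.1 as double:", 0.1)
print("1 + double epsilon, rounded to single:", to_float32(1.0 + EPS_DOUBLE))
print("epsilon ratio:", EPS_SINGLE / EPS_DOUBLE)
print("max single:", MAX_SINGLE)
print("max double:", MAX_DOUBLE)
print("matches sys.float_info:",
      MAX_DOUBLE == sys.float_info.max and EPS_DOUBLE == sys.float_info.epsilon)
```

Counting the implicit leading 1, a single carries 24 significant bits and a double carries 53. In decimal terms that is roughly 7 versus 16 significant digits. So the precision is indeed not exactly "double".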
From the IEEE standard for floating point arithmetic:

Single Precision

The IEEE single precision floating point standard representation requires a 32 bit word, which may be represented as numbered from 0 to 31, left to right. The first bit is the sign bit, 'S', the next eight bits are the exponent bits, 'E', and the final 23 bits are the fraction 'F':

    S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF
    0 1      8 9                    31

The value V represented by the word may be determined as follows:

- If E=255 and F is nonzero, then V=NaN ("Not a number")
- If E=255 and F is zero and S is 1, then V=-Infinity
- If E=255 and F is zero and S is 0, then V=Infinity
- If 0<E<255 then V=(-1)**S * 2 ** (E-127) * (1.F) where "1.F" is intended to represent the binary number created by prefixing F with an implicit leading 1 and a binary point.
- If E=0 and F is nonzero, then V=(-1)**S * 2 ** (-126) * (0.F). These are "unnormalized" values.
- If E=0 and F is zero and S is 1, then V=-0
- If E=0 and F is zero and S is 0, then V=0

In particular,

    0 00000000 00000000000000000000000 = 0
    1 00000000 00000000000000000000000 = -0

    0 11111111 00000000000000000000000 = Infinity
    1 11111111 00000000000000000000000 = -Infinity

    0 11111111 00000100000000000000000 = NaN
    1 11111111 00100010001001010101010 = NaN

    0 10000000 00000000000000000000000 = +1 * 2**(128-127) * 1.0 = 2
    0 10000001 10100000000000000000000 = +1 * 2**(129-127) * 1.101 = 6.5
    1 10000001 10100000000000000000000 = -1 * 2**(129-127) * 1.101 = -6.5

    0 00000001 00000000000000000000000 = +1 * 2**(1-127) * 1.0 = 2**(-126)
    0 00000000 10000000000000000000000 = +1 * 2**(-126) * 0.1 = 2**(-127)
    0 00000000 00000000000000000000001 = +1 * 2**(-126) * 0.00000000000000000000001 = 2**(-149) (Smallest positive value)

Double Precision

The IEEE double precision floating point standard representation requires a 64 bit word, which may be represented as numbered from 0 to 63, left to right. The first bit is the sign bit, 'S', the next eleven bits are the exponent bits, 'E', and the final 52 bits are the fraction 'F':

    S EEEEEEEEEEE FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
    0 1        11 12                                                63

The value V represented by the word may be determined as follows:

- If E=2047 and F is nonzero, then V=NaN ("Not a number")
- If E=2047 and F is zero and S is 1, then V=-Infinity
- If E=2047 and F is zero and S is 0, then V=Infinity
- If 0<E<2047 then V=(-1)**S * 2 ** (E-1023) * (1.F) where "1.F" is intended to represent the binary number created by prefixing F with an implicit leading 1 and a binary point.
- If E=0 and F is nonzero, then V=(-1)**S * 2 ** (-1022) * (0.F). These are "unnormalized" values.
- If E=0 and F is zero and S is 1, then V=-0
- If E=0 and F is zero and S is 0, then V=0
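The quoted rules can be turned into a small decoder. This is a sketch written for this answer, not code from the IEEE standard. `decode` and `via_struct` are hypothetical helper names. Bit strings use the same "S E F" form as the tables above, and spaces are ignored. Each result is cross-checked against Python's standard `struct` module, whose `"f"` and `"d"` formats use the IEEE binary32 and binary64 encodings.

```python
import math
import struct

# (exponent bits, fraction bits, exponent bias) for each quoted layout
FORMATS = {
    "single": (8, 23, 127),
    "double": (11, 52, 1023),
}


def decode(bits: str, fmt: str = "single") -> float:
    """Apply the quoted rules to a pattern like '0 10000001 1010...'."""
    exp_bits, frac_bits, bias = FORMATS[fmt]
    bits = bits.replace(" ", "")
    if len(bits) != 1 + exp_bits + frac_bits:
        raise ValueError(f"{fmt} needs {1 + exp_bits + frac_bits} bits")
    s = int(bits[0], 2)
    e = int(bits[1:1 + exp_bits], 2)
    f = int(bits[1 + exp_bits:], 2)
    sign = -1.0 if s else 1.0
    e_max = (1 << exp_bits) - 1  # 255 for single, 2047 for double
    if e == e_max:
        # NaN if F is nonzero, otherwise +/-Infinity depending on S
        return math.nan if f else sign * math.inf
    if e == 0:
        # "Unnormalized" values and +/-0: 2**(1 - bias) * 0.F,
        # where 0.F = F / 2**frac_bits
        return sign * math.ldexp(f, 1 - bias - frac_bits)
    # Normalized: 2**(E - bias) * 1.F, where 1.F = (2**frac_bits + F) / 2**frac_bits
    return sign * math.ldexp((1 << frac_bits) + f, e - bias - frac_bits)


def via_struct(bits: str, fmt: str = "single") -> float:
    """Reinterpret the same bit pattern with struct, for comparison."""
    raw = int(bits.replace(" ", ""), 2)
    if fmt == "single":
        return struct.unpack(">f", raw.to_bytes(4, "big"))[0]
    return struct.unpack(">d", raw.to_bytes(8, "big"))[0]


examples = [
    "0 10000000 00000000000000000000000",  # table: 2
    "0 10000001 10100000000000000000000",  # table: 6.5
    "1 10000001 10100000000000000000000",  # table: -6.5
    "0 00000001 00000000000000000000000",  # table: 2**(-126)
    "0 00000000 10000000000000000000000",  # table: 2**(-127)
    "0 00000000 00000000000000000000001",  # table: 2**(-149)
]
for pattern in examples:
    value = decode(pattern)
    assert value == via_struct(pattern)  # the struct encoding agrees
    print(pattern, "=", value)
```

The loop prints each finite table entry next to its decoded value. The NaN patterns are left out of the comparison because NaN never compares equal to anything, including itself.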
Reference:

ANSI/IEEE Standard 754-1985,
Standard for Binary Floating Point Arithmetic.