Python/numpy floating-point text precision
Let's say I have some 32-bit and 64-bit floating point values:
>>> import numpy as np
>>> v32 = np.array([5, 0.1, 2.4, 4.555555555555555, 12345678.92345678635],
dtype=np.float32)
>>> v64 = np.array([5, 0.1, 2.4, 4.555555555555555, 12345678.92345678635],
dtype=np.float64)
I want to serialize these values to text without losing precision (or at least really close to not losing precision). I think the canonical way of doing this is with repr:
>>> map(repr, v32)
['5.0', '0.1', '2.4000001', '4.5555553', '12345679.0']
>>> map(repr, v64)
['5.0', '0.10000000000000001', '2.3999999999999999', '4.5555555555555554',
'12345678.923456786']
But I want to make the representation as compact as possible to minimize file size, so it would be nice if values like 2.4 got serialized without the extra decimals. Yes, I know that's their actual floating point representation, but %g seems to be able to take care of this:
>>> ('%.7g ' * len(v32)) % tuple(v32)
'5 0.1 2.4 4.555555 1.234568e+07 '
>>> ('%.16g ' * len(v64)) % tuple(v64)
'5 0.1 2.4 4.555555555555555 12345678.92345679 '
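Whether a given %g precision is actually lossless can be checked empirically by parsing the formatted text back and comparing for exact equality (a quick sketch; round_trips is a helper name introduced here, not a standard function):

```python
import numpy as np

def round_trips(x, fmt, dtype):
    """Format x with fmt, parse the text back, and compare for exact equality."""
    return dtype(float(fmt % float(x))) == x

v32 = np.array([5, 0.1, 2.4, 4.555555555555555, 12345678.92345678635],
               dtype=np.float32)
for x in v32:
    print('%.7g' % x, round_trips(x, '%.7g', np.float32))
```

Running this shows that not every float32 value above survives a %.7g round trip: for example, the last element is the float32 12345679.0, which %.7g renders as 1.234568e+07 and which parses back to 12345680.0.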
My question is: is it safe to use %g in this way? Are .7 and .16 the correct values so that precision won't be lost?
Python 2.7 and later already have a smart repr implementation for floats that prints 0.1 as 0.1. The brief output is chosen in preference to other candidates such as 0.10000000000000001 because it is the shortest representation of that particular number that round-trips to the exact same floating-point value when read back into Python. To use this algorithm, convert your 64-bit floats to actual Python floats before handing them off to repr:
>>> map(repr, map(float, v64))
['5.0', '0.1', '2.4', '4.555555555555555', '12345678.923456786']
Surprisingly, the result is natural-looking and numerically correct. More info on the 2.7/3.2 repr can be found in What's New and a fascinating lecture by Mark Dickinson.
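Put together, the whole float64 array can be serialized and read back losslessly with this approach (a sketch; the space-separated layout is just one possible choice):

```python
import numpy as np

v64 = np.array([5, 0.1, 2.4, 4.555555555555555, 12345678.92345678635],
               dtype=np.float64)

# Convert each element to a Python float so repr produces the shortest
# round-tripping string, then join with spaces.
text = ' '.join(repr(float(x)) for x in v64)

# Parsing the text back recovers the array bit-for-bit.
restored = np.array([float(tok) for tok in text.split()], dtype=np.float64)
assert (restored == v64).all()
```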
Unfortunately, this trick won't work for 32-bit floats, at least not without reimplementing the algorithm used by Python 2.7's repr.
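For reference (an addition, not part of the original answer): IEEE 754 guarantees that 9 significant decimal digits always round-trip a float32 and 17 always round-trip a float64, whereas 7 and 16 can fall short. A small sketch of both the failure and the safe cases:

```python
import numpy as np

# 2**24 is exactly representable as a float32, but at 7 significant
# digits it prints as 1.677722e+07, which parses back to 16777220.0.
x = np.float32(16777216.0)
assert np.float32(float('%.7g' % x)) != x
assert np.float32(float('%.9g' % x)) == x    # 9 digits always round-trip

# Likewise for float64: 0.1 + 0.2 is not the double 0.3, but at
# 16 significant digits it prints as '0.3'; 17 digits preserve it.
y = 0.1 + 0.2
assert float('%.16g' % y) != y
assert float('%.17g' % y) == y
```

So if compactness matters less than a hard guarantee, %.9g and %.17g are the conservative choices for float32 and float64 respectively.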