Is it possible to force exponent or significand of a float to match another float (Python)?

Problem description

This is an interesting question that I was trying to work through the other day. Is it possible to force the significand or exponent of one float to be the same as another float in Python?

The question arises because I was trying to rescale some data so that the min and max match another data set. However, my rescaled data was slightly off (after about 6 decimal places) and it was enough to cause problems down the line.

To give an idea, I have f1 and f2 (type(f1) == type(f2) == numpy.ndarray). I want np.max(f1) == np.max(f2) and np.min(f1) == np.min(f2). To achieve this, I do:

import numpy as np

f2 = (f2-np.min(f2))/(np.max(f2)-np.min(f2)) # f2 is now between 0.0 and 1.0
f2 = f2*(np.max(f1)-np.min(f1)) + np.min(f1)  # f2 is now between min(f1) and max(f1)

The result (just as an example) would be:

np.max(f1) # 5.0230593
np.max(f2) # 5.0230602 but I need 5.0230593 

My initial thought is that forcing the exponent of the float would be the correct solution. I couldn't find much on it, so I made a workaround for my need:

exp = 0
mm = np.max(f1)

# find where the decimal is
while int(10**exp*mm) == 0:
  exp += 1

# add 4 digits of precision
exp += 4

scale = 10**exp

f2 = np.round(f2*scale)/scale
f1 = np.round(f1*scale)/scale

Now np.max(f2) == np.max(f1).

However, is there a better way? Did I do something wrong? Is it possible to reshape a float to be similar to another float (exponent or other means)?

As was suggested, I am now using:

scale = 10**(-np.floor(np.log10(np.max(f1))) + 4)

While my solution above will work (for my application), I'm interested to know if there's a solution that can somehow force the float to have the same exponent and/or significand so that the numbers will become identical.

Recommended answer

TL;DR

Use

f2 = f2*np.max(f1)-np.min(f1)*(f2-1)  # f2 is now between min(f1) and max(f1)

and make sure you're using double precision, compare floating point numbers by looking at absolute or relative differences, avoid rounding for adjusting (or comparing) floating point numbers, and don't set the underlying components of floating point numbers manually.

Details

This isn't a very easy error to reproduce, as you have discovered. However, working with floating point numbers is subject to error. E.g., adding together 1 000 000 000 + 0.000 000 000 1 gives 1 000 000 000.000 000 000 1, but this is too many significant figures even for double precision (which supports around 15 significant figures), so the trailing decimal is dropped. Moreover, some "short" numbers can't be represented exactly, as noted in @Kevin's answer. See, e.g., here, for more. (Search for something like "floating point truncation roundoff error" for even more.)
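Both effects are easy to check in plain Python. The following is only a minimal sketch; the variable names are illustrative and not part of the original question.

```python
big = 1_000_000_000.0
tiny = 0.000_000_000_1

# The exact sum needs about 20 significant figures. A double keeps roughly
# 15 to 16, so the tiny term is lost entirely.
total = big + tiny
print(total == big)        # True

# Some "short" decimal numbers have no exact binary representation.
print(0.1 + 0.2 == 0.3)    # False
print(f"{0.1:.20f}")       # 0.10000000000000000555
```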

Here's an example which does demonstrate a problem:

import numpy as np

np.set_printoptions(precision=16)

dtype=np.float32                     
f1 = np.linspace(-1000, 0.001, 3, dtype=dtype)
f2 = np.linspace(0, 1, 3, dtype=dtype)

f2 = (f2-np.min(f2))/(np.max(f2)-np.min(f2)) # f2 is now between 0.0 and 1.0
f2 = f2*(np.max(f1)-np.min(f1)) + np.min(f1)  # f2 is now between min(f1) and max(f1)

print (f1)
print (f2)

Output

[ -1.0000000000000000e+03  -4.9999951171875000e+02   1.0000000474974513e-03]
[ -1.0000000000000000e+03  -4.9999951171875000e+02   9.7656250000000000e-04]

Following @Mark Dickinson's comment, I have used 32 bit floating point. This is consistent with the error you reported, a relative error of around 10^-7, around the 7th significant figure:

In: (5.0230602 - 5.0230593) / 5.0230593
Out: 1.791736760621852e-07

Going to dtype=np.float64 makes things better but it still isn't perfect. The program above then gives

[ -1.0000000000000000e+03  -4.9999950000000001e+02   1.0000000000000000e-03]
[ -1.0000000000000000e+03  -4.9999950000000001e+02   9.9999999997635314e-04]

This isn't perfect, but is generally close enough. When comparing floating point numbers you almost never want to use strict equality because of the possibility of small errors as noted above. Instead subtract one number from the other and check the absolute difference is less than some tolerance, and/or look at the relative error. See, e.g., numpy.isclose.
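As a rough illustration of tolerance-based comparison, here is a sketch using the two maxima quoted in the question. The tolerance values are assumptions chosen for this example; pick ones that suit your data.

```python
import math
import numpy as np

a = 5.0230593   # np.max(f1) from the question
b = 5.0230602   # np.max(f2) from the question

# Relative error: difference divided by the reference value.
rel_err = abs(a - b) / abs(a)
print(rel_err)                                  # about 1.8e-07

print(a == b)                                   # False
print(np.isclose(a, b, rtol=1e-6, atol=0.0))    # True
print(math.isclose(a, b, rel_tol=1e-6))         # True
print(np.isclose(a, b, rtol=1e-8, atol=0.0))    # False: tolerance too tight
```

np.isclose also works elementwise on arrays, and np.allclose reduces the result to a single boolean.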

Returning to your problem, it seems like it should be possible to do better. After all, f2 has the range 0 to 1, so you should be able to replicate the maximum in f1. The problem comes in the line

f2 = f2*(np.max(f1)-np.min(f1)) + np.min(f1)  # f2 is now between min(f1) and max(f1)

because when an element of f2 is 1 you're doing a lot more to it than just multiplying 1 by the max of f1, leading to the possibility of floating point arithmetic errors occurring. Notice that you can multiply out the brackets f2*(np.max(f1)-np.min(f1)) to f2*np.max(f1) - f2*np.min(f1), and then factorize the resulting - f2*np.min(f1) + np.min(f1) to np.min(f1)*(f2-1) giving

f2 = f2*np.max(f1)-np.min(f1)*(f2-1)  # f2 is now between min(f1) and max(f1)

So when an element of f2 is 1, we have 1*np.max(f1) - np.min(f1)*0. Conversely when an element of f2 is 0, we have 0*np.max(f1) - np.min(f1)*1. The numbers 1 and 0 can be exactly represented so there should be no errors.

The modified program outputs

[ -1.0000000000000000e+03  -4.9999950000000001e+02   1.0000000000000000e-03]
[ -1.0000000000000000e+03  -4.9999950000000001e+02   1.0000000000000000e-03]

i.e., as desired.
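To compare the two formulas side by side, here is a small sketch in float32 using the same example data. The names unit, lo, hi, naive and stable are illustrative only.

```python
import numpy as np

dtype = np.float32
f1 = np.linspace(-1000, 0.001, 3, dtype=dtype)
unit = np.linspace(0, 1, 3, dtype=dtype)   # already spans exactly 0.0 to 1.0

lo, hi = np.min(f1), np.max(f1)

naive = unit * (hi - lo) + lo          # original formula
stable = unit * hi - lo * (unit - 1)   # rearranged formula

print(np.max(naive) == hi)    # False in float32 for this data
print(np.max(stable) == hi)   # True
print(np.min(stable) == lo)   # True
```

The rearranged form only guarantees exact endpoints; interior values are still subject to ordinary rounding.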

Nevertheless I would still strongly recommend only using inexact floating point comparison (with tight bounds if you need) unless you have a very good reason not to do so. There are all sorts of subtle errors that can occur in floating point arithmetic and the easiest way to avoid them is never to use exact comparison.

An alternative to the approach given above, which might be preferable, would be to rescale both arrays to between 0 and 1. This might be the most suitable form to use within the program. (Both arrays could then be multiplied by a scaling factor, such as the original range of f1, if necessary.)
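A minimal sketch of this alternative follows. The helper to_unit_range and the sample arrays are hypothetical, and the helper assumes the input is finite and not constant.

```python
import numpy as np

def to_unit_range(a):
    """Linearly map a onto [0, 1]; assumes a is finite and not constant."""
    a = np.asarray(a, dtype=np.float64)
    lo, hi = a.min(), a.max()
    return (a - lo) / (hi - lo)

f1 = np.array([-1000.0, -499.9995, 0.001])
f2 = np.array([3.0, 7.5, 12.0])

g1, g2 = to_unit_range(f1), to_unit_range(f2)

# The minimum maps to 0/range and the maximum to range/range,
# both of which are exact in floating point.
print(g1.min() == g2.min() == 0.0)   # True
print(g1.max() == g2.max() == 1.0)   # True
```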

Regarding using rounding to solve your problem, I would not recommend this. The problem with rounding, apart from the fact that it unnecessarily reduces the accuracy of your data, is that numbers that are very close can round in different directions. E.g.

f1 = np.array([1.000049])
f2 = np.array([1.000051])
print (f1)
print (f2)
scale = 10**(-np.floor(np.log10(np.max(f1))) + 4)
f2 = np.round(f2*scale)/scale
f1 = np.round(f1*scale)/scale
print (f1)
print (f2)

Output

[ 1.000049]
[ 1.000051]
[ 1.]
[ 1.0001]

This is related to the fact that although it's common to discuss numbers matching to so many significant figures, people don't actually compare them this way in the computer. You calculate the difference and then divide by the correct number (for a relative error).

Regarding mantissas and exponents, see math.frexp and math.ldexp, documented here. I would not recommend setting these yourself, however (consider two numbers that are very close but have different exponents, for example: do you really want to set the mantissa?). If you want to ensure the numbers are exactly the same, it is much better to set the maximum of f2 explicitly to the maximum of f1 (and similarly for the minimum).
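For reference, here is a short sketch of what math.frexp and math.ldexp return, followed by one possible way to pin the extremes directly. The arrays are made-up illustrative data, and the masking approach is an assumption about how one might do it, not something prescribed above.

```python
import math
import numpy as np

x = 5.0230593
m, e = math.frexp(x)           # x == m * 2**e with 0.5 <= abs(m) < 1
print(e)                       # 3
print(math.ldexp(m, e) == x)   # True: the round trip is exact

# Two nearly equal numbers can still differ in exponent.
print(math.frexp(0.9999999)[1], math.frexp(1.0000001)[1])   # 0 1

# Pinning the extremes directly (illustrative arrays).
f1 = np.array([0.5, 2.0, 5.0230593])
f2 = np.array([0.5000001, 2.0, 5.0230602])
f2[f2 == f2.max()] = f1.max()
f2[f2 == f2.min()] = f1.min()
print(f2.max() == f1.max(), f2.min() == f1.min())   # True True
```

Overwriting the extremes changes only those elements, so it is best reserved for cases where exact equality at the endpoints really is required.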
