Python: sort function breaks in the presence of nan

Question

sorted([2, float('nan'), 1]) returns [2, nan, 1]

(At least on the ActiveState Python 3.1 implementation.)

I understand nan is a weird object, so I wouldn't be surprised if it showed up in random places in the sort result. But it also messes up the sort for the non-nan numbers in the container, which is really unexpected.

I asked a related question about max, and based on that I understand why sort works like this. But should this be considered a bug?

The documentation just says "Return a new sorted list [...]" without specifying any details.

I now agree that this isn't in violation of the IEEE standard. However, I think it's a bug from any common-sense viewpoint. Even Microsoft, which isn't known to admit its mistakes often, has recognized this one as a bug and fixed it in the latest version: http://connect.microsoft.com/VisualStudio/feedback/details/363379/bug-in-list-double-sort-in-list-which-contains-double-nan.

Anyway, I ended up following @khachik's answer:

sorted(list_, key = lambda x : float('-inf') if math.isnan(x) else x)

I suspect it results in a performance hit compared to the language doing that by default, but at least it works (barring any bugs that I introduced).
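For reference, here is a self-contained sketch of that workaround, with the `math` import it relies on. The sample data and the second, tuple-keyed variant are illustrative assumptions rather than part of @khachik's answer; the variant is one possible way to place NaN last instead of first.

```python
import math

list_ = [2, float('nan'), 1, float('-inf')]

# Workaround from the question: NaN is keyed as -inf, so it sorts first.
# It ties with any real -inf; the stable sort keeps tied items in input order.
nan_first = sorted(list_, key=lambda x: float('-inf') if math.isnan(x) else x)

# Hypothetical variant: an (is_nan, value) tuple key puts NaN strictly last,
# because False sorts before True and the value only breaks ties.
nan_last = sorted(list_, key=lambda x: (math.isnan(x), x))

print(nan_first)
print(nan_last)
```

Either key keeps the non-nan numbers correctly ordered, which is the part the plain sorted() call gets wrong.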

Recommended answer

The previous answers are useful, but perhaps not clear regarding the root of the problem.

In any language, sort applies a given ordering, defined by a comparison function or in some other way, over the domain of the input values. For example, less-than, a.k.a. operator <, could be used throughout if and only if less-than defines a suitable ordering over the input values.

But this is specifically NOT true for floating point values and less-than: "NaN is unordered: it is not equal to, greater than, or less than anything, including itself." (Clear prose from the GNU C manual, but it applies to all modern IEEE754-based floating point.)
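The quoted rule is easy to check directly in Python. This is a minimal sketch; the variable names are only illustrative.

```python
nan = float('nan')

# Every ordering or equality comparison that involves NaN is False,
# whichever side NaN is on, including NaN compared with itself.
comparisons = [
    nan < 1.0, nan > 1.0, nan == 1.0,
    1.0 < nan, 1.0 > nan,
    nan < nan, nan > nan, nan == nan,
]
print(any(comparisons))  # False

# The only comparison that holds is "not equal".
print(nan != nan)  # True
```

Since a sort built on < is told "not less than" for every pair that includes NaN, it can treat out-of-order neighbours on either side of a NaN as already ordered.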

So the possible solutions are:

  1. Remove the NaNs first, making the input domain well defined via < (or the other sorting function being used).
  2. Define a custom comparison function (a.k.a. predicate) that does define an ordering for NaN, such as less than any number, or greater than any number.

Either approach can be used, in any language.

Practically, considering Python, I would prefer to remove the NaNs if you either don't care much about fastest performance or if removing NaNs is the desired behavior in context.
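Removing the NaNs first might look like the sketch below. The helper name sorted_without_nan is hypothetical, and it assumes every element is a real number that math.isnan() accepts.

```python
import math

def sorted_without_nan(values):
    """Return a sorted list of the values with every NaN dropped."""
    return sorted(v for v in values if not math.isnan(v))

data = [2, float('nan'), 1]
print(sorted_without_nan(data))  # [1, 2]
```

The filter costs one extra pass over the input, and the result is shorter than the input whenever NaNs were present.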

Otherwise you could use a suitable predicate function via "cmp" in older Python versions, or via this and functools.cmp_to_key(). The latter is a bit more awkward, naturally, than removing the NaNs first. And care will be required to avoid worse performance when defining this predicate function.
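A predicate passed through functools.cmp_to_key() could be sketched as follows. The function name nan_first_cmp and the choice to order NaN before every number are assumptions made for illustration; ordering NaN after every number would work equally well.

```python
import math
from functools import cmp_to_key

def nan_first_cmp(a, b):
    """Three-way comparison that orders NaN before every other number."""
    a_nan, b_nan = math.isnan(a), math.isnan(b)
    if a_nan or b_nan:
        # NaN vs NaN compares equal (0); NaN vs a number puts NaN first.
        return int(b_nan) - int(a_nan)
    # Ordinary three-way comparison for two non-NaN numbers.
    return (a > b) - (a < b)

data = [2, float('nan'), 1]
result = sorted(data, key=cmp_to_key(nan_first_cmp))
print(result)  # [nan, 1, 2]
```

Each comparison now goes through a Python-level function call, which is where the performance concern comes from.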
