Why does this key class for sorting heterogeneous sequences behave oddly?

Question

Python 3.x's sorted() function cannot be relied on to sort heterogeneous sequences, because most pairs of distinct types are unorderable (numeric types like int, float, decimal.Decimal etc. being an exception):

Python 3.4.2 (default, Oct  8 2014, 08:07:42) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> sorted(["one", 2.3, "four", -5])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: float() < str()

In contrast, comparisons between objects that have no natural order are arbitrary but consistent in Python 2.x, so sorted() works:

Python 2.7.8 (default, Aug  8 2014, 14:55:30) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> sorted(["one", 2.3, "four", -5])
[-5, 2.3, 'four', 'one']

In order to replicate Python 2.x's behaviour in Python 3.x, I wrote a class to use as the key parameter to sorted(), which relies on the fact that sorted() is guaranteed to use only less-than comparisons:

class motley:

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        try:
            return self.value < other.value
        except TypeError:
            return repr(type(self.value)) < repr(type(other.value))

Example usage:

>>> sorted(["one", 2.3, "four", -5], key=motley)
[-5, 2.3, 'four', 'one']

So far, so good.

However, I've noticed a surprising behaviour when sorted(s, key=motley) is called with certain sequences containing complex numbers:

>>> sorted([0.0, 1, (1+0j), False, (2+3j)], key=motley)
[(1+0j), 0.0, False, (2+3j), 1]

I would have expected 0.0, False and 1 to be in one group (because they are mutually orderable), and (1+0j) and (2+3j) in another (because they are of the same type). The fact that the complex numbers in this result are not only separated from each other, but one of them is sitting in the middle of a group of objects that are comparable with each other but not with it, is somewhat perplexing.

What's going on here?

Recommended answer

You do not know what order the comparisons are done in, or even which items are compared, which means you can't really know what effect your __lt__ will have. Your defined __lt__ sometimes depends on the actual values, and sometimes on the string representations of the types, but both versions may be used for the same object in the course of the sort. This means that your ordering is not determined solely by the objects in the list, but also may depend on their initial order. This in turn means that just because objects are mutually comparable does not mean they will be sorted together; they may be "blocked" by an incomparable object between them.
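The dependence on the initial order can be checked directly. The following is only a sketch: it reproduces the motley class from the question, sorts every permutation of the same five objects, and collects the distinct results (the names items and outcomes are illustrative, not from the original post).

```python
from itertools import permutations


class motley:
    """Key class from the question, reproduced so the sketch runs on its own."""

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        try:
            return self.value < other.value
        except TypeError:
            return repr(type(self.value)) < repr(type(other.value))


items = [0.0, 1, (1+0j), False, (2+3j)]

# Sort every permutation of the same five objects. Results are stored as
# tuples of repr() strings, because 1 == (1+0j) and 0.0 == False would make
# genuinely different orderings compare equal.
outcomes = {
    tuple(repr(x) for x in sorted(perm, key=motley))
    for perm in permutations(items)
}

for outcome in sorted(outcomes):
    print(outcome)
print(len(outcomes), "distinct results from the same five objects")
```

With a consistent ordering there would be essentially one result. Here more than one appears: for example, an input that already happens to have no element "less than" its predecessor under motley is returned unchanged, whether or not the complex numbers are adjacent in it.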

You can get an inkling of what is going on by putting some debugging prints in to see what it's comparing:

class motley:

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        fallback = False
        try:
            result = self.value < other.value
        except TypeError:
            fallback = True
            result = repr(type(self.value)) < repr(type(other.value))
        symbol = "<" if result else ">"
        print(self.value, symbol, other.value, end="")
        if fallback:
            print(" -- because", repr(type(self.value)), symbol, repr(type(other.value)))
        else:
            print()
        return result

Then:

>>> sorted([0.0, 1, (1+0j), False, (2+3j)], key=motley)
1 > 0.0
(1+0j) < 1 -- because <class 'complex'> < <class 'int'>
(1+0j) < 1 -- because <class 'complex'> < <class 'int'>
(1+0j) < 0.0 -- because <class 'complex'> < <class 'float'>
False > 0.0
False < 1
(2+3j) > False -- because <class 'complex'> > <class 'bool'>
(2+3j) < 1 -- because <class 'complex'> < <class 'int'>
[(1+0j), 0.0, False, (2+3j), 1]

You can see, for instance, that the type-based ordering is used for comparing the complex number to 1, but not for comparing 1 and 0.0. Likewise, False is judged not less than 0.0 for "normal" value-based reasons (the two are in fact equal, and the debugging output prints ">" for any comparison that is not strictly less), but 2+3j is judged greater than False for type-name-based reasons.
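To see which of the two rules applies to a given pair, one can test the raw comparison on its own. This is a small sketch; comparison_mode is a hypothetical helper name, not part of the original code, and it only reports which branch of motley.__lt__ would run.

```python
def comparison_mode(a, b):
    """Report which branch of motley.__lt__ would decide a < b."""
    try:
        a < b  # the result is irrelevant; only whether it raises matters
    except TypeError:
        return "type-name"
    return "value"


pairs = [
    (0.0, False),
    (1, 0.0),
    ((2+3j), False),
    ((1+0j), 1),
    ((1+0j), (2+3j)),
]
for a, b in pairs:
    print(repr(a), "vs", repr(b), "->", comparison_mode(a, b))
```

Note the last pair: two complex numbers cannot be ordered against each other either, so they also fall through to the type-name rule. Since their type names are identical, motley reports neither as less than the other.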

The result is that it sorts 1+0j to the beginning, and then places 2+3j just above False, because all it learns about 2+3j is that it is not less than False and is less than 1. It never even attempts to compare the two complex numbers to each other, and the only item they are both compared to is 1.

More generally, your approach can lead to an intransitive ordering with appropriate choices for the names of the types involved. For instance, if you define classes A, B, and C, such that A and C can be compared, but they raise exceptions when comparing to B, then by creating objects a, b and c (from the respective classes) such that c < a, you can create a cycle a < b < c < a. a < b < c will be true because the classes will be compared based on their names, but c < a since these types can be directly compared. With an intransitive ordering, there is no hope of a "correct" sorted order.
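The A, B, C construction can be sketched as follows. The Ranked base class and its rank attribute are assumptions made purely for illustration; any pair of classes that compare with each other but not with B would do.

```python
class motley:
    """Key class from the question, reproduced so the sketch runs on its own."""

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        try:
            return self.value < other.value
        except TypeError:
            return repr(type(self.value)) < repr(type(other.value))


class Ranked:
    """Hypothetical base: instances order by rank, but only among themselves."""

    def __init__(self, rank):
        self.rank = rank

    def __lt__(self, other):
        if not isinstance(other, Ranked):
            return NotImplemented  # leads to TypeError against unrelated types
        return self.rank < other.rank


class A(Ranked):
    pass


class B:  # defines no ordering, so comparing it with A or C raises TypeError
    pass


class C(Ranked):
    pass


a, b, c = A(2), B(), C(1)

cycle = [
    motley(a) < motley(b),  # True: decided by type names, ...A < ...B
    motley(b) < motley(c),  # True: decided by type names, ...B < ...C
    motley(c) < motley(a),  # True: decided by value, rank 1 < rank 2
]
print(cycle)
```

All three comparisons hold at once, so a < b < c < a and no arrangement of the three objects can satisfy every comparison.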

You can even do this with builtin types, although it requires getting a little creative to think of objects whose type names are in the right alphabetical sequence:

>>> motley(1.0) < motley(lambda: 1) < motley(0) < motley(1.0)
True

(Because 'float' < 'function' < 'int' when the type names are compared, while 0 < 1.0 is decided by value :-)
