Unexpected behaviour with a conditional generator expression

Question

I was running a piece of code that unexpectedly gave a logic error in one part of the program. While investigating that section, I created a test file to run the same set of statements and found a bug that seems very odd.

I tested this simple code:

array = [1, 2, 2, 4, 5] # Original array
f = (x for x in array if array.count(x) == 2) # Filters original
array = [5, 6, 1, 2, 9] # Updates original to something else

print(list(f)) # Outputs filtered

The output was:

>>> []

Yes, nothing. I was expecting the generator expression to pick the items in the array with a count of 2 and output them, but that is not what I got:

# Expected output
>>> [2, 2]

When I commented out the third line to test it once again:

array = [1, 2, 2, 4, 5] # Original array
f = (x for x in array if array.count(x) == 2) # Filters original
### array = [5, 6, 1, 2, 9] # Ignore line

print(list(f)) # Outputs filtered

The output was correct (you can test it for yourself):

>>> [2, 2]

At one point I printed the type of the variable f:

array = [1, 2, 2, 4, 5] # Original array
f = (x for x in array if array.count(x) == 2) # Filters original
array = [5, 6, 1, 2, 9] # Updates original

print(type(f))
print(list(f)) # Outputs filtered

and I got:

>>> <class 'generator'>
>>> []

Why does updating a list in Python change the output of a separate generator variable? This seems very odd to me.

Recommended answer

Python's generator expressions are late binding (see PEP 289 -- Generator Expressions), which is what the other answers call "lazy":

Early Binding versus Late Binding

After much discussion, it was decided that the first (outermost) for-expression [of the generator expression] should be evaluated immediately and that the remaining expressions be evaluated when the generator is executed.

[...] Python takes a late binding approach to lambda expressions and has no precedent for automatic, early binding. It was felt that introducing a new paradigm would unnecessarily introduce complexity.

After exploring many possibilities, a consensus emerged that binding issues were hard to understand and that users should be strongly encouraged to use generator expressions inside functions that consume their arguments immediately. For more complex applications, full generator definitions are always superior in terms of being obvious about scope, lifetime, and binding.

That means only the outermost for is evaluated when the generator expression is created. So the object currently named array is bound in the "in array" part right away (in fact, the equivalent of iter(array) is bound at this point). But when you iterate over the generator, the if array.count call refers to whatever is named array at that time.
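The binding rule described above can be sketched as an ordinary generator function. This is only an approximation of what the interpreter does; the name expanded_genexpr is hypothetical and chosen for illustration. The iterator is passed in as an argument (bound at creation), while array inside the body is looked up each time the condition runs.

```python
def expanded_genexpr(outer_iter):
    # Rough stand-in for: (x for x in array if array.count(x) == 2)
    # outer_iter was created eagerly; `array` below is a late global lookup.
    for x in outer_iter:
        if array.count(x) == 2:
            yield x

array = [1, 2, 2, 4, 5]
f = expanded_genexpr(iter(array))  # iter(array) is evaluated right here
array = [5, 6, 1, 2, 9]            # rebinding the name changes what the condition sees

result = list(f)  # iterates the old list, counts in the new one
print(result)
```

No element of the old list occurs exactly twice in the new list, so the result is empty, matching the behaviour in the question.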

Since it's actually a list, not an array, I changed the variable names in the rest of the answer to be more accurate.

In your first case, the list you iterate over and the list you count in are different. It's as if you had written:

list1 = [1, 2, 2, 4, 5]
list2 = [5, 6, 1, 2, 9]
f = (x for x in list1 if list2.count(x) == 2)

So for each element in list1 you check whether its count in list2 is two.

You can easily verify this by modifying the second list:

>>> lst = [1, 2, 2]
>>> f = (x for x in lst if lst.count(x) == 2)
>>> lst = [1, 1, 2]
>>> list(f)
[1]

If it iterated over the first list and counted in the first list, it would have returned [2, 2] (because the first list contains two 2s). If it iterated over and counted in the second list, the output would be [1, 1]. But since it iterates over the first list (containing one 1) and counts in the second list (which contains two 1s), the output is just a single 1.

There are several possible solutions. I generally prefer not to use generator expressions if they aren't iterated over immediately. A simple generator function is enough to make it work correctly:

def keep_only_duplicated_items(lst):
    for item in lst:
        if lst.count(item) == 2:
            yield item

And then use it like this:

lst = [1, 2, 2, 4, 5]
f = keep_only_duplicated_items(lst)
lst = [5, 6, 1, 2, 9]

>>> list(f)
[2, 2]
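Two other ways to pin down the list at creation time are sketched here. They are illustrations only and are not taken from the answer itself: an eager list comprehension, and a generator expression wrapped in an immediately called function so that the hypothetical parameter name bound holds the original list.

```python
lst = [1, 2, 2, 4, 5]

# Option 1: a list comprehension runs immediately, so nothing is deferred.
eager = [x for x in lst if lst.count(x) == 2]

# Option 2: stay lazy, but capture the list in a function parameter.
# `bound` is a local name of the lambda, so rebinding `lst` cannot affect it.
lazy = (lambda bound: (x for x in bound if bound.count(x) == 2))(lst)

lst = [5, 6, 1, 2, 9]  # rebinding the global name afterwards

lazy_result = list(lazy)
print(eager, lazy_result)
```

The first option gives up laziness; the second keeps it at the cost of readability, which is one reason a named generator function is usually the clearer choice.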

Note that the PEP (see the link above) also states that for anything more complicated a full generator definition is preferable.

A better solution (avoiding the quadratic runtime behavior that comes from scanning the whole list once per element) would be to count the elements once with collections.Counter and then do each lookup in constant time (resulting in linear time overall):

from collections import Counter

def keep_only_duplicated_items(lst):
    cnts = Counter(lst)
    for item in lst:
        if cnts[item] == 2:
            yield item
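One caveat worth illustrating, as a sketch: the body of a generator function does not run until the first item is requested, so Counter(lst) is computed at that point. Rebinding the caller's name has no effect, but mutating the same list object in place before iteration is still visible. The variable names data, gen, rebinding_result and mutation_result are made up for this example.

```python
from collections import Counter

def keep_only_duplicated_items(lst):
    cnts = Counter(lst)  # runs on the first next(), not when the generator is created
    for item in lst:
        if cnts[item] == 2:
            yield item

# Case 1: rebinding the name; the generator keeps its reference to the old list.
data = [1, 2, 2, 4, 5]
gen = keep_only_duplicated_items(data)
data = [5, 6, 1, 2, 9]
rebinding_result = list(gen)

# Case 2: mutating the same list object before iterating; the generator sees the change.
data = [1, 2, 2, 4, 5]
gen = keep_only_duplicated_items(data)
data.append(4)
mutation_result = list(gen)

print(rebinding_result, mutation_result)
```

If in-place changes must not leak in, passing a copy such as keep_only_duplicated_items(list(data)) is one simple way to isolate the generator.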

Appendix: Using a subclass to "visualize" what happens and when it happens

It's quite easy to create a list subclass that prints when specific methods are called, so one can verify that it really works like that.

In this case I just override the methods __iter__ and count, because I'm interested in which list the generator expression iterates over and in which list it counts. The method bodies just delegate to the superclass and print something (since it uses super without arguments and f-strings, it requires Python 3.6 or later, but it should be easy to adapt for other Python versions):

class MyList(list):
    def __iter__(self):
        print(f'__iter__() called on {self!r}')
        return super().__iter__()
        
    def count(self, item):
        cnt = super().count(item)
        print(f'count({item!r}) called on {self!r}, result: {cnt}')
        return cnt

This is a simple subclass that just prints when the __iter__ and count methods are called:

>>> lst = MyList([1, 2, 2, 4, 5])

>>> f = (x for x in lst if lst.count(x) == 2)
__iter__() called on [1, 2, 2, 4, 5]

>>> lst = MyList([5, 6, 1, 2, 9])

>>> print(list(f))
count(1) called on [5, 6, 1, 2, 9], result: 1
count(2) called on [5, 6, 1, 2, 9], result: 1
count(2) called on [5, 6, 1, 2, 9], result: 1
count(4) called on [5, 6, 1, 2, 9], result: 0
count(5) called on [5, 6, 1, 2, 9], result: 1
[]
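For comparison, the same kind of instrumentation can be applied to the generator function fix. This sketch records calls in a list instead of printing them; LoggingList and calls are hypothetical names introduced here, not part of the original answer.

```python
calls = []  # records (method, ..., repr of the list the method was called on)

class LoggingList(list):
    def __iter__(self):
        calls.append(('__iter__', repr(self)))
        return super().__iter__()

    def count(self, item):
        calls.append(('count', item, repr(self)))
        return super().count(item)

def keep_only_duplicated_items(lst):
    for item in lst:
        if lst.count(item) == 2:
            yield item

lst = LoggingList([1, 2, 2, 4, 5])
f = keep_only_duplicated_items(lst)
calls_after_creation = list(calls)  # still empty: the function body has not started

lst = LoggingList([5, 6, 1, 2, 9])
result = list(f)

print(result)
for entry in calls:
    print(entry)
```

Here both __iter__ and every count call land on the original list, because the parameter lst inside the function keeps referring to it. Unlike the generator expression, __iter__ is only called once iteration starts, not when the generator object is created.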
