Differences between generator comprehension expressions

Question

There are, as far as I know, three ways to create a generator through a comprehension [1].

The classic one:

def f1():
    g = (i for i in range(10))

The yield variant:

def f2():
    g = [(yield i) for i in range(10)]

The yield from variant (which raises a SyntaxError except inside a function):

def f3():
    g = [(yield from range(10))]

The three variants lead to different bytecode, which is not really surprising. It would seem logical that the first one is the best, since it is a dedicated, straightforward syntax for creating a generator through a comprehension. However, it is not the one that produces the shortest bytecode.

Disassembly with Python 3.6

Classic generator comprehension

>>> dis.dis(f1)
4           0 LOAD_CONST               1 (<code object <genexpr> at...>)
            2 LOAD_CONST               2 ('f1.<locals>.<genexpr>')
            4 MAKE_FUNCTION            0
            6 LOAD_GLOBAL              0 (range)
            8 LOAD_CONST               3 (10)
           10 CALL_FUNCTION            1
           12 GET_ITER
           14 CALL_FUNCTION            1
           16 STORE_FAST               0 (g)

5          18 LOAD_FAST                0 (g)
           20 RETURN_VALUE

yield variant

>>> dis.dis(f2)
8           0 LOAD_CONST               1 (<code object <listcomp> at...>)
            2 LOAD_CONST               2 ('f2.<locals>.<listcomp>')
            4 MAKE_FUNCTION            0
            6 LOAD_GLOBAL              0 (range)
            8 LOAD_CONST               3 (10)
           10 CALL_FUNCTION            1
           12 GET_ITER
           14 CALL_FUNCTION            1
           16 STORE_FAST               0 (g)

9          18 LOAD_FAST                0 (g)
           20 RETURN_VALUE

yield from variant

>>> dis.dis(f3)
12           0 LOAD_GLOBAL              0 (range)
             2 LOAD_CONST               1 (10)
             4 CALL_FUNCTION            1
             6 GET_YIELD_FROM_ITER
             8 LOAD_CONST               0 (None)
            10 YIELD_FROM
            12 BUILD_LIST               1
            14 STORE_FAST               0 (g)

13          16 LOAD_FAST                0 (g)
            18 RETURN_VALUE
        


In addition, a timeit comparison shows that the yield from variant is the fastest (still run with Python 3.6):

>>> timeit(f1)
0.5334039637357152

>>> timeit(f2)
0.5358906506760719

>>> timeit(f3)
0.19329123352712596

f3 is roughly 2.7 times as fast as f1 and f2.

As Leon mentioned in a comment, the efficiency of a generator is best measured by how fast it can be iterated over. So I changed the three functions so that they iterate over the generators and call a dummy function.

def f():
    pass

def fn():
    g = ...
    for _ in g:
        f()

The results are even more striking:

>>> timeit(f1)
1.6017412817975778

>>> timeit(f2)
1.778684261368946

>>> timeit(f3)
0.1960603619517669

f3 is now roughly 8 times as fast as f1, and roughly 9 times as fast as f2.

Note: The results are more or less the same when the iterable is not range(10) but a static iterable, such as [0, 1, 2, 3, 4, 5]. Therefore, the difference in speed has nothing to do with range being somehow optimized.

So, what are the differences between the three ways? More specifically, what is the difference between the yield from variant and the other two?

Is it normal that the natural construct (elt for elt in it) is slower than the tricky [(yield from it)]? Should I from now on replace the former with the latter in all of my scripts, or are there any drawbacks to using the yield from construct?

This is all related, so I don't feel like opening a new question, but this is getting even stranger. I tried comparing range(10) and [(yield from range(10))].

def f1():
    for i in range(10):
        print(i)
    
def f2():
    for i in [(yield from range(10))]:
        print(i)

>>> timeit(f1, number=100000)
26.715589237537195

>>> timeit(f2, number=100000)
0.019948781941049987

So. Now, iterating over [(yield from range(10))] is about 1,300 times as fast as iterating over a bare range(10), going by the two timings above?

How do you explain why iterating over [(yield from range(10))] is so much faster than iterating over range(10)?

[1] For the sceptical: the three expressions shown in f1, f2 and f3 do produce a generator object; try calling type on them.

Recommended answer

This is what you should be doing:

g = (i for i in range(10))

It's a generator expression. It's equivalent to

def temp(outer):
    for i in outer:
        yield i
g = temp(range(10))

but if you just wanted an iterable with the elements of range(10), you could have done

g = range(10)

You do not need to wrap any of this in a function.
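
As a quick check of the equivalence claimed above, the sketch below compares the generator expression, the explicit generator function, and the bare range. The variable names (genexp, explicit, and the *_items lists) are introduced here for illustration only and do not appear in the original post.

```python
def temp(outer):
    # Explicit generator function that the generator expression stands for.
    for i in outer:
        yield i

genexp = (i for i in range(10))   # the recommended spelling
explicit = temp(range(10))        # the hand-written equivalent

# Both objects are generators.
print(type(genexp).__name__, type(explicit).__name__)

genexp_items = list(genexp)
explicit_items = list(explicit)
plain_items = list(range(10))     # no generator needed at all

# All three produce the same sequence of elements.
print(genexp_items == explicit_items == plain_items)
```

If all you need is something to loop over, the bare range(10) avoids the extra generator layer entirely.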

If you're here to learn what code to write, you can stop reading. The rest of this post is a long and technical explanation of why the other code snippets are broken and should not be used, including an explanation of why your timings are broken too.

This:

g = [(yield i) for i in range(10)]

is a broken construct that should have been taken out years ago. 8 years after the problem was originally reported, the process to remove it is finally beginning. Don't do it.

While it's still in the language, on Python 3, it's equivalent to

def temp(outer):
    l = []
    for i in outer:
        l.append((yield i))
    return l
g = temp(range(10))

List comprehensions are supposed to return lists, but because of the yield, this one doesn't. It acts kind of like a generator expression, and it yields the same things as your first snippet, but it builds an unnecessary list and attaches it to the StopIteration raised at the end.

>>> g = [(yield i) for i in range(10)]
>>> [next(g) for i in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> next(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: [None, None, None, None, None, None, None, None, None, None]

This is confusing and a waste of memory. Don't do it. (If you want to know where all those Nones are coming from, read PEP 342.)
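
The behaviour described above can still be reproduced with the explicit form of the comprehension. Since Python 3.8 the comprehension itself is rejected with a SyntaxError, so this sketch uses the hand-written temp function instead. The drain helper and its reply parameter are hypothetical names added here for illustration. Each None in the final list is simply the value of a yield expression that was resumed with next() rather than send().

```python
def temp(outer):
    # Explicit form of [(yield i) for i in outer], as described above.
    collected = []
    for i in outer:
        # The value of the yield expression is whatever the caller sends in;
        # a plain next() is the same as send(None).
        collected.append((yield i))
    return collected


def drain(gen, reply=None):
    """Hypothetical helper: run gen to completion, sending `reply` into
    every yield. Returns (yielded values, value carried by StopIteration)."""
    yielded = []
    try:
        yielded.append(next(gen))
        while True:
            yielded.append(gen.send(reply))
    except StopIteration as stop:
        return yielded, stop.value


yielded_plain, result_plain = drain(temp(range(3)))
yielded_sent, result_sent = drain(temp(range(3)), reply="x")

print(yielded_plain, result_plain)
print(yielded_sent, result_sent)
```

The yielded values are the same in both runs. Only the list attached to StopIteration changes, which shows that the list is a by-product nobody asked for.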

On Python 2, g = [(yield i) for i in range(10)] does something entirely different. Python 2 doesn't give list comprehensions their own scope (specifically list comprehensions, not dict or set comprehensions), so the yield is executed by whatever function contains this line. On Python 2, this:

def f():
    g = [(yield i) for i in range(10)]

is equivalent to

def f():
    temp = []
    for i in range(10):
        temp.append((yield i))
    g = temp

making f a generator-based coroutine, in the pre-async sense. Again, if your goal was to get a generator, you've wasted a bunch of time building a pointless list.

This:

g = [(yield from range(10))]

is silly, but none of the blame is on Python this time.

There is no comprehension or genexp here at all. The brackets are not a list comprehension; all the work is done by yield from, and then you build a 1-element list containing the (useless) return value of yield from. Your f3:

def f3():
    g = [(yield from range(10))]

when stripped of the unnecessary list-building, simplifies to

def f3():
    yield from range(10)

or, ignoring all the coroutine support stuff yield from does,

def f3():
    for i in range(10):
        yield i
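
To make the simplification concrete, the sketch below runs the three spellings of f3 side by side. The function names f3_original, f3_simplified and f3_loop, and the return_value helper, are hypothetical labels added here; the `return g` in f3_original is also an addition, included only so that the one-element list becomes observable.

```python
def f3_original():
    # A list display (not a comprehension) wrapped around yield from.
    g = [(yield from range(10))]
    return g  # added for illustration, so the list can be inspected


def f3_simplified():
    yield from range(10)


def f3_loop():
    for i in range(10):
        yield i


def return_value(gen):
    """Hypothetical helper: exhaust gen and return the value that its
    StopIteration carries."""
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value


# All three generator functions yield the same elements.
print(list(f3_original()) == list(f3_simplified()) == list(f3_loop()))

# The list holds only the result of yield from, which is None for a range.
leftover = return_value(f3_original())
print(leftover)
```

The brackets contribute nothing to the iteration; they only collect the single result of the yield from expression once the range is exhausted.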


Your timings are also broken.

In your first timing, f1 and f2 create generator objects that can be used inside those functions, though f2's generator is weird. f3 doesn't do that; f3 is a generator function. f3's body does not run in your timings, and if it did, its g would behave quite unlike the g in the other functions. A timing that would actually be comparable with f1 and f2 would be

def f4():
    g = f3()

In your last timing (the comparison of range(10) with [(yield from range(10))]), f2 doesn't actually run, for the same reason f3 was broken in the previous timings. There, f2 is not iterating over a generator. Instead, the yield from turns f2 itself into a generator function, so calling it only creates a generator object and never reaches the loop or the print calls.
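
The point that a generator function's body does not run when the function is merely called can be checked directly. In this sketch, body_runs is a hypothetical counter list added to f3 for illustration, and f5 is a hypothetical companion to f4 that actually iterates. The timings at the end are only there to show what a like-for-like measurement would look like; no particular numbers are claimed.

```python
import inspect
from timeit import timeit

body_runs = []  # illustration only: records each time f3's body starts


def f3():
    body_runs.append("ran")
    g = [(yield from range(10))]


def f4():
    g = f3()  # creates a generator object; f3's body has not started


def f5():
    for _ in f3():  # iterating is what actually runs f3's body
        pass


print(inspect.isgeneratorfunction(f3))

f4()
runs_after_f4 = len(body_runs)  # still zero: nothing was iterated

f5()
runs_after_f5 = len(body_runs)  # one: the body ran exactly once

print(runs_after_f4, runs_after_f5)

# Like-for-like timings: creation only (f4) versus full iteration (f5).
creation_time = timeit(f4, number=1000)
iteration_time = timeit(f5, number=1000)
print(creation_time, iteration_time)
```

Timing f3 directly measures only the cost of creating a generator object, which is why it looked so much faster than functions that did real work.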
