Why is itertools.chain faster than a flattening list comprehension?

Question

In the context of a discussion in the comments of this question it was mentioned that while concatenating a sequence of strings simply takes ''.join([str1, str2, ...]), concatenating a sequence of lists would be something like list(itertools.chain(lst1, lst2, ...)), although you can also use a list comprehension like [x for y in [lst1, lst2, ...] for x in y]. What surprised me is that the first method is consistently faster than the second:

import random
import itertools

random.seed(100)
lsts = [[1] * random.randint(100, 1000) for i in range(1000)]

%timeit [x for y in lsts for x in y]
# 39.3 ms ± 436 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit list(itertools.chain.from_iterable(lsts))
# 30.6 ms ± 866 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit list(x for y in lsts for x in y)  # Proposed in comments
# 62.5 ms ± 504 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Loop-based methods proposed in the comments
%%timeit
a = []
for lst in lsts: a += lst
# 26.4 ms ± 634 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
a = []
for lst in lsts: a.extend(lst)
# 26.7 ms ± 728 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

It is not a difference of orders of magnitude, but it is not negligible either. I was wondering how that might be the case, since list comprehensions are frequently among the fastest methods to solve a given problem. At first I thought that maybe the itertools.chain object would have a len that the list constructor could use to preallocate the necessary memory, but that is not the case (cannot call len on itertools.chain objects). Is some custom itertools.chain-to-list conversion taking place somehow or is itertools.chain taking advantage of some other mechanism?
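The claim that a chain object exposes no length is easy to verify; a minimal sketch (the variable names are mine):

```python
import itertools

chained = itertools.chain([1, 2], [3])

# chain objects are lazy iterators: they expose no __len__,
# so list() cannot use their length to preallocate memory
try:
    len(chained)
except TypeError as e:
    print("no len:", e)

# the iterator still flattens correctly when consumed
print(list(chained))  # [1, 2, 3]
```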

Tested in Python 3.6.3 on Windows 10 x64, if that is relevant.

It seems the fastest method after all is calling .extend on an empty list with each sublist, as proposed by @zwer, probably because it works on "chunks" of data instead of on a per-element basis.
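That chunk-based approach can be wrapped in a small helper; a minimal sketch (the name flatten is mine, not from the discussion):

```python
def flatten(lists):
    """Concatenate a sequence of lists by extending one result list."""
    out = []
    for lst in lists:
        out.extend(lst)  # one C-level call copies the whole sublist
    return out

print(flatten([[1, 2], [3], [4, 5]]))  # [1, 2, 3, 4, 5]
```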

Answer

Here is the C source for itertools.chain.from_iterable. It's not hard to read even if you don't know C, and you can tell that everything happens at the C level (before the result is used to build a list in your code).

The bytecode for list comprehensions is like so:

import dis

def f(lsts):
    return [x for y in lsts for x in y]

# disassemble the nested <listcomp> code object
dis.dis(f.__code__.co_consts[1])
  2           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                18 (to 24)
              6 STORE_FAST               1 (y)
              8 LOAD_FAST                1 (y)
             10 GET_ITER
        >>   12 FOR_ITER                 8 (to 22)
             14 STORE_FAST               2 (x)
             16 LOAD_FAST                2 (x)
             18 LIST_APPEND              3
             20 JUMP_ABSOLUTE           12
        >>   22 JUMP_ABSOLUTE            4
        >>   24 RETURN_VALUE

These are all the Python interpreter operations involved in building the list comprehension. Having everything happen at the C level (in chain), rather than having the interpreter step through each bytecode instruction (in the comprehension), is what gives chain its performance edge.
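You can reproduce the comparison outside IPython with the stdlib timeit module; the absolute numbers and even the ranking can vary with Python version and data shape, so treat this as a sketch rather than a definitive benchmark:

```python
import itertools
import timeit

lsts = [[1] * 500 for _ in range(200)]

# time each flattening strategy over the same input
t_comp = timeit.timeit(lambda: [x for y in lsts for x in y], number=200)
t_chain = timeit.timeit(lambda: list(itertools.chain.from_iterable(lsts)),
                        number=200)

print(f"comprehension: {t_comp:.3f}s  chain: {t_chain:.3f}s")
```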

Still, the boost is small enough that I wouldn't worry about it. This is Python: readability over speed.

For the generator expression wrapped in list:

def g(lists):
    # note: the body references the global lsts, not the parameter lists;
    # dis.dis(g) below accordingly shows LOAD_GLOBAL (lsts)
    return list(x for y in lsts for x in y)

# disassemble the nested <genexpr> code object
dis.dis(g.__code__.co_consts[1])
  2           0 LOAD_FAST                0 (.0)
        >>    2 FOR_ITER                20 (to 24)
              4 STORE_FAST               1 (y)
              6 LOAD_FAST                1 (y)
              8 GET_ITER
        >>   10 FOR_ITER                10 (to 22)
             12 STORE_FAST               2 (x)
             14 LOAD_FAST                2 (x)
             16 YIELD_VALUE
             18 POP_TOP
             20 JUMP_ABSOLUTE           10
        >>   22 JUMP_ABSOLUTE            2
        >>   24 LOAD_CONST               0 (None)
             26 RETURN_VALUE

So the interpreter performs a similar number of steps when running the generator expression consumed by list, but, as you would expect, the Python-level overhead of having list unpack a generator (as opposed to the C-level LIST_APPEND instruction) is what slows it down.

dis.dis(f)
  2           0 LOAD_CONST               1 (<code object <listcomp> at 0x000000000FB58B70, file "<ipython-input-33-1d46ced34d66>", line 2>)
              2 LOAD_CONST               2 ('f.<locals>.<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_FAST                0 (lsts)
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE

dis.dis(g)
  2           0 LOAD_GLOBAL              0 (list)
              2 LOAD_CONST               1 (<code object <genexpr> at 0x000000000FF6F420, file "<ipython-input-40-0334a7cdeb8f>", line 2>)
              4 LOAD_CONST               2 ('g.<locals>.<genexpr>')
              6 MAKE_FUNCTION            0
              8 LOAD_GLOBAL              1 (lsts)
             10 GET_ITER
             12 CALL_FUNCTION            1
             14 CALL_FUNCTION            1
             16 RETURN_VALUE
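Whichever variant wins the benchmark, all three produce the same flattened list, which is easy to sanity-check:

```python
import itertools

lsts = [[1, 2], [3], [4, 5, 6]]

a = [x for y in lsts for x in y]               # list comprehension
b = list(itertools.chain.from_iterable(lsts))  # chain
c = list(x for y in lsts for x in y)           # list-wrapped genexpr

print(a == b == c, a)  # True [1, 2, 3, 4, 5, 6]
```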
