Why is itertools.chain faster than a flattening list comprehension?
Question
In the context of a discussion in the comments of this question, it was mentioned that while concatenating a sequence of strings simply takes ''.join([str1, str2, ...]), concatenating a sequence of lists would be something like list(itertools.chain(lst1, lst2, ...)), although you can also use a list comprehension like [x for y in [lst1, lst2, ...] for x in y]. What surprised me is that the first method is consistently faster than the second:
import random
import itertools
random.seed(100)
lsts = [[1] * random.randint(100, 1000) for i in range(1000)]
%timeit [x for y in lsts for x in y]
# 39.3 ms ± 436 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit list(itertools.chain.from_iterable(lsts))
# 30.6 ms ± 866 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit list(x for y in lsts for x in y) # Proposed in comments
# 62.5 ms ± 504 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Loop-based methods proposed in the comments
%%timeit
a = []
for lst in lsts: a += lst
# 26.4 ms ± 634 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
a = []
for lst in lsts: a.extend(lst)
# 26.7 ms ± 728 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
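The %timeit and %%timeit magics above require IPython. A rough standalone sketch of the same comparison using the timeit module (the function names and repeat counts here are my own choices, not part of the original benchmark) might look like:

```python
import random
import timeit
import itertools

random.seed(100)
lsts = [[1] * random.randint(100, 1000) for i in range(1000)]

def listcomp():
    # Nested list comprehension: flattens element by element
    return [x for y in lsts for x in y]

def chained():
    # itertools.chain.from_iterable consumed by the list constructor
    return list(itertools.chain.from_iterable(lsts))

def extended():
    # Loop calling list.extend once per sublist
    a = []
    for lst in lsts:
        a.extend(lst)
    return a

# All three produce the same flattened list.
assert listcomp() == chained() == extended()

for fn in (listcomp, chained, extended):
    t = min(timeit.repeat(fn, number=10, repeat=3))
    print(f"{fn.__name__}: {t / 10 * 1000:.1f} ms per loop")
```

Absolute numbers will differ by machine and Python version, but the relative ordering should match the results above.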
It is not a difference of orders of magnitude, but it is not negligible either. I was wondering how that might be the case, since list comprehensions are frequently among the fastest methods to solve a given problem. At first I thought that maybe the itertools.chain object would have a len that the list constructor could use to preallocate the necessary memory, but that is not the case (len cannot be called on itertools.chain objects). Is some custom itertools.chain-to-list conversion taking place somehow, or is itertools.chain taking advantage of some other mechanism?
Tested in Python 3.6.3 on Windows 10 x64, if that is relevant.
It seems the fastest method after all is calling .extend on an empty list with each sublist, as proposed by @zwer, probably because it works on "chunks" of data instead of on a per-element basis.
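The "chunks vs. per-element" distinction can be illustrated with two hand-written loops (the function names here are mine, for illustration): list.extend copies each sublist in one C-level bulk operation, while append crosses the interpreter boundary once per element.

```python
def flatten_by_extend(lsts):
    a = []
    for lst in lsts:
        a.extend(lst)  # one C-level bulk copy per sublist
    return a

def flatten_by_append(lsts):
    a = []
    for lst in lsts:
        for x in lst:
            a.append(x)  # one interpreter-level method call per element
    return a

data = [[1, 2], [3], [4, 5, 6]]
print(flatten_by_extend(data))  # [1, 2, 3, 4, 5, 6]
print(flatten_by_append(data))  # [1, 2, 3, 4, 5, 6]
```

Both produce the same result; the difference is purely in how much work happens per bytecode instruction.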
Answer
Here is the CPython source for itertools.chain.from_iterable. It's not hard to read even if you don't know C, and you can tell everything is happening at the C level (before being used to generate a list in your code).
The bytecode for list comprehensions is like so:
import dis

def f(lsts):
    return [x for y in lsts for x in y]

dis.dis(f.__code__.co_consts[1])
2 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 18 (to 24)
6 STORE_FAST 1 (y)
8 LOAD_FAST 1 (y)
10 GET_ITER
>> 12 FOR_ITER 8 (to 22)
14 STORE_FAST 2 (x)
16 LOAD_FAST 2 (x)
18 LIST_APPEND 3
20 JUMP_ABSOLUTE 12
>> 22 JUMP_ABSOLUTE 4
>> 24 RETURN_VALUE
These are all the Python interpreter operations involved in creating a list comprehension. Just having all the operations at the C level (in chain) rather than having the interpreter step over each bytecode instruction (in the comprehension) is what gives you that performance boost.
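For reference, the itertools documentation gives a rough pure-Python equivalent of chain.from_iterable; the real implementation does exactly this, but entirely in C, with no interpreter dispatch per element:

```python
def from_iterable(iterables):
    # Pure-Python equivalent of itertools.chain.from_iterable,
    # as given in the itertools documentation.
    for it in iterables:
        for element in it:
            yield element

print(list(from_iterable([[1, 2], [3, 4]])))  # [1, 2, 3, 4]
```

The nested loop is the same shape as the list comprehension in the question; the speed difference comes from where the loop runs, not what it does.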
Still, that boost is so small I wouldn't worry about it. This is Python: readability over speed.
For a list-wrapped generator expression:
def g(lists):  # note: the genexpr reads the global lsts, not the parameter (see LOAD_GLOBAL below)
    return list(x for y in lsts for x in y)
# the comprehension
dis.dis(g.__code__.co_consts[1])
2 0 LOAD_FAST 0 (.0)
>> 2 FOR_ITER 20 (to 24)
4 STORE_FAST 1 (y)
6 LOAD_FAST 1 (y)
8 GET_ITER
>> 10 FOR_ITER 10 (to 22)
12 STORE_FAST 2 (x)
14 LOAD_FAST 2 (x)
16 YIELD_VALUE
18 POP_TOP
20 JUMP_ABSOLUTE 10
>> 22 JUMP_ABSOLUTE 2
>> 24 LOAD_CONST 0 (None)
26 RETURN_VALUE
So the interpreter has a similar number of steps to perform when running the generator expression being unpacked by list, but as you would expect, the Python-level overhead of having list unpack a generator (as opposed to the C-level LIST_APPEND instruction) is what slows it down.
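The key opcode difference can be checked programmatically with the dis module. This is a sketch of my own (the helper opnames is not from the original answer); it walks the function's code object and any nested comprehension/generator code objects, which keeps it working both on older Pythons (where comprehensions are separate code objects) and on 3.12+ (where list comprehensions are inlined):

```python
import dis

def f(lsts):
    return [x for y in lsts for x in y]

def g(lsts):
    return list(x for y in lsts for x in y)

def opnames(func):
    # Collect opcode names from the function and any nested code
    # objects (e.g. the <listcomp> or <genexpr> code objects).
    names = set()
    todo = [func.__code__]
    while todo:
        code = todo.pop()
        for ins in dis.get_instructions(code):
            names.add(ins.opname)
            if hasattr(ins.argval, "co_code"):
                todo.append(ins.argval)
    return names

print("LIST_APPEND" in opnames(f))  # True: the listcomp appends at the C level
print("YIELD_VALUE" in opnames(g))  # True: the genexpr yields each element back
```

The list comprehension appends directly via LIST_APPEND, while the generator expression must yield each value to the waiting list constructor, which is the extra overhead measured above.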
dis.dis(f)
2 0 LOAD_CONST 1 (<code object <listcomp> at 0x000000000FB58B70, file "<ipython-input-33-1d46ced34d66>", line 2>)
2 LOAD_CONST 2 ('f.<locals>.<listcomp>')
4 MAKE_FUNCTION 0
6 LOAD_FAST 0 (lsts)
8 GET_ITER
10 CALL_FUNCTION 1
12 RETURN_VALUE
dis.dis(g)
2 0 LOAD_GLOBAL 0 (list)
2 LOAD_CONST 1 (<code object <genexpr> at 0x000000000FF6F420, file "<ipython-input-40-0334a7cdeb8f>", line 2>)
4 LOAD_CONST 2 ('g.<locals>.<genexpr>')
6 MAKE_FUNCTION 0
8 LOAD_GLOBAL 1 (lsts)
10 GET_ITER
12 CALL_FUNCTION 1
14 CALL_FUNCTION 1
16 RETURN_VALUE