Python递归生成器性能 [英] Python recursive generators performance
问题描述
在python中,当将纯递归函数更改为递归生成器(而不是普通生成器)时,性能似乎会下降.
In python, when changing a purely recursive function into a recursive generator (not a plain generator) the performance seems to be degrading.
例如,这是找到列表的所有组合的两个函数之间的性能比较:
For example, here is a performance comparison between two functions which find all combinations of a list:
from datetime import datetime as dt
def rec_subsets(ms, i=0, s=[]):
if i == len(ms):
# do something with s
return
rec_subsets(ms, i+1, s)
rec_subsets(ms, i+1, s + [ms[i]])
def gen_subsets(ms, i=0, s=[]):
if i == len(ms):
yield s
return
for a in gen_subsets(ms, i+1, s): yield a
for a in gen_subsets(ms, i+1, s + [ms[i]]): yield a
t1 = dt.now()
rec_subsets(range(20))
t2 = dt.now()
print t2 - t1
t1 = dt.now()
for _ in gen_subsets(range(20)): pass
t2 = dt.now()
print t2 - t1
具有以下输出:
0:00:01.027000 # rec_subsets
0:00:02.860000 # gen_subsets
人们自然希望 gen_subsets 与 rec_subsets 差不多,但是事实并非如此,它要慢得多.
One would naturally expect gen_subsets to be approximately as fast as rec_subsets but this is not the case, it is much slower.
这是正常现象还是我缺少什么?
Is this normal or am I missing something?
推荐答案
rec_subsets()
仍然更快(对于range(20)
),即使在# do something with s
处添加了result.append(s)
且gen_subsets()
和rec_subsets()
已消耗.
rec_subsets()
is still faster (for range(20)
) even if result.append(s)
is added inplace of # do something with s
and the results of both gen_subsets()
and rec_subsets()
are consumed.
It might be explained by the following quote from PEP 380 (yield from
syntax support):
使用专门的语法为优化开辟了可能性 当有一长串的发电机时.这样的链条可能会出现,因为 例如,当递归地遍历树结构时.开销 上下传递
__next__()
调用并产生值的过程 在最坏的情况下可能会导致本应成为 O(n)的操作变为 情况下, O(n ** 2).
Using a specialised syntax opens up possibilities for optimisation when there is a long chain of generators. Such chains can arise, for instance, when recursively traversing a tree structure. The overhead of passing
__next__()
calls and yielded values down and up the chain can cause what ought to be an O(n) operation to become, in the worst case, O(n**2).
您可以使用 itertools.combinations()
生成电源集:
You could generate a powerset using itertools.combinations()
:
from itertools import combinations
def subsets_comb(lst):
return (comb for r in range(len(lst)+1) for comb in combinations(lst, r))
在我的计算机上使用range(20)
的速度更快:
It is faster for range(20)
on my machine:
name time ratio comment
subsets_comb 227 msec 1.00 [range(0, 20)]
subsets_ipowerset 476 msec 2.10 [range(0, 20)]
subsets_rec 957 msec 4.22 [range(0, 20)]
subsets_gen_pep380 2.34 sec 10.29 [range(0, 20)]
subsets_gen 2.63 sec 11.59 [range(0, 20)]
要重现结果,请运行time-subsets.py
.
To reproduce the results, run time-subsets.py
.
这篇关于Python递归生成器性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!