设置与冻结设置性能 [英] Set vs. frozenset performance
问题描述
我正在修改Python的set
和frozenset
集合类型.
I was tinkering around with Python's set
and frozenset
collection types.
最初,我认为frozenset
将提供比set
更好的查找性能,因为它是不可变的,因此可以利用存储项目的结构.
Initially, I assumed that frozenset
would provide a better lookup performance than set
, as its immutable and thus could exploit the structure of the stored items.
但是,关于以下实验,情况似乎并非如此:
However, this does not seem to be the case, regarding the following experiment:
import random
import time
import sys
def main(n):
numbers = []
for _ in xrange(n):
numbers.append(random.randint(0, sys.maxint))
set_ = set(numbers)
frozenset_ = frozenset(set_)
start = time.time()
for number in numbers:
number in set_
set_duration = time.time() - start
start = time.time()
for number in numbers:
number in frozenset_
frozenset_duration = time.time() - start
print "set : %.3f" % set_duration
print "frozenset: %.3f" % frozenset_duration
if __name__ == "__main__":
n = int(sys.argv[1])
main(n)
我同时使用CPython和PyPy执行了这段代码,结果如下:
I executed this code using both CPython and PyPy, which gave the following results:
> pypy set.py 100000000
set : 6.156
frozenset: 6.166
> python set.py 100000000
set : 16.824
frozenset: 17.248
在CPython和PyPy中,frozenset
的查找性能实际上实际上都较慢.有人知道为什么会这样吗?我没有研究这些实现.
It seems that frozenset
is actually slower regarding the lookup performance, both in CPython and in PyPy. Does anybody have an idea why this is the case? I did not look into the implementations.
推荐答案
frozenset
和set
实现在很大程度上是共享的. set
只是添加了变异方法的frozenset
,具有完全相同的哈希表实现.请参见 Objects/setobject.c
源文件;顶级 PyFrozenSet_Type
定义与
The frozenset
and set
implementations are largely shared; a set
is simply a frozenset
with mutating methods added, with the exact same hashtable implementation. See the Objects/setobject.c
source file; the top-level PyFrozenSet_Type
definition shares functions with the PySet_Type
definition.
这里没有针对冻结集的优化,因为在测试成员资格时无需计算
There is no optimisation for a frozenset here, as there is no need to calculate the hashes for the items in the frozenset
when you are testing for membership. The item that you use to test against the set still needs to have their hash calculated, in order to find the right slot in the set hashtable so you can do an equality test.
因此,您的计时结果可能由于系统上运行的其他进程而关闭;您测量了时钟时间,没有禁用Python垃圾收集,也没有重复测试同一件事.
As such, your timing results are probably off due to other processes running on your system; you measured wall-clock time, and did not disable Python garbage collection nor did you repeatedly test the same thing.
尝试使用 timeit
模块(其中一个值来自numbers
)运行测试还有一个不在集合中:
Try to run your test using the timeit
module, with one value from numbers
and one not in the set:
import random
import sys
import timeit
numbers = [random.randrange(sys.maxsize) for _ in range(10000)]
set_ = set(numbers)
fset = frozenset(numbers)
present = random.choice(numbers)
notpresent = -1
test = 'present in s; notpresent in s'
settime = timeit.timeit(
test,
'from __main__ import set_ as s, present, notpresent')
fsettime = timeit.timeit(
test,
'from __main__ import fset as s, present, notpresent')
print('set : {:.3f} seconds'.format(settime))
print('frozenset: {:.3f} seconds'.format(fsettime))
这将每个测试重复一百万次,并产生:
This repeats each test 1 million times and produces:
set : 0.050 seconds
frozenset: 0.050 seconds
这篇关于设置与冻结设置性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!