在“if"子句中使用“in"时的元组或列表? [英] Tuple or list when using 'in' in an 'if' clause?
问题描述
哪种方法更好?使用元组,例如:
如果 (1, 2) 中的数字:
或者一个列表,比如:
如果 [1, 2] 中的数字:
推荐哪一种用于此类用途,为什么(逻辑和性能方面)?
CPython 解释器用第一种形式替换第二种形式.
那是因为从常量加载元组是一个操作,但列表将是 3 个操作;加载两个整数内容并构建一个新的列表对象.
因为您使用的是无法以其他方式访问的列表文字,所以它被替换为一个元组:
<预><代码>>>>导入文件>>>dis.dis(compile('number in [1, 2]', '<stdin>', 'eval'))1 0 LOAD_NAME 0(数字)3 LOAD_CONST 2 ((1, 2))6 COMPARE_OP 6(英寸)9 RETURN_VALUE这里的第二个字节码在 one 步骤中加载一个 (1, 2)
元组作为常量.将此与创建成员资格测试中未使用的列表对象进行比较:
这里对于长度为 N 的列表对象需要 N+1 步.
这个替换是一个 CPython 特定的窥孔优化;请参阅 Python/peephole.c
源代码.对于其他 Python 实现,您希望坚持使用不可变对象.
也就是说,在使用 Python 3.2 及更高版本时,最佳选项是使用 set 文字:
如果 {1, 2} 中的数字:
因为窥孔优化器将用 frozenset()
对象替换它,并且对集合的成员资格测试是 O(1) 常量操作:
此优化已在 Python 3.2 中添加,但未向后移植到 Python 2.
因此,Python 2 优化器无法识别此选项,并且从内容构建 set
或 frozenset
的成本几乎肯定会更高而不是使用元组进行测试.
集合成员测试是 O(1) 并且很快;对元组的测试是 O(n) 最坏的情况.尽管针对集合的测试必须计算散列(更高的常量成本,为不可变类型缓存),但针对元组的测试除第一个元素之外的成本总是会更高.因此,平均而言,集合很容易更快:
<预><代码>>>>导入时间>>>timeit.timeit('1 in (1, 3, 5)', number=10**7) # 元组的最佳情况0.21154764899984002>>>timeit.timeit('8 in (1, 3, 5)', number=10**7) # 元组的最坏情况0.5670104179880582>>>timeit.timeit('1 in {1, 3, 5}', number=10**7) # 集合的平均情况0.2663505630043801>>>timeit.timeit('8 in {1, 3, 5}', number=10**7) # 集合的最坏情况0.25939063701662235Which approach is better? Using a tuple, like:
if number in (1, 2):
or a list, like:
if number in [1, 2]:
Which one is recommended for such uses and why (both logical and performance wise)?
The CPython interpreter replaces the second form with the first.
That's because loading the tuple from a constant is one operation, but the list would be 3 operations; load the two integer contents and build a new list object.
Because you are using a list literal that isn't otherwise reachable, it is substituted for a tuple:
>>> import dis
>>> dis.dis(compile('number in [1, 2]', '<stdin>', 'eval'))
1 0 LOAD_NAME 0 (number)
3 LOAD_CONST 2 ((1, 2))
6 COMPARE_OP 6 (in)
9 RETURN_VALUE
Here the second bytecode loads a (1, 2)
tuple as a constant, in one step. Compare this to creating a list object not used in a membership test:
>>> dis.dis(compile('[1, 2]', '<stdin>', 'eval'))
1 0 LOAD_CONST 0 (1)
3 LOAD_CONST 1 (2)
6 BUILD_LIST 2
9 RETURN_VALUE
Here N+1 steps are required for a list object of length N.
This substitution is a CPython-specific peephole optimisation; see the Python/peephole.c
source. For other Python implementations then, you want to stick with immutable objects instead.
That said, the best option when using Python 3.2 and up, is to use a set literal:
if number in {1, 2}:
as the peephole optimiser will replace that with a frozenset()
object and membership tests against sets are O(1) constant operations:
>>> dis.dis(compile('number in {1, 2}', '<stdin>', 'eval'))
1 0 LOAD_NAME 0 (number)
3 LOAD_CONST 2 (frozenset({1, 2}))
6 COMPARE_OP 6 (in)
9 RETURN_VALUE
This optimization was added in Python 3.2 but wasn't backported to Python 2.
As such, the Python 2 optimiser doesn't recognize this option and the cost of building either a set
or frozenset
from the contents is almost guaranteed to be more costly than using a tuple for the test.
Set membership tests are O(1) and fast; testing against a tuple is O(n) worst case. Although testing against a set has to calculate the hash (higher constant cost, cached for immutable types), the cost for testing against a tuple other than the first element is always going to be higher. So on average, sets are easily faster:
>>> import timeit
>>> timeit.timeit('1 in (1, 3, 5)', number=10**7) # best-case for tuples
0.21154764899984002
>>> timeit.timeit('8 in (1, 3, 5)', number=10**7) # worst-case for tuples
0.5670104179880582
>>> timeit.timeit('1 in {1, 3, 5}', number=10**7) # average-case for sets
0.2663505630043801
>>> timeit.timeit('8 in {1, 3, 5}', number=10**7) # worst-case for sets
0.25939063701662235
这篇关于在“if"子句中使用“in"时的元组或列表?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!