在“if"子句中使用“in"时的元组或列表? [英] Tuple or list when using 'in' in an 'if' clause?

查看:20
本文介绍了在“if"子句中使用“in"时的元组或列表?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

哪种方法更好?使用元组,例如:

如果 (1, 2) 中的数字:

或者一个列表,比如:

如果 [1, 2] 中的数字:

推荐哪一种用于此类用途,为什么(逻辑和性能方面)?

解决方案

CPython 解释器用第一种形式替换第二种形式.

那是因为从常量加载元组是一个操作,但列表将是 3 个操作;加载两个整数内容并构建一个新的列表对象.

因为您使用的是无法以其他方式访问的列表文字,所以它被替换为一个元组:

<预><代码>>>>导入文件>>>dis.dis(compile('number in [1, 2]', '<stdin>', 'eval'))1 0 LOAD_NAME 0(数字)3 LOAD_CONST 2 ((1, 2))6 COMPARE_OP 6(英寸)9 RETURN_VALUE

这里的第二个字节码在 one 步骤中加载一个 (1, 2) 元组作为常量.将此与创建成员资格测试中未使用的列表对象进行比较:

<预><代码>>>>dis.dis(compile('[1, 2]', '', 'eval'))1 0 LOAD_CONST 0 (1)3 LOAD_CONST 1 (2)6 BUILD_LIST 29 RETURN_VALUE

这里对于长度为 N 的列表对象需要 N+1 步.

这个替换是一个 CPython 特定的窥孔优化;请参阅 Python/peephole.c 源代码.对于其他 Python 实现,您希望坚持使用不可变对象.

也就是说,在使用 Python 3.2 及更高版本时,最佳选项是使用 set 文字:

如果 {1, 2} 中的数字:

因为窥孔优化器将用 frozenset() 对象替换它,并且对集合的成员资格测试是 O(1) 常量操作:

<预><代码>>>>dis.dis(compile('number in {1, 2}', '<stdin>', 'eval'))1 0 LOAD_NAME 0(数字)3 LOAD_CONST 2 (冻结集({1, 2}))6 COMPARE_OP 6(英寸)9 RETURN_VALUE

此优化已在 Python 3.2 中添加,但未向后移植到 Python 2.

因此,Python 2 优化器无法识别此选项,并且从内容构建 setfrozenset 的成本几乎肯定会更高而不是使用元组进行测试.

集合成员测试是 O(1) 并且很快;对元组的测试是 O(n) 最坏的情况.尽管针对集合的测试必须计算散列(更高的常量成本,为不可变类型缓存),但针对元组的测试除第一个元素之外的成本总是会更高.因此,平均而言,集合很容易更快:

<预><代码>>>>导入时间>>>timeit.timeit('1 in (1, 3, 5)', number=10**7) # 元组的最佳情况0.21154764899984002>>>timeit.timeit('8 in (1, 3, 5)', number=10**7) # 元组的最坏情况0.5670104179880582>>>timeit.timeit('1 in {1, 3, 5}', number=10**7) # 集合的平均情况0.2663505630043801>>>timeit.timeit('8 in {1, 3, 5}', number=10**7) # 集合的最坏情况0.25939063701662235

Which approach is better? Using a tuple, like:

if number in (1, 2):

or a list, like:

if number in [1, 2]:

Which one is recommended for such uses and why (both logical and performance wise)?

解决方案

The CPython interpreter replaces the second form with the first.

That's because loading the tuple from a constant is one operation, but the list would be 3 operations; load the two integer contents and build a new list object.

Because you are using a list literal that isn't otherwise reachable, it is substituted for a tuple:

>>> import dis
>>> dis.dis(compile('number in [1, 2]', '<stdin>', 'eval'))
  1           0 LOAD_NAME                0 (number)
              3 LOAD_CONST               2 ((1, 2))
              6 COMPARE_OP               6 (in)
              9 RETURN_VALUE        

Here the second bytecode loads a (1, 2) tuple as a constant, in one step. Compare this to creating a list object not used in a membership test:

>>> dis.dis(compile('[1, 2]', '<stdin>', 'eval'))
  1           0 LOAD_CONST               0 (1)
              3 LOAD_CONST               1 (2)
              6 BUILD_LIST               2
              9 RETURN_VALUE        

Here N+1 steps are required for a list object of length N.

This substitution is a CPython-specific peephole optimisation; see the Python/peephole.c source. For other Python implementations then, you want to stick with immutable objects instead.

That said, the best option when using Python 3.2 and up, is to use a set literal:

if number in {1, 2}:

as the peephole optimiser will replace that with a frozenset() object and membership tests against sets are O(1) constant operations:

>>> dis.dis(compile('number in {1, 2}', '<stdin>', 'eval'))
  1           0 LOAD_NAME                0 (number)
              3 LOAD_CONST               2 (frozenset({1, 2}))
              6 COMPARE_OP               6 (in)
              9 RETURN_VALUE

This optimization was added in Python 3.2 but wasn't backported to Python 2.

As such, the Python 2 optimiser doesn't recognize this option and the cost of building either a set or frozenset from the contents is almost guaranteed to be more costly than using a tuple for the test.

Set membership tests are O(1) and fast; testing against a tuple is O(n) worst case. Although testing against a set has to calculate the hash (higher constant cost, cached for immutable types), the cost for testing against a tuple other than the first element is always going to be higher. So on average, sets are easily faster:

>>> import timeit
>>> timeit.timeit('1 in (1, 3, 5)', number=10**7)  # best-case for tuples
0.21154764899984002
>>> timeit.timeit('8 in (1, 3, 5)', number=10**7)  # worst-case for tuples
0.5670104179880582
>>> timeit.timeit('1 in {1, 3, 5}', number=10**7)  # average-case for sets
0.2663505630043801
>>> timeit.timeit('8 in {1, 3, 5}', number=10**7)  # worst-case for sets
0.25939063701662235

这篇关于在“if"子句中使用“in"时的元组或列表?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆