Python performance comparison for creating sets - set() vs. {} literal

Question

A discussion following this question left me wondering, so I decided to run a few tests and compare the creation time of set((x,y,z)) vs. {x,y,z} for creating sets in Python (I'm using Python 3.7).

I compared the two methods using time and timeit. Both were consistent* with the following results:

from timeit import timeit

test1 = """
my_set1 = set((1, 2, 3))
"""
print(timeit(test1))

Result: 0.30240735499999993

test2 = """
my_set2 = {1,2,3}
"""
print(timeit(test2))

Result: 0.10771795900000003

So the second method was almost 3 times faster than the first. This was quite a surprising difference to me. What is happening under the hood to optimize the performance of the set literal over the set() method in such a way? Which would be advisable for which cases?

* Note: I only show the results of the timeit tests since they are averaged over many samples, and thus perhaps more reliable, but the results when testing with time showed similar differences in both cases.
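
The timeit documentation suggests looking at the lowest of several repeated measurements rather than an average, since slower runs mostly reflect interference from other processes. A minimal sketch of that approach for the two statements in the question (the helper name best_times and the loop counts are arbitrary choices for illustration, not part of the original tests):

```python
from timeit import repeat

# The two statements from the question, timed identically.
STATEMENTS = ["set((1, 2, 3))", "{1, 2, 3}"]


def best_times(statements, number=100_000, runs=5):
    """Map each statement to its fastest timing (seconds) over `runs` runs."""
    return {
        stmt: min(repeat(stmt, number=number, repeat=runs))
        for stmt in statements
    }


if __name__ == "__main__":
    for stmt, seconds in best_times(STATEMENTS).items():
        print(f"{stmt:>16}: {seconds:.4f} s for 100,000 loops")
```

Absolute numbers will vary by machine and Python version; only the ratio between the two lines is meaningful.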

I'm aware of this similar question, and though it answers certain aspects of my original question, it didn't cover all of it. Sets were not addressed in that question, and since empty sets have no literal syntax in Python, I was curious how (if at all) creating a set with a literal differs from calling set(). I also wondered how the tuple argument in set((x,y,z)) is handled behind the scenes and what impact it might have on runtime. The great answer by coldspeed helped clear things up.
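
To make the empty-set point concrete: empty braces are already taken by the dict literal, so set() is the only way to spell an empty set. A quick check:

```python
# {} is an empty *dict* literal, not an empty set, so set() is the
# only spelling for an empty set.
empty_braces = {}
empty_set = set()

print(type(empty_braces).__name__)  # dict
print(type(empty_set).__name__)     # set

# With at least one element, braces do produce a set.
print(type({1}).__name__)           # set
```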

Answer

(This is in response to code that has now been edited out of the initial question) You forgot to call the functions in the second case. Making the appropriate modifications, the results are as expected:

from timeit import timeit

test1 = """
def foo1():
    my_set1 = set((1, 2, 3))
foo1()
"""
timeit(test1)
# 0.48808742000255734

test2 = """
def foo2():
    my_set2 = {1,2,3}
foo2()
"""
timeit(test2)
# 0.3064506609807722

Now, the reason for the difference in timings is that set() is a function call: the name set must first be looked up at run time (it could have been rebound) and the result then called, whereas the {...} set construction is part of the syntax, compiled directly into bytecode, and is much faster.
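
One way to see why that lookup cannot be skipped: the name set is resolved when the code runs and may refer to something else, while the brace syntax always builds a real set. A small illustration, evaluating both spellings against a throwaway namespace dictionary (ns, chosen here only so the real builtin is left untouched):

```python
# Evaluate both spellings in a namespace where the name `set` is
# rebound to `list`. eval() resolves `set` in this dict before builtins.
ns = {"set": list}

via_call = eval("set((1, 2, 3))", ns)  # calls whatever `set` currently names
via_literal = eval("{1, 2, 3}", ns)    # brace syntax, no lookup of `set`

print(via_call)     # [1, 2, 3]
print(via_literal)  # {1, 2, 3}
```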

The difference is obvious when observing the disassembled byte code.

import dis

# Output as captured in the original answer; opcode names differ
# in newer CPython versions.
dis.dis("set((1, 2, 3))")
#   1           0 LOAD_NAME                0 (set)
#               2 LOAD_CONST               3 ((1, 2, 3))
#               4 CALL_FUNCTION            1
#               6 RETURN_VALUE

dis.dis("{1, 2, 3}")
#   1           0 LOAD_CONST               0 (1)
#               2 LOAD_CONST               1 (2)
#               4 LOAD_CONST               2 (3)
#               6 BUILD_SET                3
#               8 RETURN_VALUE

In the first case, the name set is loaded and then called by the CALL_FUNCTION instruction with the tuple (1, 2, 3) as its argument (the tuple adds some overhead of its own, although it is minor: it is loaded as a single constant via LOAD_CONST), whereas the second case needs only a BUILD_SET instruction, which is more efficient.
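
Because opcode names change between CPython versions (CALL_FUNCTION no longer exists as of 3.11), a listing like the one above will not match newer interpreters exactly. A sketch that checks only the structural difference with dis.get_instructions: the set(...) form looks up a name and contains a call opcode, while the literal contains BUILD_SET and no call. The helper name instructions is just for illustration.

```python
import dis


def instructions(source):
    """Compile a single expression and return its bytecode instructions."""
    code = compile(source, "<expr>", "eval")
    return list(dis.get_instructions(code))


call_form = instructions("set((1, 2, 3))")
literal_form = instructions("{1, 2, 3}")

# Opcode names for each form on the running interpreter.
print([ins.opname for ins in call_form])
print([ins.opname for ins in literal_form])

# Structural facts: only the call form refers to the name `set` and calls it;
# only the literal form uses BUILD_SET.
uses_set_name = any(ins.argval == "set" for ins in call_form)
call_has_call = any(ins.opname.startswith("CALL") for ins in call_form)
literal_has_call = any(ins.opname.startswith("CALL") for ins in literal_form)
literal_builds_set = any(ins.opname == "BUILD_SET" for ins in literal_form)
print(uses_set_name, call_has_call, literal_has_call, literal_builds_set)
```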

Re: your question regarding the time taken for tuple construction, we see this is actually negligible:

timeit("""(1, 2, 3)""")
# 0.01858693000394851

timeit("""{1, 2, 3}""")
# 0.11971827200613916

Tuples are immutable, so the compiler optimises this operation by loading the whole tuple as a single constant. This is called constant folding (you can see it in the LOAD_CONST instruction above), so the time taken is negligible. This does not happen with sets, as they are mutable (thanks to @user2357112 for pointing this out).
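
The constant-folding point can also be checked without reading bytecode, by inspecting a code object's co_consts and comparing object identity. A short sketch; it only tests properties that should hold across recent CPython versions, since the exact bytecode for the set literal differs between versions:

```python
# Compile each literal once and inspect the constants the compiler stored.
tuple_code = compile("(1, 2, 3)", "<expr>", "eval")
set_code = compile("{1, 2, 3}", "<expr>", "eval")

# The whole tuple was folded into a single constant at compile time.
print((1, 2, 3) in tuple_code.co_consts)                # True
# No mutable set object is ever stored as a constant.
print(any(type(c) is set for c in set_code.co_consts))  # False

# Evaluating the tuple code twice hands back the very same object...
print(eval(tuple_code) is eval(tuple_code))             # True
# ...whereas every evaluation of the set literal builds a fresh set.
print(eval(set_code) is eval(set_code))                 # False
```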

For larger sequences, we see similar behaviour: the {...} syntax (a set comprehension) is faster than passing a generator expression to set(), which has to build the set by consuming the generator.

timeit("""set(i for i in range(10000))""", number=1000)
# 0.9775058150407858

timeit("""{i for i in range(10000)}""", number=1000)
# 0.5508635920123197

For reference, you can also use iterable unpacking inside set displays (available since Python 3.5):

timeit("""{*range(10000)}""", number=1000)
# 0.7462548640323803

Interestingly, however, set() is faster when called directly on range:

timeit("""set(range(10000))""", number=1000)
# 0.3746800610097125

This is faster than both the set comprehension and the unpacking form. You will see similar behaviour for other sequences (such as lists).
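
To reproduce these comparisons side by side, here is a sketch that times every variant from this answer against the same input and first checks that they all build the same set. The names N, data, VARIANTS and run are illustrative choices, and the default loop count is lower than the answer's number=1000 to keep the run short, so absolute figures will differ from those above; the relative order may also vary with machine and Python version.

```python
from timeit import timeit

N = 10_000

# Each statement builds the set {0, 1, ..., N - 1}; `data` is a
# pre-built list, covering the "other sequences" case.
VARIANTS = {
    "set(genexpr)": "set(i for i in range(N))",
    "set comprehension": "{i for i in range(N)}",
    "unpacking": "{*range(N)}",
    "set(range)": "set(range(N))",
    "set(list)": "set(data)",
}


def run(number=100):
    """Return total seconds for `number` executions of each variant."""
    env = {"N": N, "data": list(range(N))}
    expected = set(range(N))
    # Sanity check: all variants must produce the same set.
    for stmt in VARIANTS.values():
        assert eval(stmt, env) == expected, stmt
    return {
        label: timeit(stmt, number=number, globals=env)
        for label, stmt in VARIANTS.items()
    }


if __name__ == "__main__":
    for label, seconds in sorted(run().items(), key=lambda kv: kv[1]):
        print(f"{label:>18}: {seconds:.4f} s")
```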

My recommendation would be to use the {...} syntax when writing set literals, and a {...} set comprehension instead of passing a generator expression to set(); use set() to convert an existing sequence/iterable into a set.
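
A compact illustration of that recommendation (the words list and variable names are made up for the example):

```python
words = ["spam", "eggs", "spam", "ham"]

# Fixed, known elements: a set literal (no lookup or call of `set`).
BREAKFAST = {"spam", "eggs", "ham"}

# Elements derived from an iterable: a set comprehension rather than
# set(<generator expression>).
lengths = {len(w) for w in words}

# An iterable that already exists: hand it to set() directly.
unique_words = set(words)

print(sorted(BREAKFAST))     # ['eggs', 'ham', 'spam']
print(sorted(lengths))       # [3, 4]
print(sorted(unique_words))  # ['eggs', 'ham', 'spam']
```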