性能比较:插入与构建 Python 集合操作 [英] Performance comparison: insert vs build Python set operations

查看:54
本文介绍了性能比较:插入与构建 Python 集合操作的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在python中,是不是更快a) 从 n 个项目的列表中构建一个集合b) 将 n 个项目插入到一个集合中?

In python, is it faster to a) Build a set from a list of n items b) Insert n items into a set?

我找到了这个页面 (http://wiki.python.org/moin/TimeComplexity),但它没有足够的信息来得出结论哪个更快.

I found this page (http://wiki.python.org/moin/TimeComplexity) but it did not have enough information to conclude which was faster.

看起来,在最坏的情况下一次插入一个项目可能需要 O(n*n) 时间(假设它使用 dicts),而在平均情况下需要 O(n*1) 时间.使用列表初始化集合是否提供任何性能改进?

It seems, inserting items one at a time could in the worst case take O(n*n) time (given it uses dicts), and O(n*1) in the average case. Does initializing a set with a list offer any performance improvement?

推荐答案

O() 复杂度方面 - 这绝对是一样的,因为两种方法做的完全一样 - 插入 n 项目成一个集合.

In terms of O() complexity - it's definitely the same, because both approaches do exactly the same - insert n items into a set.

不同之处在于实现:从可迭代对象初始化的一个明显优势是您可以节省大量 Python 级别的函数调用 - 可迭代对象的初始化完全在 C 级别 (**) 上完成.

The difference comes from implementation: One clear advantage of initialization from an iterable is that you save a lot of Python-level function calls - the initialization from a iterable is done wholly on the C level (**).

确实,对 5,000,000 个随机整数的列表进行的一些测试表明,逐个相加更慢:

Indeed, some tests on a list of 5,000,000 random integers shows that adding one by one is slower:

lst = [random.random() for i in xrange(5000000)]
set1 = set(lst)    # takes 2.4 seconds

set2 = set()       # takes 3.37 seconds
for item in lst:
    set2.add(item)

<小时>

(**) 查看集合的代码 (Objects/setobject.c),最终项目插入归结为对 set_add_key 的调用.当从一个可迭代对象初始化时,这个函数在一个紧密的 C 循环中被调用:


(**) Looking inside the code of sets (Objects/setobject.c), eventually item insertion boils down to a call to set_add_key. When initializing from an iterable, this function is called in a tight C loop:

while ((key = PyIter_Next(it)) != NULL) {
  if (set_add_key(so, key) == -1) {
    Py_DECREF(it);
    Py_DECREF(key);
    return -1;
  } 
  Py_DECREF(key);
}

另一方面,对 set.add 的每次调用都会调用属性查找,它解析为 C set_add 函数,后者又调用 set_add_key.由于项目添加本身相对较快(Python 的哈希表实现非常高效),因此这些额外的调用都会累积起来.

On the other hand, each call to set.add invokes attribute lookup, which resolves to the C set_add function, which in turn calls set_add_key. Since the item addition itself is relatively quick (the hash table implementation of Python is very efficient), these extra calls all build up.

这篇关于性能比较:插入与构建 Python 集合操作的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆