len() 在集合和列表方面的复杂性 [英] Complexity of len() with regard to sets and lists

查看:33
本文介绍了len() 在集合和列表方面的复杂性的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

len() 在集合和列表方面的复杂度同样是 O(1).为什么处理集合需要更多时间?

The complexity of len() with regards to sets and lists is equally O(1). How come it takes more time to process sets?

~$ python -m timeit "a=[1,2,3,4,5,6,7,8,9,10];len(a)"
10000000 loops, best of 3: 0.168 usec per loop
~$ python -m timeit "a={1,2,3,4,5,6,7,8,9,10};len(a)"
1000000 loops, best of 3: 0.375 usec per loop

它是否与特定的基准有关,例如,构建集合比列表花费更多的时间并且基准也考虑到这一点?

Is it related to the particular benchmark, as in, it takes more time to build sets than lists and the benchmark takes that into account as well?

如果创建一个集合对象比创建一个列表花费更多的时间,那么潜在的原因是什么?

If the creation of a set object takes more time compared to creating a list, what would be the underlying reason?

推荐答案

首先,你没有测过len()的速度,你测的是len()的速度以len()的速度创建一个列表/集合.

Firstly, you have not measured the speed of len(), you have measured the speed of creating a list/set together with the speed of len().

使用timeit--setup参数:

$ python -m timeit --setup "a=[1,2,3,4,5,6,7,8,9,10]" "len(a)"
10000000 loops, best of 3: 0.0369 usec per loop
$ python -m timeit --setup "a={1,2,3,4,5,6,7,8,9,10}" "len(a)"
10000000 loops, best of 3: 0.0372 usec per loop

您传递给 --setup 的语句在测量 len() 的速度之前运行.

The statements you pass to --setup are run before measuring the speed of len().

其次,您应该注意到 len(a) 是一个非常快速的语句.测量其速度的过程可能会受到噪音"的影响.考虑 由 timeit 执行(和测量)的代码等价于以下内容:

Secondly, you should note that len(a) is a pretty quick statement. The process of measuring its speed may be subject to "noise". Consider that the code executed (and measured) by timeit is equivalent to the following:

for i in itertools.repeat(None, number):
    len(a)

因为 len(a)itertools.repeat(...).__next__() 都是快速操作,而且它们的速度可能相似,所以 itertools.repeat(...).__next__() 的速度code>itertools.repeat(...).__next__() 可能会影响时间.

Because both len(a) and itertools.repeat(...).__next__() are fast operations and their speeds may be similar, the speed of itertools.repeat(...).__next__() may influence the timings.

因此,您最好测量 len(a);连(一);...;len(a)(重复 100 次左右),因此 for 循环体比迭代器花费的时间要长得多:

For this reason, you'd better measure len(a); len(a); ...; len(a) (repeated 100 times or so) so that the body of the for loop takes a considerably higher amount of time than the iterator:

$ python -m timeit --setup "a=[1,2,3,4,5,6,7,8,9,10]" "$(for i in {0..1000}; do echo "len(a)"; done)"
10000 loops, best of 3: 29.2 usec per loop
$ python -m timeit --setup "a={1,2,3,4,5,6,7,8,9,10}" "$(for i in {0..1000}; do echo "len(a)"; done)"
10000 loops, best of 3: 29.3 usec per loop

(结果仍然说 len() 在列表和集合上的性能相同,但现在您确定结果是正确的.)

(The results still says that len() has the same performances on lists and sets, but now you are sure that the result is correct.)

第三,复杂性"和速度"确实是相关的,但我相信您有些困惑.len() 对于列表和集合具有 O(1) 复杂性这一事实并不意味着它必须在列表和集合上以相同的速度运行.

Thirdly, it's true that "complexity" and "speed" are related, but I believe you are making some confusion. The fact that len() has O(1) complexity for lists and sets does not imply that it must run with the same speed on lists and sets.

这意味着,平均而言,无论列表 a 有多长,len(a) 都会执行相同的渐近步数.并且无论集合 b 有多长,len(b) 都会执行相同的渐近步数.但是计算列表和集合大小的算法可能不同,导致性能不同(时间表明并非如此,但这可能是一种可能性).

It means that, on average, no matter how long the list a is, len(a) performs the same asymptotic number of steps. And no matter how long the set b is, len(b) performs the same asymptotic number of steps. But the algorithm for computing the size of lists and sets may be different, resulting in different performances (timeit shows that this is not the case, however this may be a possibility).

最后,

如果创建一个集合对象比创建一个列表花费更多的时间,那么潜在的原因是什么?

If the creation of a set object takes more time compared to creating a list, what would be the underlying reason?

如您所知,集合不允许重复元素.CPython 中的集合被实现为哈希表(以确保平均 O(1) 插入和查找):构建和维护哈希表比向列表添加元素复杂得多.

A set, as you know, does not allow repeated elements. Sets in CPython are implemented as hash tables (to ensure average O(1) insertion and lookup): constructing and maintaining a hash table is much more complex than adding elements to a list.

具体来说,在构建集合时,您必须计算哈希值、构建哈希表、查找以避免插入重复事件等.相比之下,CPython 中的列表被实现为一个简单的指针数组,根据需要对其进行malloc()ed 和 realloc()ed.

Specifically, when constructing a set, you have to compute hashes, build the hash table, look it up to avoid inserting duplicated events and so on. By contrast, lists in CPython are implemented as a simple array of pointers that is malloc()ed and realloc()ed as required.

这篇关于len() 在集合和列表方面的复杂性的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆