Why is converting a list to a set faster than using just a list to compute a list difference?

Problem description

Say, I wish to compute the difference of two lists C = A - B:

A = [1,2,3,4,5,6,7,8,9] 
B = [1,3,5,8,9]
C = [2,4,6,7]          #Result

A and B are both sorted and contain unique integers (not sure if there is a way to tell Python about this property of the lists). I need to preserve the order of the elements. AFAIK there are two possible ways of doing it:

Method 1: Convert B into a set and use a list comprehension to generate C:

s = set(B)
C = [x for x in A if x not in s]

Method 2: Use a list comprehension directly:

C = [x for x in A if x not in B]

Why is #1 more efficient than #2? Isn't there an overhead to convert to a set? What am I missing here?

Some performance benchmarks are given in this answer.

UPDATE: I'm aware that a set's average O(1) lookup time beats that of a list's O(n) but if the original list A contains about a million or so integers, wouldn't the set creation actually take longer?

Recommended answer

There is overhead to convert a list to a set, but a set is substantially faster than a list for those membership tests (x not in s).

You can check whether item x is in set y almost instantly because a hash table is used underneath. No matter how large your set is, the average lookup time stays the same; this is known in Big-O notation as O(1). For a list, you have to check elements one by one to see whether item x is in list z. As your list grows, the check takes longer; this is O(n), meaning the time for the operation grows in proportion to the length of the list.
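As a rough illustration of that difference, the sketch below times a single worst-case membership test (an item that is absent) against a list and a set of growing size. The helper name time_lookup and the chosen sizes are assumptions made for this example, and the absolute timings will vary by machine.

```python
import timeit

def time_lookup(container, item, repeats=100):
    """Return the average number of seconds one `item in container` test takes."""
    total = timeit.timeit(lambda: item in container, number=repeats)
    return total / repeats

for n in (1_000, 10_000, 100_000):
    as_list = list(range(n))
    as_set = set(as_list)
    missing = -1  # absent item, so the list has to scan all n elements
    print(n, time_lookup(as_list, missing), time_lookup(as_set, missing))
```

The list column should grow roughly in step with n, while the set column should stay about flat.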

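That increased speed more than offsets the set creation overhead. Note that the set is built from B, not A, and building it is a single pass costing O(len(B)). After that, each of the len(A) membership checks is O(1) on average, so method 1 costs roughly O(len(A) + len(B)) overall, while method 2 costs O(len(A) * len(B)) in the worst case. That is how your set check ends up being faster.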
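To check this end to end, here is a minimal sketch that times both methods from the question, with the set creation included in method 1's timing. The function names diff_with_set and diff_with_list and the test data are illustrative assumptions; the printed timings depend on the machine.

```python
import timeit

def diff_with_set(a, b):
    """Method 1: build a set from b once, then filter a with hash lookups."""
    s = set(b)
    return [x for x in a if x not in s]

def diff_with_list(a, b):
    """Method 2: filter a by scanning the list b for every element of a."""
    return [x for x in a if x not in b]

A = list(range(2_000))
B = list(range(0, 2_000, 2))  # illustrative data: the even numbers in A

# Both methods give the same result and preserve the order of A.
assert diff_with_set(A, B) == diff_with_list(A, B)

print("set :", timeit.timeit(lambda: diff_with_set(A, B), number=3))
print("list:", timeit.timeit(lambda: diff_with_list(A, B), number=3))
```

Increasing the list sizes should widen the gap, since the list version's work grows with the product of the two lengths.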

To answer that other question, Python has no way of determining that your list is sorted, at least not if you're using a standard list object. So the membership test in a list comprehension can't achieve O(log n) performance. If you wanted to write your own binary search method which assumes the list is sorted, you can certainly do so, but an O(1) lookup still beats an O(log n) one.
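For completeness, a binary search version could look like the sketch below, which uses the standard library's bisect module instead of a hand-written search. The helper names contains_sorted and diff_sorted are made up for this example, and the code assumes B really is sorted in ascending order; it does not verify that.

```python
from bisect import bisect_left

def contains_sorted(sorted_list, x):
    """Binary search: True if x is in sorted_list (assumed sorted ascending)."""
    i = bisect_left(sorted_list, x)
    return i < len(sorted_list) and sorted_list[i] == x

def diff_sorted(a, b):
    """Keep the elements of a that are not in the sorted list b, in order."""
    return [x for x in a if not contains_sorted(b, x)]

A = [1, 2, 3, 4, 5, 6, 7, 8, 9]
B = [1, 3, 5, 8, 9]
print(diff_sorted(A, B))  # [2, 4, 6, 7]
```

Each lookup here is O(log n), which avoids building a set but is still slower per lookup than the average O(1) of the set approach.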
