如何检查任何给定集合中是否存在值 [英] How to check if a value is present in any of given sets

查看:66
本文介绍了如何检查任何给定集合中是否存在值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我有不同的集合(它们必须不同,我无法根据我正在处理的数据类型加入它们):

Say I have different sets (they have to be different, I cannot join them as per the kind of data I am working with):

r = set([1,2,3])
s = set([4,5,6])
t = set([7,8,9])

检查给定变量是否存在于其中任何一个中的最佳方法是什么?

What is the best way to check if a given variable is present in any of them?

我正在使用:

if myvar in r \
   or myvar in s \
   or myvar in t:

但我想知道是否可以通过使用 set 的属性(例如 union)来减少这种情况.

But I wonder if this can be reduced somehow by using set's properties such as union.

以下有效,但我找不到定义多个联合的方法:

The following works, but I can't find a way to define multiple unions:

if myvar in r.union(s)
   or myvar in t:

我还想知道这个联合是否会以某种方式影响性能,因为我猜会动态创建一个临时的 set.

I am also wondering if this union will somehow affect performance, since I guess a temporary set will be created on the fly.

推荐答案

随便用:

if any(myvar in x for x in  (r,s,t))

集合查找是 0(1) 因此创建联合来检查变量是否在任何集合中是完全没有必要的,而不是简单地使用 in 进行检查任何 一旦找到匹配就会短路并且不会创建新的集合.

set lookups are 0(1) so creating a union to check if the variable is in any set is totally unnecessary instead of simply checking using in with any which will short circuit as soon as a match is found and does not create a new set.

我也想知道这个联合是否会以某种方式影响性能

当然,合并集合会影响性能,它增加了复杂性,您每次都在创建一个新集合,即 O(len(r)+len(s)+len(t)) 这样您就可以告别使用高效查找集合的真正意义了.

Yes of course unioning the sets affects performance, it adds to the complexity, you are creating a new set every time which is O(len(r)+len(s)+len(t)) so you can say goodbye to the real point of using sets which are efficient lookups.

所以最重要的是,你想要保持高效的查找,你必须将集合联合一次并将它们保存在内存中,创建一个新变量,然后使用它来查找 myvar 所以初始创建将是 0(n),之后查找将是 0(1).

So the bottom line is that is you want to keep efficient lookups you will have to union the set once and keep them in memory creating a new variable then using that to do your lookup for myvar so the initial creation will be 0(n) and lookups will be 0(1) thereafter.

如果不是每次都想首先创建联合时进行查找,那么您将获得 r+s+t -> 长度的线性解.set.union(*(r, s, t)) 与最坏的三个常量(平均)查找相反.这也意味着总是从新的联合集中添加或删除任何从 r,st 中添加/删除的元素.

If you don't every time you want to do a lookup first creating the union you will have a linear solution in the length of r+s+t -> set.union(*(r, s, t)) as opposed to at worst three constant(on average) lookups. That also means always adding or removing any elements from the new unioned set that are added/removed from r,s or t.

在中等大小的集合上的一些实际时间显示了差异:

Some realistic timings on moderately large sized sets show exactly the difference:

In [1]: r = set(range(10000))

In [2]: s = set(range(10001,20000))

In [3]: t = set(range(20001,30000))

In [4]: timeit any(29000 in st for st in (r,s,t))
1000000 loops, best of 3: 869 ns per loop  

In [5]: timeit 29000 in r | s | t
1000 loops, best of 3: 956 µs per loop

In [6]: timeit 29000 in reduce(lambda x,y :x.union(y),[r,s,t])
1000 loops, best of 3: 961 µs per loop

In [7]: timeit 29000 in r.union(s).union(t)
1000 loops, best of 3: 953 µs per loop

对联合进行计时表明,几乎所有的时间都花在联合调用上:

Timing the union shows that pretty much all the time is spent in the union calls:

In [8]: timeit r.union(s).union(t)
1000 loops, best of 3: 952 µs per loop

使用更大的集合并获取最后一个集合中的元素:

Using larger sets and getting the element in the last set:

In [15]: r = set(range(1000000))

In [16]: s = set(range(1000001,2000000))

In [17]: t = set(range(2000001,3000000))


In [18]: timeit any(2999999 in st for st in (r,s,t))
1000000 loops, best of 3: 878 ns per loop

In [19]: timeit 2999999 in reduce(lambda x,y :x.union(y),[r,s,t])
1 loops, best of 3: 161 ms per loop

In [20]: timeit 2999999 in r | s | t
10 loops, best of 3: 157 ms per loop

无论使用 any 的集合有多大,实际上都没有区别,但是随着集合大小的增加,使用 union 的运行时间也会增加.

There is literally no difference no matter how large the sets get using any but as the set sizes grow so does the running time using union.

让它更快的唯一方法是坚持使用 ,但我们正在考虑几百纳秒的差异,这是创建生成器表达式和函数调用的成本:

The only way to make it faster would be to stick to or but we are taking the difference of a few hundred nanoseconds which is the cost of creating the generator expression and the function call:

In [22]: timeit 2999999 in r or 2999999 in s or 2999999 in t
10000000 loops, best of 3: 152 ns per loop

联合集合 set.union(*(r, s, t)) 也是最快的,因为你不构建中间集:

To union sets set.union(*(r, s, t)) is also the fastest as you don't build intermediary sets:

In [47]: timeit 2999999 in set.union(*(r,s,t))
10 loops, best of 3: 108 ms per loop
In [49]: r | s | t  == set.union(*(r,s,t))
Out[49]: True

这篇关于如何检查任何给定集合中是否存在值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆