如何确定是否一个列表是另一个列表的子集? [英] How to determine if a list is subset of another list?

查看:649
本文介绍了如何确定是否一个列表是另一个列表的子集?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是有效的方法,以确定是否一个列表是另一个列表的一个子集<?/ P>

例如:

  is_subset(名单(1,2,3,4),列表(2,3))//返回true
is_subset(名单(1,2,3,4),列表(3,4,5))//返回false
 

我主要是在寻找有效的算法,而不是太关心如何列表存储。它可以存储在数组中,链接列表或其他数据结构。

感谢

编辑:列表排序

解决方案

下面是一些折衷,你可以做到的。让我们假设你有两套元素,S和T,从宇宙U.我们希望,以确定是否S≥T绘制的。在给定的例子之一,我们已经

S = {1,2,3,4}
T = {3,4,5}
U = {1,2,3,4,5}

1。排序列表(或平衡搜索树)
该方法建议的大部分海报。如果你已经排序的列表,或不在乎花费的时间来创建它们(比如,你不这样做,往往)的长度,那么该算法基本上是线性的时间和空间。这通常是最好的选择。

(公平地说,以其他选择在这里,时间和空间的界限实际上应该包括日志&NBSP; | U |的因素在适当的地方,但是这通常不是relivant)

数据结构:排序列表中为每个S和T或者平衡查找树(如AVL树,红黑树,B +树),可以在不断的空间被遍历的。

算法的:对于T的每个元素中,为了,线性搜索S代表该元素。还记得你离开每个搜索,并开始下一个搜索那里。如果每一个搜索成功,则S≥T。

时间复杂度:关于 O( | S |登录| S | + | T |登录| T | 以创建排序列表, O( MAX(| S |,| T |)比较

空间复杂度:关于 O( | S | + | T |

例(C ++)

 的#include&LT;集&gt;
#包括&LT;算法&GT;

的std ::设为&LT; INT&GT; create_S()
{
    的std ::设为&LT; INT&GT; S;
    //注意:性病::集将为了把这些内部
    S.insert(3);
    S.insert(2);
    S.insert(4);
    S.insert(1);
    返回S;
}

的std ::设为&LT; INT&GT; create_T()
{
    的std ::设为&LT; INT&GT;笔;
    //注意的std ::集将为了把这些内部
    T.insert(4);
    T.insert(3);
    T.insert(5);
    返回笔;
}

诠释的main()
{
    的std ::设为&LT; INT&GT; S = create_S();
    的std ::设为&LT; INT&GT; T = create_T();
    返回的std ::包括(S.begin(),S.end(),T.begin(),T.end());
}
 

2。哈希表
比排序列表更好的平均时间复杂度可以用哈希表来获得。大集的改进行为是以牺牲对小套一般较差的性能为代价的。

与排序名单,我忽略了贡献的宇宙的大小复杂性。

数据结构:为S,任何哈希表中快速迭代对于T

算法:INSERT语句中的每个元素到其哈希表。然后,对于每个元素,检查,看它是否在哈希表中

时间复杂度 O( | S | + | T | 的设置, O( | T | 比较

空间复杂度 O( | S | + | T |

例(C ++)

 的#include&LT; TR1 / unordered_set&GT;

性病:: TR1 :: unordered_set&LT; INT&GT; create_S()
{
    性病:: TR1 :: unordered_set&LT; INT&GT; S;
    S.insert(3);
    S.insert(2);
    S.insert(4);
    S.insert(1);
    返回S;
}

性病:: TR1 :: unordered_set&LT; INT&GT; create_T()
{
    性病:: TR1 :: unordered_set&LT; INT&GT;笔;
    T.insert(4);
    T.insert(3);
    T.insert(5);
    返回笔;
}

布尔包括(常量的std :: TR1 :: unordered_set&LT; INT&GT;&安培; S,
              常量性病:: TR1 :: unordered_set&LT; INT&GT;&安培; T)
{
    对于(性病:: TR1 :: unordered_set&LT; INT&GT; ::为const_iterator ITER = T.begin();
         ITER = T.end()!;
         ++ ITER)
    {
        如果(S.find(* ITER)== S.end())
        {
            返回false;
        }
    }
    返回true;
}

诠释的main()
{
    性病:: TR1 :: unordered_set&LT; INT&GT; S = create_S();
    性病:: TR1 :: unordered_set&LT; INT&GT; T = create_T();
    返回包括(S,T);
}
 

3。位设置
如果宇宙是特别小(假设你只能有元素0-32),那么一个bitset是一个合理的解决方案。运行时间(同样,假设你不关心设置时间)是基本恒定的。在这种情况下你关心的设置,它仍然不是创建一个排序列表更快。

不幸的是,位集变得笨拙非常快,即使是中等规模的宇宙。

数据结构:位向量(通常是一台机器的整数),每个S和T.我们不妨EN code S = 11110和T = 00111,上面例子中的

算法的:计算的交点,通过与在T中的相应位如果结果等于T,计算每个位的位元与S中然后S≥T

时间复杂度 O( | U | + | S | + | T | 的设置, 0( | U | 比较

空间复杂度 O( | U |

例:(C ++)

 的#include&LT;位集合&GT;

//位集合的宇宙总是从0开始,所以创建一个大小为6位集示范。
// U = {0,1,2,3,4,5}

的std :: bitset的&LT; 6' create_S()
{
    的std :: bitset的&LT; 6' S;
    //注:位集不在乎订单
    S.set(3);
    S.set(2);
    S.set(4);
    S.set(1);
    返回S;
}

的std :: bitset的&LT; 6' create_T()
{
    的std :: bitset的&LT; 6'笔;
    //注:位集不在乎订单
    T.set(4);
    T.set(3);
    T.set(5);
    返回笔;
}

诠释的main()
{
    的std :: bitset的&LT; 6' S = create_S();
    的std :: bitset的&LT; 6' T = create_T();

    返回S&放大器;牛逼==笔;
}
 

4。 布鲁姆过滤器
位集的所有速度的好处,而对宇宙大小的讨厌的限制的位集都有。只有一个不好的一面:他们有时(通常,如果你不小心)给出了错误的答案:如果该算法说不,那么你肯定没有列入。如果算法说是,你可能会或可能不会。更好的精度是通过选择一个大的过滤器尺寸和良好的散列函数实现。

由于它们能够而且将会给出错误的答案,布鲁姆过滤器听起来像一个可怕的想法。然而,它们具有一定的用途。一般人会用布鲁姆过滤器来快速做很多包容检查,然后用较慢的确定方法,在需要的时候,以保证正确性。链接的维基百科的文章中提到用布鲁姆过滤器的一些应用程序。

数据结构:一个布隆过滤器是一个奇特的bitset。必须选择一个滤波器的尺寸和散列函数事先。

算法(素描):初始化bitset的为0。要一个元素添加到Bloom过滤器,哈希它与每个哈希函数,并设置bitset中的相应位。确定纳入的工作方式为位集。

时间复杂度 O( 滤镜尺寸

空间复杂度 O( 滤镜尺寸

正确性概率:如果答案为S不包括T总是正确的。喜欢的东西0.6185 ^(| S | X | T | /(过滤&NBSP;大小)))如果回答S包括T。具体地,过滤器的大小必须选择成比例的乘积| S |和| T |给予准确合理的可能性。

What is efficient way to determine if a list is a subset of another list?

Example:

is_subset(List(1,2,3,4),List(2,3))    //Returns true
is_subset(List(1,2,3,4),List(3,4,5))  //Returns false

I am mostly looking for efficient algorithm and not too concern how the list is stored. It can be stored in array, link list or other data structure.

Thanks

EDIT: The list is sorted

解决方案

Here are a few trade offs you can make. Let's assume that you have two sets of elements, S and T, drawn from a universe U. We want to determine if S≥T. In one of the given examples, we have

S={1,2,3,4}
T={3,4,5}
U={1,2,3,4,5}

1. Sorted Lists (or balanced search tree)
The method suggested by most posters. If you already have sorted lists, or don't care about the length of time it takes to create them (say, you're not doing that often), then this algorithm is basically linear time and space. This is usually the best option.

(To be fair to other choices here, the time and space bounds should actually contain factors of "Log |U|" in appropriate places, but this is usually not relivant)

Data structures: Sorted list for each of S and T. Or a balanced search tree (e.g. AVL tree, red-black tree, B+-tree) that can be iterated over in constant space.

Algorithm: For each element in T, in order, search S linearly for that element. Remember where you left off each search, and start the next search there. If every search succeeds, then S≥T.

Time complexity: about O( |S| Log|S| + |T| Log|T| ) to create the sorted lists, O( max(|S|, |T|) ) to compare.

Space complexity: about O( |S| + |T| )

Example (C++)

#include <set>
#include <algorithm>

std::set<int> create_S()
{
    std::set<int> S;
    // note: std::set will put these in order internally
    S.insert(3);
    S.insert(2);
    S.insert(4);
    S.insert(1);
    return S;
}

std::set<int> create_T()
{
    std::set<int> T;
    // note std::set will put these in order internally
    T.insert(4);
    T.insert(3);
    T.insert(5);
    return T;
}

int main()
{
    std::set<int> S=create_S();
    std::set<int> T=create_T();
    return std::includes(S.begin(),S.end(), T.begin(), T.end());
}

2. Hash tables
Better average time complexity than with a sorted list can be obtained using hash tables. The improved behavior for large sets comes at the cost of generally poorer performance for small sets.

As with sorted lists, I'm ignoring the complexity contributed by the size of the universe.

Data structure: Hash table for S, anything quickly iterable for T.

Algorithm: Insert each element of S into its hashtable. Then, for each element in T, check to see if it's in the hash table.

Time complexity: O( |S| + |T| ) to set up, O( |T| ) to compare.

Space complexity: O( |S| + |T| )

Example (C++)

#include <tr1/unordered_set>

std::tr1::unordered_set<int> create_S()
{
    std::tr1::unordered_set<int> S;
    S.insert(3);
    S.insert(2);
    S.insert(4);
    S.insert(1);
    return S;
}

std::tr1::unordered_set<int> create_T()
{
    std::tr1::unordered_set<int> T;
    T.insert(4);
    T.insert(3);
    T.insert(5);
    return T;
}

bool includes(const std::tr1::unordered_set<int>& S, 
              const std::tr1::unordered_set<int>& T)
{
    for (std::tr1::unordered_set<int>::const_iterator iter=T.begin();
         iter!=T.end();
         ++iter)
    {
        if (S.find(*iter)==S.end())
        {
            return false;
        }
    }
    return true;
}

int main()
{
    std::tr1::unordered_set<int> S=create_S();
    std::tr1::unordered_set<int> T=create_T();
    return includes(S,T);
}

3. Bit sets
If your universe is particularly small (let's say you can only have elements 0-32), then a bitset is a reasonable solution. The running time (again, assuming you don't care about setup time) is essentially constant. In the case you do care about setup, it's still faster than creating a sorted list.

Unfortunately, bitsets become unwieldy very quickly for even a moderately sized universe.

Data structure: bit vector (usually a machine integer) for each of S and T. We might encode S=11110 and T=00111, in the given example.

Algorithm: Calculate the intersection, by computing the bitwise 'and' of each bit in S with the corresponding bit in T. If the result equals T, then S≥T.

Time complexity: O( |U| + |S| + |T| ) to setup, O( |U| ) to compare.

Space complexity: O( |U| )

Example: (C++)

#include <bitset>

// bitset universe always starts at 0, so create size 6 bitsets for demonstration.
// U={0,1,2,3,4,5}

std::bitset<6> create_S()
{
    std::bitset<6> S;
    // Note: bitsets don't care about order
    S.set(3);
    S.set(2);
    S.set(4);
    S.set(1);
    return S;
}

std::bitset<6> create_T()
{
    std::bitset<6> T;
    // Note: bitsets don't care about order
    T.set(4);
    T.set(3);
    T.set(5);
    return T;
}

int main()
{
    std::bitset<6> S=create_S();
    std::bitset<6> T=create_T();

    return S & T == T;
}

4. Bloom filters
All the speed benefits of bitsets, without the pesky limitation on universe size the bitsets have. Only one down side: they sometimes (often, if you're not careful) give the wrong answer: If the algorithm says "no", then you definitely don't have inclusion. If the algorithm says "yes", you might or might not. Better accuracy is attained by choosing a large filter size, and good hash functions.

Given that they can and will give wrong answers, Bloom filters might sound like a horrible idea. However, they have definite uses. Generally one would use Bloom filters to do many inclusion checks quickly, and then use a slower deterministic method to guarantee correctness when needed. The linked Wikipedia article mentions some applications using Bloom filters.

Data structure: A Bloom filter is a fancy bitset. Must choose a filter size, and hash functions beforehand.

Algorithm (sketch): Initialize the bitset to 0. To add an element to a bloom filter, hash it with each hash function, and set the corresponding bit in the bitset. Determining inclusion works just as for bitsets.

Time complexity: O( filter size )

Space complexity: O( filter size )

Probability of correctness: Always correct if it answers for "S does not include T". Something like 0.6185^(|S|x|T|/(filter size))) if it answers "S includes T". In particular, the filter size must be chosen proportional to the product of |S| and |T| to give reasonable probability of accuracy.

这篇关于如何确定是否一个列表是另一个列表的子集?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆