确定一组是否为另一组子集的有效代码 [英] An efficient code to determine if a set is a subset of another set

查看:103
本文介绍了确定一组是否为另一组子集的有效代码的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻找一种有效的方法来确定一个集合是否是Matlab或Mathematica中另一个集合的子集.

I am looking for an efficient way to determine if a set is a subset of another set in Matlab or Mathematica.

示例: 设置A = [1 2 3 4] 设定B = [4 3] 设定C = [3 4 1] 设置D = [4 3 2 1]

Example: Set A = [1 2 3 4] Set B = [4 3] Set C = [3 4 1] Set D = [4 3 2 1]

输出应为:Set A

The output should be: Set A

集合B和C属于集合A,因为A包含它们的所有元素,因此可以删除它们(集合中元素的顺序无关紧要).集合D与集合A具有相同的元素,并且由于集合A在集合D之前,因此我想保留集合A并删除集合D.

Sets B and C belong to set A because A contains all of their elements, therefore, they can be deleted (the order of elements in a set doesn't matter). Set D has the same elements as set A and since set A precedes set D, I would like to simply keep set A and delete set D.

因此,有两个基本规则: 1.如果一个集合是另一个集合的子集,则将其删除 2.如果集合的元素与先前集合的元素相同,则将其删除

So there are two essential rules: 1. Delete a set if it is a subset of another set 2. Delete a set if its elements are the same as those of a preceding set

我的Matlab代码执行效率不是很高-它主要由嵌套循环组成.

My Matlab code is not very efficient at doing this - it mostly consists of nested loops.

非常欢迎提出建议!

其他说明:问题在于,存在大量的集合时,将会有大量的成对比较.

Additional explanation: the issue is that with a large number of sets there will be a very large number of pairwise comparisons.

推荐答案

您可能想看一看内置的

You will likely want to take a look at the built-in set operation functions in MATLAB. Why reinvent the wheel if you don't have to? ;)

提示: ISMEMBER 功能可能会让您特别感兴趣.

HINT: The ISMEMBER function may be of particular interest to you.

这是使用嵌套循环解决此问题的一种方法,但可以对其进行设置以尝试减少潜在的迭代次数.首先,我们可以使用 Marc 的注释中的建议,以按元素数量对集合列表进行排序,以便它们从最大到最小排列:

Here's one way you can approach this problem using nested loops, but setting them up to try and reduce the number of potential iterations. First, we can use the suggestion in Marc's comment to sort the list of sets by their number of elements so that they are arranged largest to smallest:

setList = {[1 2 3 4],...  %# All your sets, stored in one cell array
           [4 3],...
           [3 4 1],...
           [4 3 2 1]};
nSets = numel(setList);                       %# Get the number of sets
setSizes = cellfun(@numel,setList);           %# Get the size of each set
[temp,sortIndex] = sort(setSizes,'descend');  %# Get the sort index
setList = setList(sortIndex);                 %# Sort the sets

现在,我们可以将循环设置为从列表末尾的最小集开始,然后将它们与列表开头的最大集进行比较,以增加我们很快会找到超集的几率(即,依靠较大的集合更有可能包含较小的集合).找到超集后,我们从列表中删除该子集并中断内循环:

Now we can set up our loops to start with the smallest sets at the end of the list and compare them first to the largest sets at the start of the list to increase the odds we will find a superset quickly (i.e. we're banking on larger sets being more likely to contain smaller sets). When a superset is found, we remove the subset from the list and break the inner loop:

for outerLoop = nSets:-1:2
  for innerLoop = 1:(outerLoop-1)
    if all(ismember(setList{outerLoop},setList{innerLoop}))
      setList(outerLoop) = [];
      break;
    end
  end
end

运行上述代码后,setList将删除其中的所有集合,这些集合是列表中位于它们之前的其他集合的子集或重复项.

After running the above code, setList will have all sets removed from it that are either subsets or duplicates of other sets preceding them in the list.

在最佳情况下(例如,问题中的示例数据),每次第一次迭代后,内部循环都会中断,仅使用

In the best case scenario (e.g. the sample data in your question) the inner loop breaks after the first iteration every time, performing only nSets-1 set comparisons using ISMEMBER. In the worst case scenario the inner loop never breaks and it will perform (nSets-1)*nSets/2 set comparisons.

这篇关于确定一组是否为另一组子集的有效代码的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆