Efficient algorithm to find a maximum common subset of two sets?

Question

Each set contains a bunch of checksums. For example:
Set A:
{
4445968d0e100ad08323df8c895cea15
a67f8052594d6ba3f75502c0b91b868f
07736dde2f8484a4a3af463e05f039e3
5b1e374ff2ba949ab49870ca24d3163a
}

Set B:
{
6639e1da308fd7b04b7635a17450df7c
4445968d0e100ad08323df8c895cea15
a67f8052594d6ba3f75502c0b91b868f
}

The maximum common subset of A and B is:
{
4445968d0e100ad08323df8c895cea15
a67f8052594d6ba3f75502c0b91b868f
}

A lot of these operations will be performed, so I'm looking for an efficient algorithm to do so. Thanks for your help.

Recommended answer

Put one of the sets in a hash table and iterate through the other, discarding elements that aren't in the hash. Alternatively, sort both and iterate through them simultaneously, as in merge sort.
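Both approaches can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation: the function names `intersect_hash` and `intersect_sorted` are made up for this example, and the inputs are assumed to be lists of checksum strings with no duplicates.

```python
def intersect_hash(a, b):
    """Hash-based intersection: hash the smaller input, scan the larger one."""
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    lookup = set(small)  # Python's set is a hash table
    # Keep only the elements of the larger input that appear in the hash.
    return [x for x in large if x in lookup]


def intersect_sorted(a, b):
    """Merge-style intersection of two sorted, duplicate-free lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])  # common element: keep it, advance both sides
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] cannot occur later in b, skip it
        else:
            j += 1  # b[j] cannot occur later in a, skip it
    return out
```

The hash version runs in expected time linear in the total number of elements and needs no ordering. The merge version is also linear once both inputs are sorted, so it pays off mainly when the sets are already kept in sorted order. In everyday Python, `set(a) & set(b)` gives the same result as the hash version.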

Edit: The latter method produces a sorted result. I should add that if the sets are of widely disparate sizes and they're presorted (say, because you're doing a bunch of intersections), then you can realize a large performance improvement by using "unbounded" binary search to skip ahead in the large list.
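One way to read the "unbounded" binary search idea is as galloping (exponential) search: from the current position in the large list, probe at doubling distances until the target is passed, then binary-search only the last interval. The sketch below assumes both lists are sorted and duplicate-free; `gallop` and `intersect_galloping` are hypothetical names chosen for this example.

```python
from bisect import bisect_left


def gallop(seq, target, lo):
    """Return the first index >= lo whose element is >= target (len(seq) if none)."""
    n = len(seq)
    if lo >= n or seq[lo] >= target:
        return lo
    # Invariant: seq[prev] < target. Double the step until we pass the target
    # or run off the end of the list.
    prev, step = lo, 1
    while prev + step < n and seq[prev + step] < target:
        prev += step
        step *= 2
    hi = min(prev + step, n)
    # The answer lies in (prev, hi]; finish with an ordinary binary search.
    return bisect_left(seq, target, prev + 1, hi)


def intersect_galloping(small, large):
    """Intersect a short sorted list with a much longer sorted list."""
    out = []
    pos = 0
    for x in small:
        pos = gallop(large, x, pos)  # skip ahead in the large list
        if pos == len(large):
            break  # nothing left in the large list to match
        if large[pos] == x:
            out.append(x)
            pos += 1
    return out
```

With m elements in the small list and n in the large one, each lookup costs time logarithmic in the distance skipped rather than in n, which is where the gain over a plain linear merge comes from when m is much smaller than n.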
