最常见的组合 [英] Most commonly occurring combination

查看:63
本文介绍了最常见的组合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个整数数组的列表,其中每个数组都有一些已排序的数字。在这里,我想找到基于所有数组的整数序列最常见的组合。例如,如果数组列表如下

I have a list of integer array, where each array have some numbers sorted. Here I want to find the most commonly occurring combination of sequence of integers based on all the array. For example if the list of array is as follows

A1 - 1 2 3 5 7 8
A2 - 2 3 5 6 7
A3 - 3 5 7 9
A4 - 1 2 3 7 9
A5 - 3 5 7 10

此处

{3,5,7} - {A1,A3,A5}
{2,3}   - {A1,A2,A4}

我们可以将{3,5,7}或{2,3}作为最常见的组合。

So we can take {3,5,7} or {2,3} as the most commonly occurring combinations.

现在我使用的算法如下

查找集合与所有其他集合的交集。并存储结果集。如果结果集出现已经存在,则将其增加。
例如:
查找以下所有的交点

Find intersection of a set with all others. And store the resulting set. Increment a resulting set occurrence in case if its already exist. for eg : Find intersections of all the below

A1 intersection A2 
A1 intersection A3
A1 intersection A4
A1 intersection A5
A2 intersection A3  
A2 intersection A4 
A2 intersection A5  
A3 intersection A4
A3 intersection A5  
A4 intersection A5  

此处A1交叉点A3与A3交叉点A5相同,因此将- {3,5,7}出现次数可以设置为2。
同样,可以确定每个结果集出现的次数。

Here A1 intersection A3 is same as A3 intersection A5 , hence set-{3,5,7} occurrence can be set as 2. Similarly each resulting set occurrence can be determined.

但是该算法要求O(n ^ 2)复杂度。
假设每个集合都已排序,我很确定我们可以找到一个O(n)复杂度更好的算法,而我无法对此加以说明。
谁能建议同样的O(n)算法。

But this algorithm demands O(n^2) complexity. Assuming each set is sorted , am pretty sure that we can find a better algorithm with O(n) complexity which i am not able to pen down. Can anyone suggest a O(n) algorithm for the same.

推荐答案

如果序列长度为n ,则其前缀的长度为n-1,并且至少经常出现-简并的情况是最常见的字符,它是长度为1的序列,其出现频率至少与任何更长的序列相同。

If you have a sequence of length n, then its prefix is of length n-1 and occurs at least as often - a degenerate case is the most common character, which is a sequence of length 1 that occurs at least as often as any longer sequence. Do you have a minimum suffix length you are interested in?

无论如何,一个想法是将所有序列连接起来,用不同的整数分隔它们,而这些整数在其他地方都没有出现,然后在线性时间内计算 http://en.wikipedia.org/wiki/Suffix_array 。一遍遍后缀数组应该使您能够找到任何给定长度的最常见子序列-并且不应跨越两个不同数组之间的间隔,因为每个这样的长度为n的序列都是唯一的,因为分隔数组的字符是独特。 (另请参见 http://en.wikipedia.org/wiki/LCP_array

Regardless of this, one idea is to concatenate all of the sequences, separating them by different integers which appear nowhere else, and then compute the http://en.wikipedia.org/wiki/Suffix_array in linear time. One pass through the suffix array should allow you to find the most common subsequence of any given length - and it shouldn't cross the gap between two different arrays, because each such sequence of length n is unique, because the characters separating the arrays are unique. (see also the http://en.wikipedia.org/wiki/LCP_array)

这篇关于最常见的组合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆