最常见的是存在的组合 [英] Most commonly occuring combination

查看:118
本文介绍了最常见的是存在的组合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

予有整数阵列,其中每个阵列有一些数字排序的列表。在这里,我想找到基于所有的整数数组序列的最常见的组合。例如,如果阵列的列表如下

I have a list of integer array, where each array have some numbers sorted. Here I want to find the most commonly occurring combination of sequence of integers based on all the array. For example if the list of array is as follows

A1 - 1 2 3 5 7 8
A2 - 2 3 5 6 7
A3 - 3 5 7 9
A4 - 1 2 3 7 9
A5 - 3 5 7 10

下面

{3,5,7} - {A1,A3,A5}
{2,3}   - {A1,A2,A4}

因此​​,我们可以把{3,5,7}或{2,3}作为最常出现的组合。

So we can take {3,5,7} or {2,3} as the most commonly occurring combinations.

现在我使用的算法如下

查找一组与所有其他人的交集。并存储结果集。增加一个结果集出现的情况下,如果它已经存在。 对于例如: 找到所有的以下

Find intersection of a set with all others. And store the resulting set. Increment a resulting set occurrence in case if its already exist. for eg : Find intersections of all the below

A1 intersection A2 
A1 intersection A3
A1 intersection A4
A1 intersection A5
A2 intersection A3  
A2 intersection A4 
A2 intersection A5  
A3 intersection A4
A3 intersection A5  
A4 intersection A5  

下面A1相交A3为相同的A3相交A5,因此设定{3,5,7}发生可以被设置为2。 同样每个结果集的发生才能确定。

Here A1 intersection A3 is same as A3 intersection A5 , hence set-{3,5,7} occurrence can be set as 2. Similarly each resulting set occurrence can be determined.

但这个算法需要为O(n ^ 2)的复杂性。 假设每套排序,上午pretty的肯定,我们可以找到一个更好的算法为O(n)的复杂性,我是无法下笔。 任何人都可以提出一个O(n)的算法相同。

But this algorithm demands O(n^2) complexity. Assuming each set is sorted , am pretty sure that we can find a better algorithm with O(n) complexity which i am not able to pen down. Can anyone suggest a O(n) algorithm for the same.

推荐答案

如果您有长度为n的序列,那么它的preFIX是长度为n-1和发生频率至少不 - 一个退化的情况是最常见的特征,它是至少为常发生的任何较长序列长度为1的序列。你有一个最低的后缀长度,你有兴趣吗?

If you have a sequence of length n, then its prefix is of length n-1 and occurs at least as often - a degenerate case is the most common character, which is a sequence of length 1 that occurs at least as often as any longer sequence. Do you have a minimum suffix length you are interested in?

不管此,一个想法是连接全部序列,其中无处出现不同的整数将它们分开,然后计算的 http://en.wikipedia.org/wiki/Suffix_array 线性时间。一次通过后缀数组应该让你找到任何给定长度的最常见的序列 - 它不应该跨越两个不同的阵列之间的差距,因为每一个这样的长度为n的序列是唯一的,因为分开阵列的字符独特。 (另见 http://en.wikipedia.org/wiki/LCP_array

Regardless of this, one idea is to concatenate all of the sequences, separating them by different integers which appear nowhere else, and then compute the http://en.wikipedia.org/wiki/Suffix_array in linear time. One pass through the suffix array should allow you to find the most common subsequence of any given length - and it shouldn't cross the gap between two different arrays, because each such sequence of length n is unique, because the characters separating the arrays are unique. (see also the http://en.wikipedia.org/wiki/LCP_array)

这篇关于最常见的是存在的组合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆