我怎样才能找到号码的列表中最常见的组合? [英] How can I find most frequent combinations of numbers in a list?
问题描述
想象一下,你有数字(或字母)的列表,如
1177783777297461145777267337774652113777236237118777
我想找到号码的最常见的组合,在这个列表:
1位长的组合 - 这是在这个列表中的最频繁的号码
2位长的组合 - 可能是11
3位数字长的组合 - 可能是'777'等
是否有这样的任务,一些特殊algorythm?
更新 好了,我自己我有codeD以下(JAVA)。貌似执行时间是成比例的数据量乘以图案尺寸:
公共静态无效的主要(字串[] args)
{
INT DATA_SIZE = 10000;
INT []数据=新的INT [DATA_SIZE]
的for(int i = 0; I< DATA_SIZE;我++)
{
数据[I] =(int)的(10 *的Math.random())%10;
System.out.print(数据[I]);
}
INT []样式1 =新INT [] {1,2,3};
INT [] PATTERN2 =新INT [] {7,7,7};
INT [] pattern3 =新INT [] {7,7};
的System.out.println();
的System.out.println(匹配(数据,样式1));
的System.out.println(匹配(数据,式样2));
的System.out.println(匹配(数据,pattern3));
}
静态INT比赛(INT []的数据,INT []模式)
{
INT匹配= 0;
INT I = 0;
而(I< data.length)
{
比赛的isEqual =(数据,我,模式)?匹配+ 1:匹配;
我++;
}
返回匹配;
}
静态的isEqual布尔(INT []一,诠释了startIndex,INT [] A2)
{
若(a == A2)
{
返回true;
}
若(a == NULL || A2 == NULL)
{
返回false;
}
的for(int i = 0; I< a2.length;我++)
{
如果(一个[在startIndex + 1]!= A2 [I])
{
返回false;
}
}
返回true;
}
要找到最多的长度至少为k的序列重复的长度为n的字符串,你可以建立一个后缀树(的 http://en.wikipedia.org/wiki/Suffix_tree )的线性时间,然后找到具有节点大多数儿童描述长度为k(或更多)的序列,从根
总的来说,这是在输入的字符串的长度线性时间
有关小k,你最好用天真的算法:
从收藏导入计数器
高清输入(S,K):
C =计数器()
对于i中的xrange(LEN(S) - k)的:
C [S [I:I + K] + = 1
返回c.most_common(1)[0] [0]
对于k在的xrange(1,4):
打印输入('1177783777297461145777267337774652113777236237118777',K)
Imagine you have a list of numbers (or letters), such as
1177783777297461145777267337774652113777236237118777
I want to find the most frequent combinations of numbers in this list:
for 1-digit-long combinations - it is the most frequent number in this list
for 2-digit-long combinations - probably '11'
for 3-digits-long combinations - probably '777' etc
Is there some special algorythm for such a tasks?
UPDATE Well, by myself I have coded the following (Java). Looks like execution time is proportional to data size multiplied by pattern size:
public static void main(String[] args)
{
int DATA_SIZE = 10000;
int[] data = new int[DATA_SIZE];
for (int i = 0; i < DATA_SIZE; i++)
{
data[i] = (int) (10 * Math.random()) % 10;
System.out.print(data[i]);
}
int[] pattern1 = new int[]{1, 2, 3};
int[] pattern2 = new int[]{7, 7, 7};
int[] pattern3 = new int[]{7, 7};
System.out.println();
System.out.println(match(data, pattern1));
System.out.println(match(data, pattern2));
System.out.println(match(data, pattern3));
}
static int match(int[] data, int[] pattern)
{
int matches = 0;
int i = 0;
while (i < data.length)
{
matches = isEqual(data, i, pattern) ? matches + 1 : matches;
i++;
}
return matches;
}
static boolean isEqual(int[] a, int startIndex, int[] a2)
{
if (a == a2)
{
return true;
}
if (a == null || a2 == null)
{
return false;
}
for (int i = 0; i < a2.length; i++)
{
if (a[startIndex + i] != a2[i])
{
return false;
}
}
return true;
}
To find the largest number of repeats of a sequence of length at least k in a string of length n, you can build a suffix tree (http://en.wikipedia.org/wiki/Suffix_tree) in linear time, then find the node that has the most children that describes sequences of length k (or more) from the root.
Overall, this is linear time in the length of the input string.
For small k, you're better off with a naive algorithm:
from collections import Counter
def input(s, k):
c = Counter()
for i in xrange(len(s) - k):
c[s[i : i + k]] += 1
return c.most_common(1)[0][0]
for k in xrange(1, 4):
print input('1177783777297461145777267337774652113777236237118777', k)
这篇关于我怎样才能找到号码的列表中最常见的组合?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!