我怎样才能找到号码的列表中最常见的组合? [英] How can I find most frequent combinations of numbers in a list?

查看:103
本文介绍了我怎样才能找到号码的列表中最常见的组合?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

想象一下,你有数字(或字母)的列表,如

  

1177783777297461145777267337774652113777236237118777

我想找到号码的最常见的组合,在这个列表:

1位长的组合 - 这是在这个列表中的最频繁的号码

2位长的组合 - 可能是11

3位数字长的组合 - 可能是'777'等

是否有这样的任务,一些特殊algorythm?

更新 好了,我自己我有codeD以下(JAVA)。貌似执行时间是成比例的数据量乘以图案尺寸:

 公共静态无效的主要(字串[] args)
{
    INT DATA_SIZE = 10000;
    INT []数据=新的INT [DATA_SIZE]
    的for(int i = 0; I< D​​ATA_SIZE;我++)
    {
        数据[I] =(int)的(10 *的Math.random())%10;
        System.out.print(数据[I]);
    }

    INT []样式1 =新INT [] {1,2,3};
    INT [] PATTERN2 =新INT [] {7,7,7};
    INT [] pattern3 =新INT [] {7,7};

    的System.out.println();
    的System.out.println(匹配(数据,样式1));
    的System.out.println(匹配(数据,式样2));
    的System.out.println(匹配(数据,pattern3));
}

静态INT比赛(INT []的数据,INT []模式)
{
    INT匹配= 0;
    INT I = 0;
    而(I< data.length)
    {
        比赛的isEqual =(数据,我,模式)?匹配+ 1:匹配;
        我++;
    }
    返回匹配;
}

静态的isEqual布尔(INT []一,诠释了startIndex,INT [] A2)
{
    若(a == A2)
    {
        返回true;
    }
    若(a == NULL || A2 == NULL)
    {
        返回false;
    }

    的for(int i = 0; I< a2.length;我++)
    {
        如果(一个[在startIndex + 1]!= A2 [I])
        {
            返回false;
        }
    }

    返回true;
}
 

解决方案

要找到最多的长度至少为k的序列重复的长度为n的字符串,你可以建立一个后缀树(的 http://en.wikipedia.org/wiki/Suffix_tree )的线性时间,然后找到具有节点大多数儿童描述长度为k(或更多)的序列,从根

总的来说,这是在输入的字符串的长度线性时间

有关小k,你最好用天真的算法:

 从收藏导入计数器

高清输入(S,K):
    C =计数器()
    对于i中的xrange(LEN(S) -  k)的:
        C [S [I:I + K] + = 1
    返回c.most_common(1)[0] [0]


对于k在的xrange(1,4):
    打印输入('1177783777297461145777267337774652113777236237118777',K)
 

Imagine you have a list of numbers (or letters), such as

1177783777297461145777267337774652113777236237118777

I want to find the most frequent combinations of numbers in this list:

for 1-digit-long combinations - it is the most frequent number in this list

for 2-digit-long combinations - probably '11'

for 3-digits-long combinations - probably '777' etc

Is there some special algorythm for such a tasks?

UPDATE Well, by myself I have coded the following (Java). Looks like execution time is proportional to data size multiplied by pattern size:

public static void main(String[] args)
{
    int DATA_SIZE = 10000;
    int[] data = new int[DATA_SIZE];
    for (int i = 0; i < DATA_SIZE; i++)
    {
        data[i] = (int) (10 * Math.random()) % 10;
        System.out.print(data[i]);
    }

    int[] pattern1 = new int[]{1, 2, 3};
    int[] pattern2 = new int[]{7, 7, 7};
    int[] pattern3 = new int[]{7, 7};

    System.out.println();
    System.out.println(match(data, pattern1));
    System.out.println(match(data, pattern2));
    System.out.println(match(data, pattern3));
}

static int match(int[] data, int[] pattern)
{
    int matches = 0;
    int i = 0;
    while (i < data.length)
    {
        matches = isEqual(data, i, pattern) ? matches + 1 : matches;
        i++;
    }
    return matches;
}

static boolean isEqual(int[] a, int startIndex, int[] a2)
{
    if (a == a2)
    {
        return true;
    }
    if (a == null || a2 == null)
    {
        return false;
    }

    for (int i = 0; i < a2.length; i++)
    {
        if (a[startIndex + i] != a2[i])
        {
            return false;
        }
    }

    return true;
}

解决方案

To find the largest number of repeats of a sequence of length at least k in a string of length n, you can build a suffix tree (http://en.wikipedia.org/wiki/Suffix_tree) in linear time, then find the node that has the most children that describes sequences of length k (or more) from the root.

Overall, this is linear time in the length of the input string.

For small k, you're better off with a naive algorithm:

from collections import Counter

def input(s, k):
    c = Counter()
    for i in xrange(len(s) - k):
        c[s[i : i + k]] += 1
    return c.most_common(1)[0][0]


for k in xrange(1, 4):
    print input('1177783777297461145777267337774652113777236237118777', k)

这篇关于我怎样才能找到号码的列表中最常见的组合?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆