算法找到"最常见的元素"在不同的阵列 [英] Algorithm to find "most common elements" in different arrays

查看:106
本文介绍了算法找到"最常见的元素"在不同的阵列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有,例如5阵列插入一些元素(数字):

I have for example 5 arrays with some inserted elements (numbers):

1, 4 ,8.10
1,2,3, 4 中,11,15
2, 4 中,20,21
2 ,30

1,4,8,10
1,2,3,4,11,15
2,4,20,21
2,30

我需要找到最常见的元素在那些阵列,每一个元素应该走了一路,直到结束(见下面的例子)。在这个例子中,这将是粗体组合(或在端相同的一个,但与30,它是相同的),因为它包含的最小数不同元件(只有两个,4和三十零分之二)。

I need to find most common elements in those arrays and every element should go all the way till the end (see example below). In this example that would be the bold combination (or the same one but with "30" on the end, it's the "same") because it contains the smallest number of different elements (only two, 4 and 2/30).

该组合(见下文)是不好的,因为如果我有恩。 4就必须走出去,直到它结束(下一个数组不能包含4的话)。因此,组合必须走一路走到最后。

This combination (see below) isn't good because if I have for ex. "4" it must "go" till it ends (next array mustn't contain "4" at all). So combination must go all the way till the end.

1, 4 ,8.10
1, 2 中,3,4,11,15
2 中,4,20,21
2 ,30

1,4,8,10
1,2,3,4,11,15
2,4,20,21
2,30

EDIT2:或

1, 4 ,8.10
1,2,3, 4 中,11,15
2 中,4,20,21
2 ,30

1,4,8,10
1,2,3,4,11,15
2,4,20,21
2,30

或其他任何东西是不好的。

OR anything else is NOT good.

有一些算法来加速这一事情了(如果我有成千上万阵列与数百名在每一个元素)?

Is there some algorithm to speed this thing up (if I have thousands of arrays with hundreds of elements in each one)?

要弄清楚 - 解决方案必须包含最低的许多不同的元素和(同一号码)的组都必须从第一组合 - 较大的最后一个 - 最小的国家。因此,在上面的示例4,4,4,2是更好然后4,2,2,2,因为在第一个例子中的组4的比组2的较大。

To make it clear - solution must contain lowest number of different elements and the groups (of the same numbers) must be grouped from first - larger ones to the last - smallest ones. So in upper example 4,4,4,2 is better then 4,2,2,2 because in first example group of 4's is larger than group of 2's.

编辑:为了更加具体。溶液必须包含在最小数目不同元素的的那些元素必须的从第一分组,以持续的。所以,如果我有三个arrrays像

To be more specific. Solution must contain the smallest number of different elements and those elements must be grouped from first to last. So if I have three arrrays like

1,2,3
1,4,5
4,5,6

1,2,3
1,4,5
4,5,6

解决方案是1,1,4-或1,1,5或1,1,6 NOT -2,5,5-因为:1的比2的(只有一个)大组(两个)。

Solution is 1,1,4 or 1,1,5 or 1,1,6 NOT 2,5,5 because 1's have larger group (two of them) than 2's (only one).

感谢。

EDIT3:我不能更具体的:(

I can't be more specific :(

EDIT4:@spintheblack 1,1,1,2,4是因为使用人数第一次(假设在位置1)以后无法使用(除非它在1的同组),正确的解决方案。我要说的是分组有优先?另外,我没有提到它(抱歉),但在阵列中的数量以任何方式不排序,我输入这种方式在这个职位,因为这是我更容易理解。

@spintheblack 1,1,1,2,4 is the correct solution because number used first time (let's say at position 1) can't be used later (except it's in the SAME group of 1's). I would say that grouping has the "priority"? Also, I didn't mention it (sorry about that) but the numbers in arrays are NOT sorted in any way, I typed it that way in this post because it was easier for me to follow.

推荐答案

我假设不同元素不必实际上是不同的,它们可以在最终解决重复。也就是说,如果presented与 [1],[2],[1] 的答案显然 [1,2,1] 是允许的。但是,我们会算这个具有3个不同的元素。

I am assuming that "distinct elements" do not have to actually be distinct, they can repeat in the final solution. That is if presented with [1], [2], [1] that the obvious answer [1, 2, 1] is allowed. But we'd count this as having 3 distinct elements.

如果是这样,那么在这里是一个Python的解决方案:

If so, then here is a Python solution:

def find_best_run (first_array, *argv):
    # initialize data structures.
    this_array_best_run = {}
    for x in first_array:
        this_array_best_run[x] = (1, (1,), (x,))

    for this_array in argv:
        # find the best runs ending at each value in this_array
        last_array_best_run = this_array_best_run
        this_array_best_run = {}

        for x in this_array:
            for (y, pattern) in last_array_best_run.iteritems():
                (distinct_count, lengths, elements) = pattern
                if x == y:
                    lengths = tuple(lengths[:-1] + (lengths[-1] + 1,))
                else :
                    distinct_count += 1
                    lengths = tuple(lengths + (1,))
                    elements = tuple(elements + (x,))

                if x not in this_array_best_run:
                    this_array_best_run[x] = (distinct_count, lengths, elements)
                else:
                    (prev_count, prev_lengths, prev_elements) = this_array_best_run[x]
                    if distinct_count < prev_count or prev_lengths < lengths:
                        this_array_best_run[x] = (distinct_count, lengths, elements)

    # find the best overall run
    best_count = len(argv) + 10 # Needs to be bigger than any possible answer.
    for (distinct_count, lengths, elements) in this_array_best_run.itervalues():
        if distinct_count < best_count:
            best_count = distinct_count
            best_lengths = lengths
            best_elements = elements
        elif distinct_count == best_count and best_lengths < lengths:
            best_count = distinct_count
            best_lengths = lengths
            best_elements = elements

    # convert it into a more normal representation.                
    answer = []
    for (length, element) in zip(best_lengths, elements):
        answer.extend([element] * length)

    return answer

# example
print find_best_run(
    [1,4,8,10],
    [1,2,3,4,11,15],
    [2,4,20,21],
    [2,30]) # prints [4, 4, 4, 30]

下面是一个解释。该 ... this_run 词典有按键,当前的数组中的元素,它们有它们的元组(distinct_count,长度,元素)的值。我们正试图尽量减少distinct_count,然后最大化的长度(长度是一个元组,因此这将preFER在第一点中最大值的元素),并跟踪到底的元素。在每一步我构建的所有可能这是一个跑起来到previous阵列,按顺序下一个该元素的组合运行,并找出哪些是最好的电流。当我到最后我选择了最好的整体运行,然后把它变成一个传统的重presentation并返回。

Here is an explanation. The ...this_run dictionaries have keys which are elements in the current array, and they have values which are tuples (distinct_count, lengths, elements). We are trying to minimize distinct_count, then maximize lengths (lengths is a tuple, so this will prefer the element with the largest value in the first spot) and are tracking elements for the end. At each step I construct all possible runs which are a combination of a run up to the previous array with this element next in sequence, and find which ones are best to the current. When I get to the end I pick the best possible overall run, then turn it into a conventional representation and return it.

如果你有 N 长度的数组 M ,这应该采取 O(N * M * M)的时间来运行。

If you have N arrays of length M, this should take O(N*M*M) time to run.

这篇关于算法找到&QUOT;最常见的元素&QUOT;在不同的阵列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆