这究竟是什么数组比较问题的最佳算法？ [英] What is the best algorithm for this array-comparison problem?

查看：270 发布时间：2015/11/30 16:10:38 c algorithm optimization cuda

本文介绍了这究竟是什么数组比较问题的最佳算法？的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

什么是最有效的速算法来解决以下问题？

What is the most efficient for speed algorithm to solve the following problem?

鉴于6阵列，D1，D2，D3，D4，D5和D6各自含有6个数字，如：

Given 6 arrays, D1,D2,D3,D4,D5 and D6 each containing 6 numbers like:

D1[0] = number              D2[0] = number      ......       D6[0] = number
D1[1] = another number      D2[1] = another number           ....
.....                       ....                ......       ....
D1[5] = yet another number  ....                ......       ....

由于第二阵列ST1，包含1号：

Given a second array ST1, containing 1 number:

ST1[0] = 6

鉴于第三阵列ANS，包含6个号码：

Given a third array ans, containing 6 numbers:

ans[0] = 3, ans[1] = 4, ans[2] = 5, ......ans[5] = 8

使用作为索引的阵列D1，D2，D3，D4，D5和D6，即从0到存储在ST1 [0]减1，在此例6中的号码的数量，因此从0到6 -1，比较对每个维数组的数组答。结果应该是0，如果没有发现任何D的一个或多个答号码相同的指数，应该是1，如果找到了所有ANS数字在一些D中的同一个索引处。也就是说，返回0，如果一些ANS [I]不等于任何D N [i]和返回1，如果每答[我]等于一些ð N [我]。

Using as index for the arrays D1,D2,D3,D4,D5 and D6, the number that goes from 0, to the number stored in ST1[0] minus one, in this example 6, so from 0 to 6-1, compare the ans array against each D array. The result should be 0 if one or more ans numbers are not found in any D at the same index and should be 1 if all ans numbers are found in some D at the same index. That is, return 0 if some ans[i] doesn't equal any DN[i] and return 1 if every ans[i] equals some DN[i].

我的算法迄今：
我试图把一切都unlooped尽可能多地。

My algorithm so far is:
I tried to keep everything unlooped as much as possible.

EML  := ST1[0]   //number contained in ST1[0]   
EML1 := 0        //start index for the arrays D 

While EML1 < EML
   if D1[ELM1] = ans[0] 
     goto two
   if D2[ELM1] = ans[0] 
     goto two
   if D3[ELM1] = ans[0] 
     goto two
   if D4[ELM1] = ans[0] 
     goto two
   if D5[ELM1] = ans[0] 
     goto two
   if D6[ELM1] = ans[0] 
     goto two

   ELM1 = ELM1 + 1

return 0     //If the ans[0] number is not found in either D1[0-6], D2[0-6].... D6[0-6] return 0 which will then exclude ans[0-6] numbers


two:

EML1 := 0      start index for arrays Ds 
While EML1 < EML
   if D1[ELM1] = ans[1] 
     goto three
   if D2[ELM1] = ans[1] 
     goto three
   if D3[ELM1] = ans[1] 
     goto three
   if D4[ELM1] = ans[1] 
     goto three
   if D5[ELM1] = ans[1] 
     goto three
   if D6[ELM1] = ans[1] 
     goto three
   ELM1 = ELM1 + 1

return 0    //If the ans[1] number is not found in either D1[0-6], D2[0-6]....  D6[0-6]  return 0 which will then exclude ans[0-6] numbers

three:

EML1 := 0      start index for arrays Ds 

While EML1 < EML
   if D1[ELM1] = ans[2] 
     goto four
   if D2[ELM1] = ans[2] 
     goto four
   if D3[ELM1] = ans[2] 
     goto four
   if D4[ELM1] = ans[2] 
     goto four
   if D5[ELM1] = ans[2] 
     goto four
   if D6[ELM1] = ans[2] 
     goto four
   ELM1 = ELM1 + 1

return 0   //If the ans[2] number is not found in either D1[0-6], D2[0-6]....  D6[0-6]  return 0 which will then exclude ans[0-6] numbers

four:

EML1 := 0      start index for arrays Ds 

While EML1 < EML
   if D1[ELM1] = ans[3] 
     goto five
   if D2[ELM1] = ans[3] 
     goto five
   if D3[ELM1] = ans[3] 
     goto five
   if D4[ELM1] = ans[3] 
     goto five
   if D5[ELM1] = ans[3] 
     goto five
   if D6[ELM1] = ans[3] 
     goto five
   ELM1 = ELM1 + 1

return 0 //If the ans[3] number is not found in either D1[0-6], D2[0-6]....  D6[0-6]  return 0 which will then exclude ans[0-6] numbers


five:

EML1 := 0      start index for arrays Ds 

While EML1 < EML
   if D1[ELM1] = ans[4] 
     goto six
   if D2[ELM1] = ans[4] 
     goto six
   if D3[ELM1] = ans[4] 
     goto six
   if D4[ELM1] = ans[4] 
     goto six
   if D5[ELM1] = ans[4] 
     goto six
   if D6[ELM1] = ans[4] 
     goto six
   ELM1 = ELM1 + 1

return 0  //If the ans[4] number is not found in either D1[0-6], D2[0-6]....  D6[0-6]  return 0 which will then exclude ans[0-6] numbers

six:

EML1 := 0      start index for arrays Ds 

While EML1 < EML
   if D1[ELM1] = ans[5] 
     return 1            ////If the ans[1] number is not found in either D1[0-6].....  
   if D2[ELM1] = ans[5]      return 1 which will then include ans[0-6] numbers
     return 1
   if D3[ELM1] = ans[5] 
     return 1
   if D4[ELM1] = ans[5] 
     return 1
   if D5[ELM1] = ans[5] 
     return 1
   if D6[ELM1] = ans[5] 
     return 1
   ELM1 = ELM1 + 1

return 0

为首选语言，这将是纯C

As language of choice, it would be pure c

推荐答案

我做了一个直接的琐碎C实现由原始的海报中提供的算法。正是在这里

I did a direct trivial C implementation of the algorithm provided by the original poster. It is here

至于其他建议做的第一件事是卷起code。展开是不求速度真的好，因为它会导致code高速缓存未命中。我开始滚动内部循环，并获得这个。然后，我滚了外循环，并删除了现在没有用了goto方法，得到了code以下。

As other proposed the first thing to do is to roll up the code. Unrolling is not really good for speed as it lead to code cache misses. I began by rolling inner loops and got this. Then I rolled the outer loop and removed the now useless gotos and got code below.

修改：我换了好几时间C code，因为即使是简单的，因为它是似乎有当JIT编译或执行它的 CUDA （和CUDA似乎不是非常详细有关错误）。这就是为什么下面用全局的一件code ...这仅仅是简单的实现。我们还没有打算速度。这充分表明，premature优化。何苦让它快，如果我们甚至不能使它发挥作用？我估计仍有问题CUDA似乎强加于$ C $诸多限制C，可让工作，如果我相信维基百科的文章。同时，也许我们应该使用，而不是INT浮？

EDIT: I changed several time the C code because even as simple as it is there seems to be problems when JIT compiling or executing it with CUDA (and CUDA seems not to be very verbose about errors). That is why the piece of code below use globals... and that is just trivial implementation. We are not yet going for speed. It says much about premature optimization. Why bother to make it fast if we can't even make it work ? I guess there is still issues as CUDA seems to impose many restrictions on the code you can make work if I believe the Wikipedia article. Also maybe we should use float instead of int ?

#include <stdio.h>

int D1[6] = {3, 4, 5, 6, 7, 8};
int D2[6] = {3, 4, 5, 6, 7, 8};
int D3[6] = {3, 4, 5, 6, 7, 8};
int D4[6] = {3, 4, 5, 6, 7, 8};
int D5[6] = {3, 4, 5, 6, 7, 8};
int D6[6] = {3, 4, 5, 6, 7, 9};
int ST1[1] = {6};
int ans[6] = {1, 4, 5, 6, 7, 9};
int * D[6] = { D1, D2, D3, D4, D5, D6 };

/* beware D is passed through globals */
int algo(int * ans, int ELM){
    int a, e, p;

    for (a = 0 ; a < 6 ; a++){ 
        for (e = 0 ; e < ELM ; e++){
            for (p = 0 ; p < 6 ; p++){
                if (D[p][e] == ans[a]){
                    goto cont;
                }
            }
        }
        return 0; //bad row of numbers found
    cont:;
    }
    return 1;
}

int main(){
    int res;
    res = algo(ans, ST1[0]);
    printf("algo returned %d\n", res);
}

现在是有趣的，因为我们可以理解code在做什么。顺便说一句这样的包装工作，我在原来的问题纠正一些怪事。我相信这是错别字，因为它是不符合逻辑都在全球范围内。 - GOTO总是跳转到两（它应该有进步） - 最后的测试检查ANS [0]，而不是答[5]

Now that is interesting, because we can understand what code is doing. By the way doing this packing job I corrected several oddities in the original question. I believe it was typos as it wasn't logical at all in the global context. - goto always jump to two (it should have progressed) - the last test checked ans[0] instead of ans[5]

请马克，纠正我，如果我错了，在什么原来的code应该做的和你原来的算法是错字自由上述asumptions。

please Mark, correct me if I'm wrong in the above asumptions on what the original code should do and your original algorithm is typo free.

什么是code是做什么的每个值在ANS检查是present在两维数组。如果有任何数量错过了返回0。如果找到了所有的数字则返回1。

What the code is doing it for each value in ans check it is present in a two dimentional array. If any number miss it returns 0. If all numbers are found it returns 1.

我会做才能得到真正的快速code是不是在C，但像巨蟒（或C ++），另一种语言，其中一套是标准库提供的基本数据结构来实现它。然后，我将建立一套与阵列中的所有值（即是O（n）），并检查是否搜查数字是present在设置与否（即O（1））。最后执行应比现有code。至少更快但从一个算法点

What I would do to get a real fast code is not to implement it in C but in another language like python (or C++) where set is a basic data structure provided by standard libraries. Then I would build a set with all the values of the arrays (that is O(n)) and check if numbers searched are present in set or not (that is O(1)). The final implementation should be faster than the existing code at least from an algorithmic point of view.

Python的例子是，因为它实在是微不足道的（打印真/假，而不是1/0，但你的想法）：

Python example is below as it is really trivial (print true/false instead of 1/0 but you get the idea):

ans_set = set(ans)
print len(set(D1+D2+D3+D4+D5+D6).intersection(ans_set)) == len(ans_set)

下面是使用几组可能的C ++实现：

Here is a possible C++ implementation using sets:

#include <iostream>
#include <set>

int algo(int * D1, int * D2, int * D3, int * D4, int * D5, int * D6, int * ans, int ELM){
    int e, p;
    int * D[6] = { D1, D2, D3, D4, D5, D6 };
    std::set<int> ans_set(ans, ans+6);

    int lg = ans_set.size();

    for (e = 0 ; e < ELM ; e++){
        for (p = 0 ; p < 6 ; p++){
            if (0 == (lg -= ans_set.erase(D[p][e]))){
                // we found all elements of ans_set
                return 1;
            }
        }
    }
    return 0; // some items in ans are missing
}

int main(){
    int D1[6] = {3, 4, 5, 6, 7, 8};
    int D2[6] = {3, 4, 5, 6, 7, 8};
    int D3[6] = {3, 4, 5, 6, 7, 8};
    int D4[6] = {3, 4, 5, 6, 7, 8};
    int D5[6] = {3, 4, 5, 6, 7, 8};
    int D6[6] = {3, 4, 5, 6, 7, 1};

    int ST1[1] = {6};

    int ans[] = {1, 4, 5, 6, 7, 8};

    int res = algo(D1, D2, D3, D4, D5, D6, ans, ST1[0]);
    std::cout << "algo returned " << res << "\n";
}

我们做一些性能的假设：答的内容应进行排序，还是应该以其他方式构造它，我们假设D1..D6的内容调用到算法中的改变。因此，我们不要打扰构建了一套针对IT行业（如建筑是O（n），反正我们也不会得到什么，如果D1..D6正在发生变化）。但是，如果我们称之为多次算法中使用相同的D1..D6，这是ANS是改变了我们应该做的相反，改造D1..D6成一个更大的集合，我们保持可用。

We make some performance hypothesis : the content of ans should be sorted or we should construct it otherwise, we suppose that content of D1..D6 will change between calls to algo. Hence we do not bother constructing a set for it (as set construction is O(n) anyway we wouldn't gain anything if D1..D6 are changing). But if we call several times algo with the same D1..D6 and that is ans that change we should do the opposite and transform D1..D6 into one larger set that we keep available.

如果我坚持到C我能做到这一点如下：

If I stick to C I could do it as follow:

复制所有号码D1 ... D6在一个独特的阵列（采用的memcpy的每一行）
在这个数组排序含量
使用一个二分搜索来检查，如果号码（如有）

随着数据的大小是非常小的在这里，我们也可以尝试去为微型优化。它可以在这里多加。不知道。

As data size are really small here , we could also try going for micro optimizations. It could pay better here. Don't know for sure.

EDIT2 ：有基于C的CUDA所支持的子集硬限制。最严格的是，我们不应该使用指针到主内存。将不得不加以考虑。这解释了为什么当前code不起作用。最简单的变化可能是调用它又将每个数组D1..D6。为了简化，避免函数调用的成本，我们可以使用宏或内联函数。我会后的提议。

EDIT2: there is hard restrictions on the subset of C supported by CUDA. The most restrictive one is that we shouldn't use pointers to main memory. That will have to be taken into account. It explains why the current code does not work. The simplest change is probably to call it for every array D1..D6 in turn. To keep it short and avoid function call cost we may use a macro or an inline function. I will post a proposal.

这篇关于这究竟是什么数组比较问题的最佳算法？的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

这究竟是什么数组比较问题的最佳算法？ [英] What is the best algorithm for this array-comparison problem?

问题描述

推荐答案

相关文章

C/C++最新文章

热门教程

热门工具

登录关闭

这究竟是什么数组比较问题的最佳算法？ [英] What is the best algorithm for this array-comparison problem?

问题描述

推荐答案

相关文章

C/C++最新文章

热门教程

热门工具

登录 关闭

登录关闭