这个数组比较问题的最佳算法是什么？ [英] What is the best algorithm for this array-comparison problem?

查看：307 发布时间：2017/3/4 14:16:24 c algorithm optimization cuda

本文介绍了这个数组比较问题的最佳算法是什么？的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

为速度算法解决以下问题最有效的是什么？

What is the most efficient for speed algorithm to solve the following problem?

给定6个数组，D1，D2，D3，D4，D5和D6数字如：

Given 6 arrays, D1,D2,D3,D4,D5 and D6 each containing 6 numbers like:

D1[0] = number              D2[0] = number      ......       D6[0] = number
D1[1] = another number      D2[1] = another number           ....
.....                       ....                ......       ....
D1[5] = yet another number  ....                ......       ....

给定第二个数组ST1，包含1个数字：

Given a second array ST1, containing 1 number:

ST1[0] = 6

给定第三个数组ans，包含6个数字：

Given a third array ans, containing 6 numbers:

ans[0] = 3, ans[1] = 4, ans[2] = 5, ......ans[5] = 8

使用数组D1，D2，D3，D4，D5和D6的索引，从0到ST1 [0]减1，在本例中为6，所以从0到6-1，将ans数组与每个D数组进行比较。如果在同一索引处的任何D中没有找到一个或多个ans号，则结果应为0，并且如果在相同索引处的某个D中找到所有ans号，则结果应为1。也就是说，如果某个ans [i]不等于任何D N [i]则返回0，并且如果每个ans [i]等于某个D N [i] ]。

Using as index for the arrays D1,D2,D3,D4,D5 and D6, the number that goes from 0, to the number stored in ST1[0] minus one, in this example 6, so from 0 to 6-1, compare the ans array against each D array. The result should be 0 if one or more ans numbers are not found in any D at the same index and should be 1 if all ans numbers are found in some D at the same index. That is, return 0 if some ans[i] doesn't equal any DN[i] and return 1 if every ans[i] equals some DN[i].

我的算法到目前为止：

我试图尽可能地保持一切无限。

My algorithm so far is:
I tried to keep everything unlooped as much as possible.

EML  := ST1[0]   //number contained in ST1[0]   
EML1 := 0        //start index for the arrays D 

While EML1 < EML
   if D1[ELM1] = ans[0] 
     goto two
   if D2[ELM1] = ans[0] 
     goto two
   if D3[ELM1] = ans[0] 
     goto two
   if D4[ELM1] = ans[0] 
     goto two
   if D5[ELM1] = ans[0] 
     goto two
   if D6[ELM1] = ans[0] 
     goto two

   ELM1 = ELM1 + 1

return 0     //If the ans[0] number is not found in either D1[0-6], D2[0-6].... D6[0-6] return 0 which will then exclude ans[0-6] numbers


two:

EML1 := 0      start index for arrays Ds 
While EML1 < EML
   if D1[ELM1] = ans[1] 
     goto three
   if D2[ELM1] = ans[1] 
     goto three
   if D3[ELM1] = ans[1] 
     goto three
   if D4[ELM1] = ans[1] 
     goto three
   if D5[ELM1] = ans[1] 
     goto three
   if D6[ELM1] = ans[1] 
     goto three
   ELM1 = ELM1 + 1

return 0    //If the ans[1] number is not found in either D1[0-6], D2[0-6]....  D6[0-6]  return 0 which will then exclude ans[0-6] numbers

three:

EML1 := 0      start index for arrays Ds 

While EML1 < EML
   if D1[ELM1] = ans[2] 
     goto four
   if D2[ELM1] = ans[2] 
     goto four
   if D3[ELM1] = ans[2] 
     goto four
   if D4[ELM1] = ans[2] 
     goto four
   if D5[ELM1] = ans[2] 
     goto four
   if D6[ELM1] = ans[2] 
     goto four
   ELM1 = ELM1 + 1

return 0   //If the ans[2] number is not found in either D1[0-6], D2[0-6]....  D6[0-6]  return 0 which will then exclude ans[0-6] numbers

four:

EML1 := 0      start index for arrays Ds 

While EML1 < EML
   if D1[ELM1] = ans[3] 
     goto five
   if D2[ELM1] = ans[3] 
     goto five
   if D3[ELM1] = ans[3] 
     goto five
   if D4[ELM1] = ans[3] 
     goto five
   if D5[ELM1] = ans[3] 
     goto five
   if D6[ELM1] = ans[3] 
     goto five
   ELM1 = ELM1 + 1

return 0 //If the ans[3] number is not found in either D1[0-6], D2[0-6]....  D6[0-6]  return 0 which will then exclude ans[0-6] numbers


five:

EML1 := 0      start index for arrays Ds 

While EML1 < EML
   if D1[ELM1] = ans[4] 
     goto six
   if D2[ELM1] = ans[4] 
     goto six
   if D3[ELM1] = ans[4] 
     goto six
   if D4[ELM1] = ans[4] 
     goto six
   if D5[ELM1] = ans[4] 
     goto six
   if D6[ELM1] = ans[4] 
     goto six
   ELM1 = ELM1 + 1

return 0  //If the ans[4] number is not found in either D1[0-6], D2[0-6]....  D6[0-6]  return 0 which will then exclude ans[0-6] numbers

six:

EML1 := 0      start index for arrays Ds 

While EML1 < EML
   if D1[ELM1] = ans[5] 
     return 1            ////If the ans[1] number is not found in either D1[0-6].....  
   if D2[ELM1] = ans[5]      return 1 which will then include ans[0-6] numbers
     return 1
   if D3[ELM1] = ans[5] 
     return 1
   if D4[ELM1] = ans[5] 
     return 1
   if D5[ELM1] = ans[5] 
     return 1
   if D6[ELM1] = ans[5] 
     return 1
   ELM1 = ELM1 + 1

return 0

作为选择语言，它会是纯c

As language of choice, it would be pure c

推荐答案

实现原始海报提供的算法。这是这里

I did a direct trivial C implementation of the algorithm provided by the original poster. It is here

如其他提议的首先要做的是汇总代码。展开不是真的好的速度，因为它导致代码缓存未命中。我开始滚动内部循环，并获得这。

As other proposed the first thing to do is to roll up the code. Unrolling is not really good for speed as it lead to code cache misses. I began by rolling inner loops and got this. Then I rolled the outer loop and removed the now useless gotos and got code below.

EDIT ：我改变了几次C代码，因为即使是这样，这很简单，当JIT使用 CUDA 编译或执行它时，似乎存在问题CUDA似乎不是很详细的错误）。这就是为什么下面的代码使用全局...，这只是平凡的实现。我们还没有去速度。它说了很多关于过早优化。如果我们不能使它工作，为什么要麻烦使它快？我想仍然有一些问题，因为CUDA似乎对代码施加了许多限制，如果我相信维基百科的文章，你可以做工作。也许我们应该使用float而不是int？

EDIT: I changed several time the C code because even as simple as it is there seems to be problems when JIT compiling or executing it with CUDA (and CUDA seems not to be very verbose about errors). That is why the piece of code below use globals... and that is just trivial implementation. We are not yet going for speed. It says much about premature optimization. Why bother to make it fast if we can't even make it work ? I guess there is still issues as CUDA seems to impose many restrictions on the code you can make work if I believe the Wikipedia article. Also maybe we should use float instead of int ?

#include <stdio.h>

int D1[6] = {3, 4, 5, 6, 7, 8};
int D2[6] = {3, 4, 5, 6, 7, 8};
int D3[6] = {3, 4, 5, 6, 7, 8};
int D4[6] = {3, 4, 5, 6, 7, 8};
int D5[6] = {3, 4, 5, 6, 7, 8};
int D6[6] = {3, 4, 5, 6, 7, 9};
int ST1[1] = {6};
int ans[6] = {1, 4, 5, 6, 7, 9};
int * D[6] = { D1, D2, D3, D4, D5, D6 };

/* beware D is passed through globals */
int algo(int * ans, int ELM){
    int a, e, p;

    for (a = 0 ; a < 6 ; a++){ 
        for (e = 0 ; e < ELM ; e++){
            for (p = 0 ; p < 6 ; p++){
                if (D[p][e] == ans[a]){
                    goto cont;
                }
            }
        }
        return 0; //bad row of numbers found
    cont:;
    }
    return 1;
}

int main(){
    int res;
    res = algo(ans, ST1[0]);
    printf("algo returned %d\n", res);
}

现在这很有趣，因为我们可以理解代码在做什么。顺便说一下，这个包装工作我纠正了几个原始的问题。我相信这是错字，因为它在全球语境中是不合逻辑的。
- goto总是跳转到两个（它应该已经进展）
- 最后一次测试检查ans [0]而不是ans [5]

Now that is interesting, because we can understand what code is doing. By the way doing this packing job I corrected several oddities in the original question. I believe it was typos as it wasn't logical at all in the global context. - goto always jump to two (it should have progressed) - the last test checked ans[0] instead of ans[5]

请标记，如果我在上面的原则代码应该做什么和您的原始算法是错字的错误，请纠正我。

please Mark, correct me if I'm wrong in the above asumptions on what the original code should do and your original algorithm is typo free.

代码是什么对于ans检查中的每个值，它存在于二维数组中。如果任何数字未命中，它返回0.如果所有的数字都找到它返回1。

What the code is doing it for each value in ans check it is present in a two dimentional array. If any number miss it returns 0. If all numbers are found it returns 1.

我想做一个真正的快速代码不是在C中实现，在另一种语言像python（或C ++），其中set是标准库提供的基本数据结构。然后我将构建一个所有的数组值（即O（n））的集合，并检查搜索的数字是否存在集合（即O（1））。最后的实现应该比现有代码更快，至少从算法的角度来看。

What I would do to get a real fast code is not to implement it in C but in another language like python (or C++) where set is a basic data structure provided by standard libraries. Then I would build a set with all the values of the arrays (that is O(n)) and check if numbers searched are present in set or not (that is O(1)). The final implementation should be faster than the existing code at least from an algorithmic point of view.

Python示例如下，因为它是真正的琐碎（打印true / false而不是1/0但你得到的想法）：

Python example is below as it is really trivial (print true/false instead of 1/0 but you get the idea):

ans_set = set(ans)
print len(set(D1+D2+D3+D4+D5+D6).intersection(ans_set)) == len(ans_set)

这里是一个可能的C ++实现使用集合：

Here is a possible C++ implementation using sets:

#include <iostream>
#include <set>

int algo(int * D1, int * D2, int * D3, int * D4, int * D5, int * D6, int * ans, int ELM){
    int e, p;
    int * D[6] = { D1, D2, D3, D4, D5, D6 };
    std::set<int> ans_set(ans, ans+6);

    int lg = ans_set.size();

    for (e = 0 ; e < ELM ; e++){
        for (p = 0 ; p < 6 ; p++){
            if (0 == (lg -= ans_set.erase(D[p][e]))){
                // we found all elements of ans_set
                return 1;
            }
        }
    }
    return 0; // some items in ans are missing
}

int main(){
    int D1[6] = {3, 4, 5, 6, 7, 8};
    int D2[6] = {3, 4, 5, 6, 7, 8};
    int D3[6] = {3, 4, 5, 6, 7, 8};
    int D4[6] = {3, 4, 5, 6, 7, 8};
    int D5[6] = {3, 4, 5, 6, 7, 8};
    int D6[6] = {3, 4, 5, 6, 7, 1};

    int ST1[1] = {6};

    int ans[] = {1, 4, 5, 6, 7, 8};

    int res = algo(D1, D2, D3, D4, D5, D6, ans, ST1[0]);
    std::cout << "algo returned " << res << "\n";
}

我们做一些性能假设：ans的内容应该排序，否则，我们假设D1..D6的内容将在对算法的调用之间改变。因此，我们不打算为它构建一个集合（因为集合构造是O（n）无论如何，如果D1..D6改变，我们不会获得任何东西）。但是如果我们用相同的D1..D6调用几个算法，那么我们应该做相反的事情，并将D1..D6转换为一个更大的集合，我们保持可用。

We make some performance hypothesis : the content of ans should be sorted or we should construct it otherwise, we suppose that content of D1..D6 will change between calls to algo. Hence we do not bother constructing a set for it (as set construction is O(n) anyway we wouldn't gain anything if D1..D6 are changing). But if we call several times algo with the same D1..D6 and that is ans that change we should do the opposite and transform D1..D6 into one larger set that we keep available.

如果我坚持CI可以做到如下：

If I stick to C I could do it as follow:

复制D1 .. D6中的所有数字在一个独特的数组（使用每行的memcpy）

对此数组的内容排序

使用二分搜索检查数字是否可用

由于数据量在这里很小，我们也可以尝试微优化。它可以在这里更好。不确定。

As data size are really small here , we could also try going for micro optimizations. It could pay better here. Don't know for sure.

EDIT2 ：对CUDA支持的C子集有严格的限制。最严格的一个是，我们不应该使用指针到主内存。这必须加以考虑。它解释了为什么当前的代码不工作。最简单的改变可能是为每个数组D1..D6依次调用它。为了保持简短并避免函数调用成本，我们可以使用宏或内联函数。我将发布提案。

EDIT2: there is hard restrictions on the subset of C supported by CUDA. The most restrictive one is that we shouldn't use pointers to main memory. That will have to be taken into account. It explains why the current code does not work. The simplest change is probably to call it for every array D1..D6 in turn. To keep it short and avoid function call cost we may use a macro or an inline function. I will post a proposal.

这篇关于这个数组比较问题的最佳算法是什么？的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

这个数组比较问题的最佳算法是什么？ [英] What is the best algorithm for this array-comparison problem?

问题描述

推荐答案

相关文章

其它硬件开发最新文章

热门教程

热门工具

登录关闭

这个数组比较问题的最佳算法是什么？ [英] What is the best algorithm for this array-comparison problem?

问题描述

推荐答案

相关文章

其它硬件开发最新文章

热门教程

热门工具

登录 关闭

登录关闭