Finding the minimum in an array (but skipping some elements) using reduction in CUDA

Question

I have a large array of floating point numbers and I want to find out the minimum value of the array (ignoring -1s wherever present) as well as its index, using reduction in CUDA. I have written the following code to do this, which in my opinion should work:

 __global__ void get_min_cost(float *d_Cost,int n,int *last_block_number,int *number_in_last_block,int *d_index){
     int tid = threadIdx.x;
     int myid = blockDim.x * blockIdx.x + threadIdx.x;
     int s;

     if(result == (*last_block_number)-1){
         s = (*number_in_last_block)/2;
     }else{
         s = 1024/2;
     }

     for(;s>0;s/=2){
         if(myid+s>=n)
             continue;
         if(tid<s){
             if(d_Cost[myid+s] == -1){
                 continue;
             }else if(d_Cost[myid] == -1 && d_Cost[myid+s] != -1){
                 d_Cost[myid] = d_Cost[myid+s];
                 d_index[myid] = d_index[myid+s];
             }else{
                 // both not -1
                 if(d_Cost[myid]<=d_Cost[myid+s])
                     continue;
                 else{
                     d_Cost[myid] = d_Cost[myid+s];
                     d_index[myid] = d_index[myid+s];
                 }
             }
         }
         else
             continue;
         __syncthreads();
     }
     if(tid==0){
         d_Cost[blockIdx.x] = d_Cost[myid];
         d_index[blockIdx.x] = d_index[myid];
     }
     return;
 }

The last_block_number argument is the id of the last block, and number_in_last_block is the number of elements in last block (which is a power of 2). Thus, all blocks will launch 1024 threads every time and the last block will only use number_in_last_block threads, while others will use 1024 threads.

After this function runs, I expect the minimum values for each block to be in d_Cost[blockIdx.x] and their indices in d_index[blockIdx.x].

I call this function multiple times, each time updating the number of threads and blocks. The second time I call it, the number of threads equals the number of blocks remaining from the previous call, and so on.

However, the above function isn't giving me the desired output. In fact, it gives a different output every time I run the program, i.e., it returns an incorrect value as the minimum during some intermediate iteration (though that incorrect value is quite close to the minimum every time).

What am I doing wrong here?

Recommended answer

As I mentioned in my comment above, I would recommend not writing reductions of your own and using CUDA Thrust whenever possible. This holds even when you need to customize those operations, since the customization is possible by properly overloading, e.g., relational operations.

Below I'm providing a simple code to evaluate the minimum of an array along with its index. It is based on a classical example contained in the An Introduction to Thrust presentation. The only addition is skipping the -1's, as you requested. This can reasonably be done by replacing all the -1's in the array with INT_MAX, the largest value representable by an int. Note that the example uses int data; for a float array like yours, the analogous sentinel would be FLT_MAX from <cfloat>.

#include <climits>   // INT_MAX
#include <cstdio>    // printf, getchar
#include <thrust/device_vector.h>
#include <thrust/replace.h>
#include <thrust/sequence.h>
#include <thrust/reduce.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>

// --- Struct returning the smallest of two tuples
struct smaller_tuple
{
    __host__ __device__ thrust::tuple<int,int> operator()(thrust::tuple<int,int> a, thrust::tuple<int,int> b)
    {
        if (a < b)
            return a;
        else
            return b;
    }
};


int main() {

    const int N = 20;
    const int large_value = INT_MAX;

    // --- Setting the data vector
    thrust::device_vector<int> d_vec(N,10);
    d_vec[3] = -1; d_vec[5] = -2;

    // --- Copying the data vector to a new vector where the -1's are changed to large_value (INT_MAX)
    thrust::device_vector<int> d_vec_temp(d_vec);
    thrust::replace(d_vec_temp.begin(), d_vec_temp.end(), -1, large_value);

    // --- Creating the index sequence [0, 1, 2, ... )
    thrust::device_vector<int> indices(d_vec_temp.size());
    thrust::sequence(indices.begin(), indices.end());

    // --- Setting the initial value of the search
    thrust::tuple<int,int> init(d_vec_temp[0],0);

    thrust::tuple<int,int> smallest;
    smallest = thrust::reduce(thrust::make_zip_iterator(thrust::make_tuple(d_vec_temp.begin(), indices.begin())),
                          thrust::make_zip_iterator(thrust::make_tuple(d_vec_temp.end(), indices.end())),
                          init, smaller_tuple());

    printf("Smallest %i %i\n",thrust::get<0>(smallest),thrust::get<1>(smallest));
    getchar();
}
