使用cuda编程进行奇偶排序 [英] Odd-Even Sort using cuda Programming

查看:233
本文介绍了使用cuda编程进行奇偶排序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用cuda-c语言实现奇偶排序程序。但是,每当我将0作为输入数组中的元素之一时,结果数组就不会正确排序,但是在其他情况下,它可以用于其他输入,我不明白代码的问题是什么这是我的代码:

I'm trying to implement odd-even sort program in cuda-c language. But, whenever I give a 0 as one of the elements in the input array, the resulted array is not properly sorted.In other cases, however, it is working for other input.I don't understand what is the problem with the code.Here is my code:

#include<stdio.h>
#include<cuda.h>
#define N 5

__global__ void sort(int *c,int *count)
{
    int l;
    if(*count%2==0)
          l=*count/2;
    else
         l=(*count/2)+1;
    for(int i=0;i<l;i++)
    {
            if(threadIdx.x%2==0)  //even phase
            {
                if(c[threadIdx.x]>c[threadIdx.x+1])
                {
                    int temp=c[threadIdx.x];
                    c[threadIdx.x]=c[threadIdx.x+1];
                    c[threadIdx.x+1]=temp;
                }

            __syncthreads();
            }
            else     //odd phase
            {
                if(c[threadIdx.x]>c[threadIdx.x+1])
                {
                    int temp=c[threadIdx.x];
                    c[threadIdx.x]=c[threadIdx.x+1];
                    c[threadIdx.x+1]=temp;
                }

            __syncthreads();
            }
    }//for

}



int main()
{int a[N],b[N],n;
    printf("enter size of array");
    scanf("%d",&n);
    print("enter the elements of array");
  for(int i=0;i<n;i++)
  {
    scanf("%d",&a[i]);
  }
  printf("ORIGINAL ARRAY : \n");
  for(int i=0;i<n;i++)
          {

          printf("%d ",a[i]);
          }
  int *c,*count;
  cudaMalloc((void**)&c,sizeof(int)*N);
  cudaMalloc((void**)&count,sizeof(int));
  cudaMemcpy(c,&a,sizeof(int)*N,cudaMemcpyHostToDevice);
  cudaMemcpy(count,&n,sizeof(int),cudaMemcpyHostToDevice);
  sort<<< 1,n >>>(c,count);
  cudaMemcpy(&b,c,sizeof(int)*N,cudaMemcpyDeviceToHost);
  printf("\nSORTED ARRAY : \n");
  for(int i=1;i<=n;i++)
      {
         printf("%d ",b[i]);
      }

}


推荐答案

您的内核代码有两个主要错误,我可以看到:

Your kernel code had two main errors that I could see:


  1. 在奇数阶段(对于偶数长度数组,甚至是奇数长度数组的相位),最后一个线程的索引将超出 c [threadIdx.x + 1] 的范围。例如,对于4个线程,它们的编号为0、1、2、3。线程3很奇怪,但是如果您访问 c [3 + 1] ,则该数组中未定义元素。我们可以通过限制每个阶段在除最后一个线程之外的所有线程上工作来解决此问题。

  1. On the odd phase (for even length array, or even phase for odd length array), your last thread will index out of bounds at c[threadIdx.x+1]. For example, for 4 threads, they are numbered 0,1,2,3. Thread 3 is odd, but if you access c[3+1], that is not a defined element in your array. We can fix this by restricting each phase to work on all threads but the last one.

您正在使用 __ syncthreads()在条件语句中,该条件语句将不允许所有线程到达障碍。这是一个编码错误。阅读文档。我们可以通过调整条件区域内的代码来解决此问题。

You were using __syncthreads() inside a conditional statement that would not allow all threads to reach the barrier. This is a coding error. Read the documentation. We can fix this by adjusting what code is inside the conditional regions.

在主代码中,您的最终打印输出语句错误地建立了索引:

In the main code, your final printout statements were indexing incorrectly:

for(int i=1;i<=n;i++)

应为:

for(int i=0;i<n;i++)

您在这里也有错字:

print("enter the elements of array");

我认为应该是 printf

以下代码已修复上述错误,并且对于长度最大为5的数组(对 N的硬编码限制对我来说似乎正确运行了)。即使您增加了 N ,我不确定这是否会超出扭曲的大小,并且肯定不会超出线程块的大小,但是希望您知道这一点。 (如果没有,请阅读有关以下内容的文档链接: __syncthreads())。

The following code has the above errors fixed, and seems to run correctly for me for arrays up to length 5 (your hardcoded limit on N). Even if you increased N, I'm not sure this would work beyond the size of a warp and certainly would not work beyond the threadblock size, but hopefully you are aware of that already(if not, read the doc link about __syncthreads()).

固定代码:

#include<stdio.h>
#include<cuda.h>
#define N 5

#define intswap(A,B) {int temp=A;A=B;B=temp;}

__global__ void sort(int *c,int *count)
{
    int l;
    if(*count%2==0)
          l=*count/2;
    else
         l=(*count/2)+1;
    for(int i=0;i<l;i++)
    {
            if((!(threadIdx.x&1)) && (threadIdx.x<(*count-1)))  //even phase
            {
                if(c[threadIdx.x]>c[threadIdx.x+1])
                  intswap(c[threadIdx.x], c[threadIdx.x+1]);
            }

            __syncthreads();
            if((threadIdx.x&1) && (threadIdx.x<(*count-1)))     //odd phase
            {
                if(c[threadIdx.x]>c[threadIdx.x+1])
                  intswap(c[threadIdx.x], c[threadIdx.x+1]);
            }
            __syncthreads();
    }//for

}



int main()
{int a[N],b[N],n;
    printf("enter size of array");
    scanf("%d",&n);
    if (n > N) {printf("too large!\n"); return 1;}
    printf("enter the elements of array");
  for(int i=0;i<n;i++)
  {
    scanf("%d",&a[i]);
  }
  printf("ORIGINAL ARRAY : \n");
  for(int i=0;i<n;i++)
          {

          printf("%d ",a[i]);
          }
  int *c,*count;
  cudaMalloc((void**)&c,sizeof(int)*N);
  cudaMalloc((void**)&count,sizeof(int));
  cudaMemcpy(c,&a,sizeof(int)*N,cudaMemcpyHostToDevice);
  cudaMemcpy(count,&n,sizeof(int),cudaMemcpyHostToDevice);
  sort<<< 1,n >>>(c,count);
  cudaMemcpy(&b,c,sizeof(int)*N,cudaMemcpyDeviceToHost);
  printf("\nSORTED ARRAY : \n");
  for(int i=0;i<n;i++)
      {
         printf("%d ",b[i]);
      }

  printf("\n");
}

关于正确的cuda错误检查属于什么规范方法来检查错误在这里。

这篇关于使用cuda编程进行奇偶排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆