While loop fails in CUDA kernel


Question

I am using a GPU to do some calculation for processing words. Initially, I used one block (with 500 threads) to process one word. To process 100 words, I have to loop the kernel function 100 times in my main function.

for (int i=0; i<100; i++)
    kernel <<< 1, 500 >>> (length_of_word); 

My kernel function looks like this:

__global__ void kernel (int *dev_length)
{
    int length = *dev_length;
    while (length > 4)
    {
        // do something;
        length -= 4;
    }
}

Now I want to process all 100 words at the same time.

Each block will still have 500 threads, and processes one word (per block).

dev_totalwordarray: stores all characters of the words (one after another)

dev_length_array: stores the length of each word.

dev_accu_length: stores the accumulative length of each word (total characters of all previous words)

dev_salt_: an array of size 500, storing unsigned integers.
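As described, dev_accu_length is an exclusive prefix sum over the per-word lengths. A minimal host-side sketch of how it could be built (the helper name build_accu_length is an assumption, not part of the question):

```cpp
#include <vector>
#include <cstddef>

// Hypothetical helper: given the per-word lengths, build the exclusive
// prefix sums that dev_accu_length is described as holding (total
// characters of all words before word i, i.e. word i's offset inside
// dev_totalwordarray).
std::vector<int> build_accu_length(const std::vector<int>& lengths)
{
    std::vector<int> accu(lengths.size());
    int total = 0;
    for (std::size_t i = 0; i < lengths.size(); ++i) {
        accu[i] = total;      // offset of word i
        total += lengths[i];
    }
    return accu;
}
```

The host array built this way would then be copied to the device the same way as the length array below.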

Hence, in my main function I have

   kernel2 <<< 100, 500 >>> (dev_totalwordarray, dev_length_array, dev_accu_length, dev_salt_);

To fill the cpu array:

    for (int i=0; i<wordnumber; i++)
    {
        int length=0;
        while (word_list_ptr_array[i][length]!=0)
        {
            length++;
        }

        actualwordlength2[i] = length;
    }

To copy from cpu -> gpu:

    int* dev_array_of_word_length;
    HANDLE_ERROR( cudaMalloc( (void**)&dev_array_of_word_length, 100 * sizeof(int) ) );
    HANDLE_ERROR( cudaMemcpy( dev_array_of_word_length, actualwordlength2, 100 * sizeof(int),
                              cudaMemcpyHostToDevice ) );
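The HANDLE_ERROR macro is not defined anywhere in the question; it matches the error-checking pattern popularized by the book "CUDA by Example". A typical definition, shown here as an assumption about what the question's macro does:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Assumed definition: print the CUDA error string with file/line
// context and abort, so a failed cudaMalloc/cudaMemcpy is not ignored.
static void HandleError(cudaError_t err, const char* file, int line)
{
    if (err != cudaSuccess) {
        printf("%s in %s at line %d\n", cudaGetErrorString(err), file, line);
        exit(EXIT_FAILURE);
    }
}
#define HANDLE_ERROR(err) (HandleError(err, __FILE__, __LINE__))
```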

My kernel function now looks like this:

__global__ void kernel2 (char* dev_totalwordarray, int *dev_length_array, int* dev_accu_length, unsigned int* dev_salt_)
{
    unsigned int tid = threadIdx.x + blockIdx.x * blockDim.x;
    unsigned int hash[N];

    int length = dev_length_array[blockIdx.x];

    while (tid < 50000)
    {
        const char* itr = &(dev_totalwordarray[dev_accu_length[blockIdx.x]]);
        hash[tid] = dev_salt_[threadIdx.x];
        unsigned int loop = 0;

        while (length > 4)
        {
            const unsigned int& i1 = *(reinterpret_cast<const unsigned int*>(itr)); itr += sizeof(unsigned int);
            const unsigned int& i2 = *(reinterpret_cast<const unsigned int*>(itr)); itr += sizeof(unsigned int);
            hash[tid] ^= (hash[tid] << 7) ^ i1 * (hash[tid] >> 3) ^ (~((hash[tid] << 11) + (i2 ^ (hash[tid] >> 5))));
            length -= 4;
        }
        tid += blockDim.x * gridDim.x;
    }
}

However, kernel2 doesn't seem to work at all.

It seems that the while (length > 4) is causing this.

Does anyone know why? Thanks.

Answer

I am not sure if the while is the culprit, but I see a few things in your code that worry me:

  • Your kernel produces no output. The optimizer will most likely detect this and convert it into an empty kernel.
  • In almost no situation do you want arrays allocated per-thread. That will consume a lot of memory. Your hash[N] table will be allocated per-thread and discarded at the end of the kernel. If N is big (and then multiplied by the total number of threads) you may run out of GPU memory. Not to mention that accessing hash will be almost as slow as accessing global memory.
  • All threads in a block will have the same itr value. Is that intended?
  • Every thread initializes only a single field within its own copy of the hash table.
  • I see hash[tid] where tid is a global index. Be aware that even if hash were made global, you could hit concurrency problems. Not all blocks within a grid run at the same time. While one block is initializing a portion of hash, another block might not even have started!
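A minimal sketch of how the points above might be addressed: each thread keeps a scalar hash seeded from its own salt slot (indexed by threadIdx.x, not the global tid), each block reads its word's length and offset itself, and the result is written to an output buffer so the optimizer cannot discard the work. The output parameter dev_hash_out and the kernel name kernel2_fixed are assumptions, not from the question:

```cuda
// Hedged sketch, not the asker's code: one word per block, one hash
// value per thread, results written out so the kernel has an
// observable effect. dev_hash_out is assumed to hold
// gridDim.x * blockDim.x (here 100 * 500) unsigned ints.
__global__ void kernel2_fixed (char* dev_totalwordarray, int* dev_length_array,
                               int* dev_accu_length, unsigned int* dev_salt_,
                               unsigned int* dev_hash_out)
{
    unsigned int hash = dev_salt_[threadIdx.x];   // per-thread scalar, no hash[N] array

    // Each block processes its own word: start at the word's offset
    // and re-read its length, instead of reusing a consumed length.
    const char* itr = &dev_totalwordarray[dev_accu_length[blockIdx.x]];
    int length = dev_length_array[blockIdx.x];

    while (length > 4)
    {
        const unsigned int i1 = *reinterpret_cast<const unsigned int*>(itr); itr += sizeof(unsigned int);
        const unsigned int i2 = *reinterpret_cast<const unsigned int*>(itr); itr += sizeof(unsigned int);
        hash ^= (hash << 7) ^ i1 * (hash >> 3) ^ (~((hash << 11) + (i2 ^ (hash >> 5))));
        length -= 4;
    }

    // Observable output: without this store the compiler may legally
    // reduce the whole kernel to a no-op.
    dev_hash_out[blockIdx.x * blockDim.x + threadIdx.x] = hash;
}
```

Note this keeps the question's launch shape (kernel2_fixed<<<100, 500>>>) and drops the grid-stride while (tid < 50000) loop, since with 100 blocks of 500 threads every (word, salt) pair is already covered by exactly one thread.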
