Thrust - How to use my array/data - model


Question

I am new to Thrust (CUDA) and I want to do some array operations, but I can't find any similar example on the internet.

I have the following two arrays (2D):

a = { {1, 2, 3}, {4} }
b = { {5}, {6, 7} }

I want Thrust to compute this array:

c = { {1, 2, 3, 5}, {1, 2, 3, 6, 7}, {4, 5}, {4, 6, 7} }

I know how to do this in C/C++, but not how to tell Thrust to do it.

Here is my idea of how it could perhaps work:

Thread 1: Take a[0] -> expand it with b. Write it to c.

Thread 2: Take a[1] -> expand it with b. Write it to c.

But I have no idea how to do that. I could write the arrays a and b to a 1D array like:

thrust::device_vector<int> dev_a;
dev_a.push_back(3); // size of first array
dev_a.push_back(1);
dev_a.push_back(2);
dev_a.push_back(3);
dev_a.push_back(1); // size of second array
dev_a.push_back(4);

thrust::device_vector<int> dev_b;
dev_b.push_back(1); // size of first array
dev_b.push_back(5);
dev_b.push_back(2); // size of second array
dev_b.push_back(6);
dev_b.push_back(7); 

And a pseudo-function:

struct expand
{
  __host__ __device__
  ?? ?? (const array ai, const array *b) {
      for bi in b: // each array in the 2d array
      {
          c.push_back(bi[0] + ai[0]); // write down the array count

          for i in ai: // each element in the ai array
             c.push_back(i);

          for i in bi: // each element in the bi array
             c.push_back(i);
      }
  }
};

Does anyone have an idea?

Recommended answer

I guess you won't get any speed increase on the GPU for this kind of operation, since it needs a lot of memory accesses, which are slow on the GPU.

But if you want to implement this anyway:


  1. I guess that, for the reason given above, Thrust won't help you with a ready-to-use algorithm. This means that you need to write your own kernel; however, you can leave memory management to Thrust.

  2. It is generally faster to create the arrays in CPU memory and, when they are ready, copy each whole array to the GPU. (CPU<->GPU copies are faster on long contiguous pieces of data.)

  3. Keep in mind that the GPU runs hundreds of threads in parallel. Each thread needs to know what to read and where to write.

  4. Global memory operations are slow (300-400 clocks). Avoid having a thread read the whole array from global memory only to find out that it needed just the last few bytes.
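The last two points (each thread must know what to read and where to write, without scanning global memory to find out) can be met by precomputing a small plan on the host. The following is only a minimal sketch, assuming the output wanted in the question, where every subarray of a is joined with every subarray of b. `PairPlan` and `plan_pairs` are hypothetical names introduced for illustration; they are not part of Thrust or CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One entry per output subarray: which inputs to read and where to write.
// (Hypothetical struct, introduced for this sketch only.)
struct PairPlan {
    int a_index;     // subarray of a to read
    int b_index;     // subarray of b to read
    int out_start;   // first position in the flat result
    int out_length;  // length of a's subarray plus length of b's subarray
};

// a_lengths[i] and b_lengths[j] are the subarray lengths of a and b.
// Output k joins a[k / nb] with b[k % nb]. out_start is an exclusive
// prefix sum of the joined lengths, so results are packed without gaps.
std::vector<PairPlan> plan_pairs(const std::vector<int>& a_lengths,
                                 const std::vector<int>& b_lengths) {
    const int na = static_cast<int>(a_lengths.size());
    const int nb = static_cast<int>(b_lengths.size());
    std::vector<PairPlan> plan;
    plan.reserve(static_cast<std::size_t>(na) * nb);
    int next_start = 0;
    for (int k = 0; k < na * nb; ++k) {
        const int i = k / nb;
        const int j = k % nb;
        assert(a_lengths[i] >= 0 && b_lengths[j] >= 0);
        const int len = a_lengths[i] + b_lengths[j];
        plan.push_back({i, j, next_start, len});
        next_start += len;
    }
    return plan;
}
```

With the lengths from the question, `plan_pairs({3, 1}, {1, 2})` yields four entries, so thread k would only need to read entry k to know its inputs and its write position. On the device, the same start positions could presumably be produced with `thrust::exclusive_scan` over the joined lengths instead of the host loop.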

So, here is how I see your program:

  1. Make your arrays 1D in CPU memory, like this:

float array1[] = { 1, 2, 3, 4};
float array2[] = { 5, 6, 7};
int arr1offsets[] = {0, 3, 3, 1}; // (position of the first element, length) pairs, one per subarray
int arr2offsets[] = {0, 1, 1, 2};

  2. Copy your arrays and offsets to the GPU and allocate memory for the result and its offsets. I guess you'll have to compute the maximum length of one joined subarray and allocate memory for the worst case.

  3. Run the kernel.

  4. Gather the results.
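The host-side part of these steps (building the flat arrays with their offsets and sizing the result for the worst case) can be sketched as follows. This is a minimal sketch in plain host C++, which should also compile under nvcc; `FlatArrays`, `flatten`, `max_length` and `worst_case_joined_length` are hypothetical helpers introduced here, not Thrust or CUDA functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Flat layout: all values back to back, plus (start, length) pairs.
// (Hypothetical struct, introduced for this sketch only.)
struct FlatArrays {
    std::vector<float> values;   // e.g. {1, 2, 3, 4}
    std::vector<int>   offsets;  // e.g. {0, 3, 3, 1}
};

// Flatten a nested array into the layout above.
FlatArrays flatten(const std::vector<std::vector<float>>& nested) {
    FlatArrays flat;
    for (const auto& sub : nested) {
        flat.offsets.push_back(static_cast<int>(flat.values.size()));  // start
        flat.offsets.push_back(static_cast<int>(sub.size()));          // length
        flat.values.insert(flat.values.end(), sub.begin(), sub.end());
    }
    return flat;
}

// Longest subarray length stored in a (start, length) offsets array.
int max_length(const std::vector<int>& offsets) {
    assert(offsets.size() % 2 == 0);  // must hold complete pairs
    int longest = 0;
    for (std::size_t i = 1; i < offsets.size(); i += 2)
        longest = std::max(longest, offsets[i]);
    return longest;
}

// Upper bound on the length of one joined subarray, whichever subarrays
// end up paired; a safe value for the per-result stride.
int worst_case_joined_length(const FlatArrays& a, const FlatArrays& b) {
    return max_length(a.offsets) + max_length(b.offsets);
}
```

The result buffer would then need (number of joined subarrays) times this bound elements. The bound is deliberately pessimistic: it trades some wasted memory for write positions that each thread can compute from its index alone.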

The kernel may look like this (if I understood your idea correctly):

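// Worst-case length of one joined subarray. The value 8 is an assumed
// placeholder; set it to max(a1len) + max(a2len) for your data.
#define MAX_SUBARRAY_LEN 8

// Each thread joins subarray idx of arr1 with subarray idx of arr2.
// The offset arrays hold (start, length) pairs; n is the number of pairs.
__global__ void kernel(const float* arr1, const int* arr1offset,
                       const float* arr2, const int* arr2offset,
                       float* result, int* resultoffset, int n)
{
  int idx = threadIdx.x + blockDim.x*blockIdx.x;
  if (idx >= n) return; // guard threads past the last pair

  int a1beg = arr1offset[idx*2];
  int a2beg = arr2offset[idx*2];
  int a1len = arr1offset[idx*2+1];
  int a2len = arr2offset[idx*2+1];

  // Each result occupies a fixed-size slot of MAX_SUBARRAY_LEN elements.
  resultoffset[idx*2] = idx*MAX_SUBARRAY_LEN;
  resultoffset[idx*2+1] = a1len+a2len;

  for (int k = 0; k < a1len; ++k) result[idx*MAX_SUBARRAY_LEN+k] = arr1[a1beg+k];
  for (int k = 0; k < a2len; ++k) result[idx*MAX_SUBARRAY_LEN+a1len+k] = arr2[a2beg+k];
}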

This code is not perfect, but it should do the right thing.
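To check the kernel's output, it may help to have a host reference that does the same per-index work. The sketch below mirrors the kernel's logic in plain C++, under the assumption that both offset arrays hold (start, length) pairs and that `max_len` plays the role of MAX_SUBARRAY_LEN; `join_reference` is a hypothetical name. Like the kernel, it joins subarray idx of the first array with subarray idx of the second, not every combination.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host reference for the kernel's per-thread work: join subarray idx of
// arr1 with subarray idx of arr2 into a fixed-stride slot of the result.
void join_reference(const std::vector<float>& arr1,
                    const std::vector<int>& arr1offset,
                    const std::vector<float>& arr2,
                    const std::vector<int>& arr2offset,
                    int max_len,
                    std::vector<float>& result,
                    std::vector<int>& resultoffset) {
    assert(arr1offset.size() == arr2offset.size());
    const int n = static_cast<int>(arr1offset.size() / 2);
    result.assign(static_cast<std::size_t>(n) * max_len, 0.0f);
    resultoffset.assign(static_cast<std::size_t>(n) * 2, 0);

    for (int idx = 0; idx < n; ++idx) {  // one iteration per GPU thread
        const int a1beg = arr1offset[idx * 2];
        const int a2beg = arr2offset[idx * 2];
        const int a1len = arr1offset[idx * 2 + 1];
        const int a2len = arr2offset[idx * 2 + 1];
        assert(a1len + a2len <= max_len);  // the slot must be large enough

        resultoffset[idx * 2] = idx * max_len;
        resultoffset[idx * 2 + 1] = a1len + a2len;

        for (int k = 0; k < a1len; ++k)
            result[idx * max_len + k] = arr1[a1beg + k];
        for (int k = 0; k < a2len; ++k)
            result[idx * max_len + a1len + k] = arr2[a2beg + k];
    }
}
```

Comparing the arrays copied back from the GPU against `result` and `resultoffset` from this function, element by element, is one simple way to validate a kernel launch. Unused tail elements of each slot are zero here, whereas on the GPU they hold whatever the allocation contained, so only the first `resultoffset[idx*2+1]` elements of each slot should be compared.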
