CUB (CUDA UnBound) equivalent of thrust::gather

Views: 1275 Published: 2017/3/4 14:50:14 Tags: c++ cuda parallel-processing gpu thrust


Question

Due to some performance issues with the Thrust library (see this page for more details), I am planning to refactor a CUDA application to use CUB instead of Thrust, specifically to replace the thrust::sort_by_key and thrust::inclusive_scan calls. At one point in my application I need to sort 3 arrays by key. This is how I did it with Thrust:

thrust::sort_by_key(key_iter, key_iter + numKeys, indices);
thrust::gather(indices, indices + numKeys,
    thrust::make_zip_iterator(thrust::make_tuple(values1Ptr, values2Ptr, values3Ptr)),
    thrust::make_zip_iterator(thrust::make_tuple(valuesOut1Ptr, valuesOut2Ptr, valuesOut3Ptr))
);

where

key_iter is a thrust::device_ptr that points to the keys I want to sort by
indices points to a sequence (from 0 to numKeys-1) in device memory
values{1,2,3}Ptr are device_ptrs to the values I want to sort
values{1,2,3}OutPtr are device_ptrs to the sorted values

With the CUB SortPairs function I can sort a single value buffer, but not all 3 in one shot. The problem is that I don't see any CUB "gather-like" utilities. Suggestions?

Edit:

I suppose I could implement my own gather kernel, but is there any better way to do this other than:
template <typename Index, typename Value> 
__global__ void  gather_kernel(const unsigned int N, const Index * map, 
const Value * src, Value * dst) 
{ 
    unsigned int i = blockDim.x * blockIdx.x + threadIdx.x; 
    if (i < N) 
    { 
        dst[i] = src[map[i]]; 
    } 
} 

The non-coalesced loads and stores make me cringe, but they are probably unavoidable without a known structure on map.

Recommended answer

It seems what you want to achieve depends on thrust::zip_iterator. You could either
only replace thrust::sort_by_key with cub::DeviceRadixSort::SortPairs and keep thrust::gather, or
zip values{1,2,3} into an array of structures before using cub::DeviceRadixSort::SortPairs.
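A minimal sketch of the first option, under some assumptions: the question does not state the element types, so unsigned int keys and float values are assumed here, and the function name sort_three_by_key and its device_vector-based interface are hypothetical. The keys are sorted together with an index sequence by cub::DeviceRadixSort::SortPairs, and thrust::gather with zip iterators then permutes all three value buffers in one pass. Error checking is omitted for brevity.

```cuda
#include <cub/cub.cuh>
#include <thrust/device_vector.h>
#include <thrust/gather.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/sequence.h>
#include <thrust/tuple.h>

// Hypothetical helper; key and value types are assumptions, not from the question.
// The output vectors must not alias the input vectors.
void sort_three_by_key(const thrust::device_vector<unsigned int>& keys,
                       const thrust::device_vector<float>& values1,
                       const thrust::device_vector<float>& values2,
                       const thrust::device_vector<float>& values3,
                       thrust::device_vector<float>& out1,
                       thrust::device_vector<float>& out2,
                       thrust::device_vector<float>& out3)
{
    const int n = static_cast<int>(keys.size());
    out1.resize(n);
    out2.resize(n);
    out3.resize(n);
    if (n == 0) return;

    thrust::device_vector<unsigned int> keys_sorted(n);
    thrust::device_vector<int> idx_in(n), idx_sorted(n);
    thrust::sequence(idx_in.begin(), idx_in.end());  // 0, 1, ..., n-1

    const unsigned int* d_keys_in = thrust::raw_pointer_cast(keys.data());
    unsigned int* d_keys_out = thrust::raw_pointer_cast(keys_sorted.data());
    const int* d_idx_in = thrust::raw_pointer_cast(idx_in.data());
    int* d_idx_out = thrust::raw_pointer_cast(idx_sorted.data());

    // First call (null temp storage) only reports the scratch size needed.
    void* d_temp = nullptr;
    size_t temp_bytes = 0;
    cub::DeviceRadixSort::SortPairs(d_temp, temp_bytes,
                                    d_keys_in, d_keys_out, d_idx_in, d_idx_out, n);
    cudaMalloc(&d_temp, temp_bytes);

    // Second call sorts the keys and carries the indices along as values.
    cub::DeviceRadixSort::SortPairs(d_temp, temp_bytes,
                                    d_keys_in, d_keys_out, d_idx_in, d_idx_out, n);
    cudaFree(d_temp);

    // Permute all three value arrays with the sorted indices in a single gather.
    thrust::gather(idx_sorted.begin(), idx_sorted.end(),
                   thrust::make_zip_iterator(thrust::make_tuple(
                       values1.begin(), values2.begin(), values3.begin())),
                   thrust::make_zip_iterator(thrust::make_tuple(
                       out1.begin(), out2.begin(), out3.begin())));
}
```

In a real application the temporary storage would probably be allocated once and reused across calls rather than allocated and freed on every sort.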

Update

After reading the implementation of thrust::gather,
$CUDA_HOME/include/thrust/system/detail/generic/gather.inl

you can see that it is just a naive kernel like
__global__ void gather(const int* index, const float* in, float* out, int len) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
  if (i < len) { out[i] = in[index[i]]; }
}
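For the three-buffer case in the question, the same naive pattern extends directly. The sketch below is an illustration rather than anything taken from Thrust or CUB: the names gather3_kernel and gather3, and the block size of 256, are hypothetical choices. Each thread reads its map entry once and reuses it for all three arrays, so the writes and the map read are coalesced while the source reads remain scattered.

```cuda
#include <cuda_runtime.h>

// Gathers three source arrays through one shared index map.
template <typename Index, typename Value>
__global__ void gather3_kernel(unsigned int n, const Index* map,
                               const Value* src1, const Value* src2, const Value* src3,
                               Value* dst1, Value* dst2, Value* dst3)
{
    unsigned int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        const Index j = map[i];  // coalesced read of the map, done once per element
        dst1[i] = src1[j];       // scattered reads, coalesced writes
        dst2[i] = src2[j];
        dst3[i] = src3[j];
    }
}

// Hypothetical launcher; 256 threads per block is an arbitrary choice.
template <typename Index, typename Value>
cudaError_t gather3(unsigned int n, const Index* map,
                    const Value* src1, const Value* src2, const Value* src3,
                    Value* dst1, Value* dst2, Value* dst3,
                    cudaStream_t stream = 0)
{
    if (n == 0) return cudaSuccess;
    const unsigned int threads = 256;
    const unsigned int blocks = (n + threads - 1) / threads;  // round up
    gather3_kernel<<<blocks, threads, 0, stream>>>(n, map, src1, src2, src3,
                                                   dst1, dst2, dst3);
    return cudaGetLastError();  // reports launch configuration errors only
}
```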

Then I think your code above can be replaced by a single kernel without too much effort.
In this kernel, you could first use the CUB block-wide primitive cub::BlockRadixSort<...>::SortBlockedToStriped to get the sorted indices stored in registers, and then perform a naive re-order copy, as thrust::gather does, to fill values{1,2,3}Out.
Using SortBlockedToStriped rather than Sort allows the writes to be coalesced.
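A rough sketch of what such a kernel might look like, with several assumptions stated up front: keys are taken to be unsigned int and values float, the tile shape (128 threads, 4 items per thread) and the kernel name sort_and_gather_tile are hypothetical, and the kernel handles only a single tile of at most BLOCK_THREADS * ITEMS_PER_THREAD keys in one thread block. Larger inputs would need a device-wide sort or an additional merge step, which is not shown.

```cuda
#include <climits>
#include <cub/cub.cuh>

// Hypothetical tile shape: one block sorts up to 128 * 4 = 512 keys.
constexpr int BLOCK_THREADS = 128;
constexpr int ITEMS_PER_THREAD = 4;

// Launch with exactly one block of BLOCK_THREADS threads and
// n <= BLOCK_THREADS * ITEMS_PER_THREAD.
__global__ void sort_and_gather_tile(const unsigned int* keys_in, int n,
                                     const float* v1, const float* v2, const float* v3,
                                     unsigned int* keys_out,
                                     float* out1, float* out2, float* out3)
{
    using BlockSort =
        cub::BlockRadixSort<unsigned int, BLOCK_THREADS, ITEMS_PER_THREAD, int>;
    __shared__ typename BlockSort::TempStorage temp;

    unsigned int keys[ITEMS_PER_THREAD];
    int idx[ITEMS_PER_THREAD];

    // Blocked arrangement: thread t owns consecutive items starting at
    // t * ITEMS_PER_THREAD. Out-of-range slots are padded with the largest key.
    for (int k = 0; k < ITEMS_PER_THREAD; ++k) {
        const int i = threadIdx.x * ITEMS_PER_THREAD + k;
        keys[k] = (i < n) ? keys_in[i] : UINT_MAX;
        idx[k] = i;
    }

    // Sort keys with their original indices as values; the result is left in a
    // striped arrangement across the threads of the block.
    BlockSort(temp).SortBlockedToStriped(keys, idx);

    // Striped arrangement: item k of thread t has sorted rank
    // k * BLOCK_THREADS + t, so for a fixed k neighbouring threads write
    // neighbouring addresses.
    for (int k = 0; k < ITEMS_PER_THREAD; ++k) {
        const int rank = k * BLOCK_THREADS + threadIdx.x;
        const int src = idx[k];
        if (rank < n && src < n) {  // skip padding slots
            keys_out[rank] = keys[k];
            out1[rank] = v1[src];   // scattered reads, coalesced writes
            out2[rank] = v2[src];
            out3[rank] = v3[src];
        }
    }
}
```

A launch for this sketch would look like sort_and_gather_tile<<<1, BLOCK_THREADS>>>(...). The reads from v1, v2 and v3 are still scattered, as in the naive gather; only the writes benefit from the striped layout.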
