CUDA: how to use thrust::sort_by_key directly on the GPU?


Question

The Thrust library can be used to sort data. A call might look like this (with a keys vector and a values vector):

thrust::sort_by_key(d_keys.begin(), d_keys.end(), d_values.begin());

This is called on the CPU, with d_keys and d_values in CPU memory, and the bulk of the execution happens on the GPU.

However, my data is already on the GPU. How can I use the Thrust library to perform efficient sorting directly on the GPU, i.e., call the sort_by_key function from a kernel?

Also, my data consists of keys that are either unsigned long long int or unsigned int, and values that are always unsigned int. How should I make the Thrust call for these types?

Recommended answer

As stated in the question Talonmies linked, you cannot call Thrust from a CUDA function (e.g. __device__ or __global__). However, this doesn't mean you can't use Thrust with data you already have in device memory. Instead, you call the desired Thrust functions from the host, using thrust::device_ptr wrappers around your raw device pointers. For example:

// raw pointers to device memory
unsigned int * raw_data;
unsigned int * raw_keys;
// allocate device memory for data and keys
// (sort_by_key needs one value per key, so N_data should equal N_keys)
cudaMalloc((void **) &raw_data, N_data * sizeof(unsigned int));
cudaMalloc((void **) &raw_keys, N_keys * sizeof(unsigned int));

// populate your device pointers in your kernel
kernel<<<...>>>(raw_data, raw_keys, ...);

...

// wrap each raw pointer in a device_ptr so Thrust treats it as device memory
// (requires #include <thrust/device_ptr.h> and #include <thrust/sort.h>)
thrust::device_ptr<unsigned int> dev_data_ptr(raw_data);
thrust::device_ptr<unsigned int> dev_keys_ptr(raw_keys);

// sort the keys in place on the device, reordering the data to match
thrust::sort_by_key(dev_keys_ptr, dev_keys_ptr + N_keys, dev_data_ptr);

The memory pointed to by raw_data and raw_keys is still device memory after you wrap the pointers with thrust::device_ptr, so although you call the Thrust function from the host, it doesn't have to copy any memory from host to device or vice versa. That is, you're sorting directly on the GPU, using device memory; the only overhead is in launching the Thrust kernel(s) and wrapping the raw device pointers.
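To make the wrapping pattern above concrete, here is a minimal, self-contained sketch. The helper name sort_pairs_on_device is hypothetical, the cudaMemcpy calls stand in for whatever kernel would normally produce the data on the device, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/sort.h>
#include <vector>

// Hypothetical helper (not part of Thrust): sorts `values` by `keys` in device
// memory. Assumes keys.size() == values.size().
void sort_pairs_on_device(std::vector<unsigned int>& keys,
                          std::vector<unsigned int>& values)
{
    const size_t n = keys.size();
    const size_t bytes = n * sizeof(unsigned int);

    unsigned int* raw_keys = nullptr;
    unsigned int* raw_values = nullptr;
    cudaMalloc((void**)&raw_keys, bytes);
    cudaMalloc((void**)&raw_values, bytes);

    // Stand-in for a kernel that would fill the buffers on the device.
    cudaMemcpy(raw_keys, keys.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(raw_values, values.data(), bytes, cudaMemcpyHostToDevice);

    // Wrap the raw pointers; no data is copied by the wrapping itself.
    thrust::device_ptr<unsigned int> dev_keys(raw_keys);
    thrust::device_ptr<unsigned int> dev_values(raw_values);

    // Called from the host, executed on the device.
    thrust::sort_by_key(dev_keys, dev_keys + n, dev_values);

    // Copy back only so the caller can inspect the result.
    cudaMemcpy(keys.data(), raw_keys, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(values.data(), raw_values, bytes, cudaMemcpyDeviceToHost);

    cudaFree(raw_keys);
    cudaFree(raw_values);
}
```

In real code the two host-to-device copies would be replaced by your own kernel launch, and the copies back would only be needed if the host has to read the sorted result.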

And of course, you can get your raw pointers back if you need to use them in a regular CUDA kernel afterward:

unsigned int * raw_ptr = thrust::raw_pointer_cast(dev_data_ptr);
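As a rough illustration of that round trip, the sketch below sorts on the device and then hands the sorted values to an ordinary kernel through a raw pointer. The kernel add_one and the helper sort_then_increment are hypothetical names, and thrust::device_vector is used here instead of cudaMalloc purely to keep the example short; it is an assumption of this sketch, not something the answer prescribes.

```cuda
#include <cuda_runtime.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <vector>

// Hypothetical kernel: adds 1 to every element.
__global__ void add_one(unsigned int* data, size_t n)
{
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) data[i] += 1u;
}

// Hypothetical helper: sorts values by key on the device, then runs a plain
// CUDA kernel on the sorted values via a raw pointer.
std::vector<unsigned int> sort_then_increment(const std::vector<unsigned int>& keys,
                                              const std::vector<unsigned int>& values)
{
    thrust::device_vector<unsigned int> d_keys(keys.begin(), keys.end());
    thrust::device_vector<unsigned int> d_values(values.begin(), values.end());

    thrust::sort_by_key(d_keys.begin(), d_keys.end(), d_values.begin());

    // Recover a raw device pointer that a regular kernel can accept.
    unsigned int* raw_values = thrust::raw_pointer_cast(d_values.data());

    const size_t n = d_values.size();
    const unsigned int threads = 256;
    const unsigned int blocks = static_cast<unsigned int>((n + threads - 1) / threads);
    if (n > 0) add_one<<<blocks, threads>>>(raw_values, n);
    cudaDeviceSynchronize();

    // Copy back to the host only to return the result.
    std::vector<unsigned int> out(n);
    thrust::copy(d_values.begin(), d_values.end(), out.begin());
    return out;
}
```

The raw pointer stays valid only as long as the device_vector that owns the memory is alive.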

As for using either unsigned long long int or unsigned int as your keys with data that's unsigned int, this isn't a problem, as Thrust is templated. That is, the signature for sort_by_key is

template<typename RandomAccessIterator1 , typename RandomAccessIterator2 >
void thrust::sort_by_key(           
    RandomAccessIterator1   keys_first,
    RandomAccessIterator1   keys_last,
    RandomAccessIterator2   values_first )

meaning that you can have different types for the keys and data. As long as all of your keys have the same type within a given call, Thrust should be able to infer the types automatically and you won't have to do anything special. Hopefully that makes sense.
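A small sketch of what that looks like in practice: the same call works for both key types, with the iterator types deduced by the compiler. The helper values_sorted_by is a hypothetical name, and the use of thrust::device_vector is an assumption made for brevity.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <vector>

// Hypothetical helper, templated on the key type: returns the unsigned int
// values reordered so that their keys are ascending.
template <typename Key>
std::vector<unsigned int> values_sorted_by(const std::vector<Key>& keys,
                                           const std::vector<unsigned int>& values)
{
    thrust::device_vector<Key> d_keys(keys.begin(), keys.end());
    thrust::device_vector<unsigned int> d_values(values.begin(), values.end());

    // No explicit template arguments: the key and value iterator types are
    // deduced from the arguments.
    thrust::sort_by_key(d_keys.begin(), d_keys.end(), d_values.begin());

    std::vector<unsigned int> out(values.size());
    thrust::copy(d_values.begin(), d_values.end(), out.begin());
    return out;
}
```

Calling it with a std::vector<unsigned long long> of keys or a std::vector<unsigned int> of keys instantiates the same sort_by_key template with different key iterator types.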
