如何在CUDA上对数组进行部分排序? [英] How to partly sort arrays on CUDA?
问题描述
假设我有两个数组:
const int N = 1000000;
float A[N];
myStruct *B[N];
A中的数字可以是正数或负数(例如 A [N] = {3,2,-1,0,5,-2}
),如何使数组A 部分排序(首先是正值,不需要排序,负值)(例如 A [N] = {3,2,5,0,-1,-2}
或 A [N] = {5,2,3,0,-2,-1}
)?数组B应根据A(A是键,B是值)改变。
The numbers in A can be positive or negative (e.g. A[N]={3,2,-1,0,5,-2}
), how can I make the array A partly sorted (all positive values first, not need to be sorted, then negative values)(e.g. A[N]={3,2,5,0,-1,-2}
or A[N]={5,2,3,0,-2,-1}
) on the GPU? The array B should be changed according to A (A is keys, B is values).
由于A,B的规模可能非常大,我认为排序算法应该在GPU上实现(,尤其是CUDA,因为我使用这个平台)。当然我知道 thrust :: sort_by_key
可以做这项工作,但它可以做额外的工作,因为我不需要数组A& B完全排序。
Since the scale of A,B can be very large, I think the sort algorithm should be implemented on GPU (especially on CUDA, because I use this platform). Surely I know thrust::sort_by_key
can do this work, but it does muck extra work since I do not need the array A&B to be sorted entirely.
有人遇到这种问题吗?
thrust::sort_by_key(thrust::device_ptr<float> (A),
thrust::device_ptr<float> ( A + N ),
thrust::device_ptr<myStruct> ( B ),
thrust::greater<float>() );
推荐答案
Thrust的Github文档不是最新的。正如@JaredHoberock所说, thrust :: partition
是现在的方式支持模具。您可能需要从 Github存储库获取副本:
Thrust's documentation on Github is not up-to-date. As @JaredHoberock said, thrust::partition
is the way to go since it now supports stencils. You may need to get a copy from the Github repository:
git clone git://github.com/thrust/thrust.git
git clone git://github.com/thrust/thrust.git
然后在Thrust文件夹中运行 scons doc
以获取更新的文档,并在编译代码时使用这些更新的Thrust源( nvcc -I / path /到/ thrust ...
)。使用新的模板分区,您可以:
Then run scons doc
in the Thrust folder to get an updated documentation, and use these updated Thrust sources when compiling your code (nvcc -I/path/to/thrust ...
). With the new stencil partition, you can do:
#include <thrust/partition.h>
#include <thrust/execution_policy.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>
struct is_positive
{
__host__ __device__
bool operator()(const int &x)
{
return x >= 0;
}
};
thrust::partition(thrust::host, // if you want to test on the host
thrust::make_zip_iterator(thrust::make_tuple(keyVec.begin(), valVec.begin())),
thrust::make_zip_iterator(thrust::make_tuple(keyVec.end(), valVec.end())),
keyVec.begin(),
is_positive());
这会返回:
Before:
keyVec = 0 -1 2 -3 4 -5 6 -7 8 -9
valVec = 0 1 2 3 4 5 6 7 8 9
After:
keyVec = 0 2 4 6 8 -5 -3 -7 -1 -9
valVec = 0 2 4 6 8 5 3 7 1 9
请注意,2个分区不一定排序。此外,原始矢量和分区之间的顺序可以不同。如果这对您很重要,您可以使用 thrust :: stable_partition
:
Note that the 2 partitions are not necessarily sorted. Also, the order may differ between the original vectors and the partitions. If this is important to you, you can use thrust::stable_partition
:
stable_partition不同于分区,因为stable_partition是保证相对顺序的
。也就是说,如果x和y是[first,last)中的
元素,使得pred(x)== pred(y),并且如果x
在y之前, stable_partition x
在y之前。
stable_partition differs from partition in that stable_partition is guaranteed to preserve relative order. That is, if x and y are elements in [first, last), such that pred(x) == pred(y), and if x precedes y, then it will still be true after stable_partition that x precedes y.
如果你想要一个完整的例子,这里是:
If you want a complete example, here it is:
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/partition.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>
struct is_positive
{
__host__ __device__
bool operator()(const int &x)
{
return x >= 0;
}
};
void print_vec(const thrust::host_vector<int>& v)
{
for(size_t i = 0; i < v.size(); i++)
std::cout << " " << v[i];
std::cout << "\n";
}
int main ()
{
const int N = 10;
thrust::host_vector<int> keyVec(N);
thrust::host_vector<int> valVec(N);
int sign = 1;
for(int i = 0; i < N; ++i)
{
keyVec[i] = sign * i;
valVec[i] = i;
sign *= -1;
}
// Copy host to device
thrust::device_vector<int> d_keyVec = keyVec;
thrust::device_vector<int> d_valVec = valVec;
std::cout << "Before:\n keyVec = ";
print_vec(keyVec);
std::cout << " valVec = ";
print_vec(valVec);
// Partition key-val on device
thrust::partition(thrust::make_zip_iterator(thrust::make_tuple(d_keyVec.begin(), d_valVec.begin())),
thrust::make_zip_iterator(thrust::make_tuple(d_keyVec.end(), d_valVec.end())),
d_keyVec.begin(),
is_positive());
// Copy result back to host
keyVec = d_keyVec;
valVec = d_valVec;
std::cout << "After:\n keyVec = ";
print_vec(keyVec);
std::cout << " valVec = ";
print_vec(valVec);
}
UPDATE
我与 thrust :: sort_by_key
版本做了一个快速比较,并且 thrust :: partition
实现似乎是更快(这是我们可以自然预期)。这是我在NVIDIA Visual Profiler上获得的 N = 1024 * 1024
,其中排序版本在左右侧的分区版本。您可以自己做同样的测试。
UPDATE
I made a quick comparison with the thrust::sort_by_key
version, and the thrust::partition
implementation does seem to be faster (which is what we could naturally expect). Here is what I obtain on NVIDIA Visual Profiler, with N = 1024 * 1024
, with the sort version on the left, and the partition version on the right. You may want to do the same kind of tests on your own.
这篇关于如何在CUDA上对数组进行部分排序?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!