如何在CUDA上对数组进行部分排序？ [英] How to partly sort arrays on CUDA?

查看：848 发布时间：2017/3/4 15:40:05 sorting cuda thrust

本文介绍了如何在CUDA上对数组进行部分排序？的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

假设我有两个数组：

   const int N = 1000000;
   float A[N];
   myStruct *B[N];

A中的数字可以是正数或负数（例如 A [N] = {3,2，-1,0,5，-2} ），如何使数组A 部分排序（首先是正值，不需要排序，负值）（例如 A [N] = {3,2,5,0，-1，-2} 或 A [N] = {5,2,3,0，-2，-1} ）？数组B应根据A（A是键，B是值）改变。

The numbers in A can be positive or negative (e.g. A[N]={3,2,-1,0,5,-2}), how can I make the array A partly sorted (all positive values first, not need to be sorted, then negative values)(e.g. A[N]={3,2,5,0,-1,-2} or A[N]={5,2,3,0,-2,-1}) on the GPU? The array B should be changed according to A (A is keys, B is values).

由于A，B的规模可能非常大，我认为排序算法应该在GPU上实现（，尤其是CUDA，因为我使用这个平台）。当然我知道 thrust :: sort_by_key 可以做这项工作，但它可以做额外的工作，因为我不需要数组A& B完全排序。

Since the scale of A,B can be very large, I think the sort algorithm should be implemented on GPU (especially on CUDA, because I use this platform). Surely I know thrust::sort_by_key can do this work, but it does muck extra work since I do not need the array A&B to be sorted entirely.

有人遇到这种问题吗？

thrust::sort_by_key(thrust::device_ptr<float> (A), 
            thrust::device_ptr<float> ( A + N ),  
            thrust::device_ptr<myStruct> ( B ),  
            thrust::greater<float>() );

推荐答案

Thrust的Github文档不是最新的。正如@JaredHoberock所说， thrust :: partition 是现在的方式支持模具。您可能需要从 Github存储库获取副本：

Thrust's documentation on Github is not up-to-date. As @JaredHoberock said, thrust::partition is the way to go since it now supports stencils. You may need to get a copy from the Github repository:

git clone git：//github.com/thrust/thrust.git

git clone git://github.com/thrust/thrust.git

然后在Thrust文件夹中运行 scons doc 以获取更新的文档，并在编译代码时使用这些更新的Thrust源（ nvcc -I / path /到/ thrust ... ）。使用新的模板分区，您可以：

Then run scons doc in the Thrust folder to get an updated documentation, and use these updated Thrust sources when compiling your code (nvcc -I/path/to/thrust ...). With the new stencil partition, you can do:

#include <thrust/partition.h>
#include <thrust/execution_policy.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>

struct is_positive
{
__host__ __device__
bool operator()(const int &x)
{
  return x >= 0;
}
};


thrust::partition(thrust::host, // if you want to test on the host
                  thrust::make_zip_iterator(thrust::make_tuple(keyVec.begin(), valVec.begin())),
                  thrust::make_zip_iterator(thrust::make_tuple(keyVec.end(), valVec.end())),
                  keyVec.begin(),
                  is_positive());

这会返回：

Before:
  keyVec =   0  -1   2  -3   4  -5   6  -7   8  -9
  valVec =   0   1   2   3   4   5   6   7   8   9
After:
  keyVec =   0   2   4   6   8  -5  -3  -7  -1  -9
  valVec =   0   2   4   6   8   5   3   7   1   9

请注意，2个分区不一定排序。此外，原始矢量和分区之间的顺序可以不同。如果这对您很重要，您可以使用 thrust :: stable_partition ：

Note that the 2 partitions are not necessarily sorted. Also, the order may differ between the original vectors and the partitions. If this is important to you, you can use thrust::stable_partition:

stable_partition不同于分区，因为stable_partition是保证相对顺序的
。也就是说，如果x和y是[first，last）中的
元素，使得pred（x）== pred（y），并且如果x
在y之前， stable_partition x
在y之前。

stable_partition differs from partition in that stable_partition is guaranteed to preserve relative order. That is, if x and y are elements in [first, last), such that pred(x) == pred(y), and if x precedes y, then it will still be true after stable_partition that x precedes y.

如果你想要一个完整的例子，这里是：

If you want a complete example, here it is:

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/partition.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>

struct is_positive
{
__host__ __device__
bool operator()(const int &x)
{
  return x >= 0;
}
};

void print_vec(const thrust::host_vector<int>& v)
{
  for(size_t i = 0; i < v.size(); i++)
    std::cout << "  " << v[i];
  std::cout << "\n";
}

int main ()
{
  const int N = 10;

  thrust::host_vector<int> keyVec(N);
  thrust::host_vector<int> valVec(N);

  int sign = 1;
  for(int i = 0; i < N; ++i)
  {
    keyVec[i] = sign * i;
    valVec[i] = i;
    sign *= -1;
  }

  // Copy host to device
  thrust::device_vector<int> d_keyVec = keyVec;
  thrust::device_vector<int> d_valVec = valVec;

  std::cout << "Before:\n  keyVec = ";
  print_vec(keyVec);
  std::cout << "  valVec = ";
  print_vec(valVec);

  // Partition key-val on device
  thrust::partition(thrust::make_zip_iterator(thrust::make_tuple(d_keyVec.begin(), d_valVec.begin())),
                    thrust::make_zip_iterator(thrust::make_tuple(d_keyVec.end(), d_valVec.end())),
                    d_keyVec.begin(),
                    is_positive());

  // Copy result back to host
  keyVec = d_keyVec;
  valVec = d_valVec;

  std::cout << "After:\n  keyVec = ";
  print_vec(keyVec);
  std::cout << "  valVec = ";
  print_vec(valVec);
}

UPDATE

我与 thrust :: sort_by_key 版本做了一个快速比较，并且 thrust :: partition 实现似乎是更快（这是我们可以自然预期）。这是我在NVIDIA Visual Profiler上获得的 N = 1024 * 1024 ，其中排序版本在左右侧的分区版本。您可以自己做同样的测试。

UPDATE

I made a quick comparison with the thrust::sort_by_key version, and the thrust::partition implementation does seem to be faster (which is what we could naturally expect). Here is what I obtain on NVIDIA Visual Profiler, with N = 1024 * 1024, with the sort version on the left, and the partition version on the right. You may want to do the same kind of tests on your own.

这篇关于如何在CUDA上对数组进行部分排序？的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

如何在CUDA上对数组进行部分排序？ [英] How to partly sort arrays on CUDA?

问题描述

推荐答案

UPDATE

UPDATE

相关文章

其它硬件开发最新文章

热门教程

热门工具

登录关闭

如何在CUDA上对数组进行部分排序？ [英] How to partly sort arrays on CUDA?

问题描述

推荐答案

UPDATE

UPDATE

相关文章

其它硬件开发最新文章

热门教程

热门工具

登录 关闭

登录关闭