CUDA Thrust: reduce_by_key on only some values in an array, based off values in a "key" array
Problem description
Let's say I have two device_vector<byte> arrays, d_keys and d_data.
If d_data is, for example, a flattened 2D 3x5 array (e.g. { 1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 6, 5, 4, 3 }) and d_keys is a 1D array of size 5 (e.g. { 1, 0, 0, 1, 1 }), how can I do a reduction such that I'd end up only adding values on a per-row basis if the corresponding d_keys value is one (e.g. ending up with a result of { 10, 23, 14 })?
The sum_rows.cu example allows me to add every value in d_data, but that's not quite right.
Alternatively, I can, on a per-row basis, use a zip_iterator to combine d_keys with one row of d_data at a time, and do a transform_reduce, adding only if the key value is one, but then I'd have to loop through the d_data array.
What I really need is some sort of transform_reduce_by_key functionality that isn't built in, but surely there must be a way to make it!
Recommended answer
Based on the additional comment that there are thousands of rows rather than 3, we can write a transform functor that sums an entire row. With that many rows, this should keep the machine pretty busy:
#include <iostream>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/sequence.h>
#include <thrust/fill.h>

#define ROW 20
#define COL 10

// Device-side global pointers to the data and key arrays,
// set from the host with cudaMemcpyToSymbol.
__device__ int *vals;
__device__ int *keys;

struct test_functor
{
  const int a;

  test_functor(int _a) : a(_a) {}

  // y is the row index; sum row y, masking each element by its key.
  __device__
  int operator()(int& x, int& y) {
    int temp = 0;
    for (int i = 0; i < a; i++)
      temp += vals[i + (y*a)] * keys[i];
    return temp;
  }
};

int main(){
  int *s_vals, *s_keys;
  thrust::host_vector<int> h_vals(ROW*COL);
  thrust::host_vector<int> h_keys(COL);
  thrust::sequence(h_vals.begin(), h_vals.end());
  thrust::fill(h_keys.begin(), h_keys.end(), 1);
  h_keys[0] = 0;                          // mask out the first column
  thrust::device_vector<int> d_vals = h_vals;
  thrust::device_vector<int> d_keys = h_keys;
  thrust::device_vector<int> d_sums(ROW);
  thrust::fill(d_sums.begin(), d_sums.end(), 0);
  s_vals = thrust::raw_pointer_cast(&d_vals[0]);
  s_keys = thrust::raw_pointer_cast(&d_keys[0]);
  // Publish the raw device pointers to the __device__ globals.
  cudaMemcpyToSymbol(vals, &s_vals, sizeof(int *));
  cudaMemcpyToSymbol(keys, &s_keys, sizeof(int *));
  // One index per row; the functor turns each row index into a row sum.
  thrust::device_vector<int> d_idx(ROW);
  thrust::sequence(d_idx.begin(), d_idx.end());
  thrust::transform(d_sums.begin(), d_sums.end(), d_idx.begin(), d_sums.begin(), test_functor(COL));
  thrust::host_vector<int> h_sums = d_sums;
  std::cout << "Results :" << std::endl;
  for (unsigned i = 0; i < ROW; i++)
    std::cout << "h_sums[" << i << "] = " << h_sums[i] << std::endl;
  return 0;
}
This approach has the drawback that, in general, accesses to the vals array will not be coalesced. However, for a few thousand rows the cache may offer significant relief. We can fix this problem by re-ordering the data to be stored in column-major form in the flattened array, and changing the indexing in the functor's loop to:
for (int i = 0; i < a; i++)
  temp += vals[(i*ROW)+y] * keys[i];
If preferred, you can pass ROW as an additional parameter to the functor.