Is there a better way to process "undividable count of numbers by block_size" in CUDA?

Question

I need to do a data reduction (find the k-max number) on a vector of N numbers. The problem is that I don't know N beforehand (before compilation), and I am not sure I'm doing it right by launching two kernels: one with (int)(N / block_size) blocks, and a second with a single block of N % block_size threads.

Is there a better way to handle a count of numbers that is not divisible by block_size in CUDA?

Recommended answer

@RobertCrovella's answer describes the standard way of handling the situation and there is typically no need to worry about the extra if conditional that is needed in the kernel.
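
The referenced answer is not reproduced on this page, so the following is only a minimal sketch of the pattern that is commonly meant by "the extra if conditional": launch a single kernel with the block count rounded up, and let each thread check its index against N. The kernel name `scale_guarded`, the element-wise doubling, and the sizes are hypothetical placeholders rather than the asker's k-max reduction, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical element-wise kernel: doubles each of the n inputs.
__global__ void scale_guarded(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {                  // threads past the end of the data do nothing
        out[i] = 2.0f * in[i];
    }
}

int main()
{
    const int n = 1000;           // known only at run time in practice; not a multiple of block_size
    const int block_size = 256;
    // Round up so the last, partially filled block is also launched.
    const int num_blocks = (n + block_size - 1) / block_size;

    float* in = nullptr;
    float* out = nullptr;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) {
        in[i] = static_cast<float>(i);
    }

    scale_guarded<<<num_blocks, block_size>>>(in, out, n);
    cudaDeviceSynchronize();

    printf("out[%d] = %f\n", n - 1, out[n - 1]);

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

With this layout one launch covers every N, so the separate kernel for the N % block_size remainder is not needed.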

However, another alternative is to allocate the input and output buffers with padding up to a number that is divisible by the block size, run the kernel (without the if) and then ignore the extra results, for instance by not copying them back to the CPU.
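
A minimal sketch of this padding approach, under the same assumptions as any illustration here: the kernel name `scale_unguarded`, the element-wise doubling, and the sizes are hypothetical, and error checking is omitted. The device buffers are allocated with `padded_n` elements, the kernel has no bounds check, and only the first `n` results are copied back.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel with no bounds check: safe only because the
// buffers hold a whole number of blocks.
__global__ void scale_unguarded(const float* in, float* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = 2.0f * in[i];
}

int main()
{
    const int n = 1000;                                   // real element count
    const int block_size = 256;
    const int num_blocks = (n + block_size - 1) / block_size;
    const int padded_n = num_blocks * block_size;         // next multiple of block_size

    std::vector<float> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) {
        h_in[i] = static_cast<float>(i);
    }

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, padded_n * sizeof(float));
    cudaMalloc(&d_out, padded_n * sizeof(float));

    // Give the padding a defined value, then copy only the n real inputs.
    cudaMemset(d_in, 0, padded_n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    scale_unguarded<<<num_blocks, block_size>>>(d_in, d_out);

    // Ignore the padded results by copying back only the first n elements.
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("h_out[%d] = %f\n", n - 1, h_out[n - 1]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

For an element-wise kernel the padded outputs can simply be discarded. For a reduction such as the k-max search in the question, the padded inputs do take part in the result, so they would presumably need a neutral fill value (for a maximum, something no larger than any real input) rather than an arbitrary one.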
