Counting occurrences of numbers in a CUDA array
Question
I have an array of unsigned integers stored on the GPU with CUDA (typically 1000000 elements). I would like to count the occurrences of every number in the array. There are only a few distinct numbers (about 10), but these numbers can range from 1 to 1000000. About 9/10 of the numbers are 0, and I don't need a count for those. The result looks something like this:
58458 -> 1000 occurrences
15 -> 412 occurrences
I have an implementation using atomicAdd, but it is too slow because many threads write to the same address. Does anyone know of a fast, efficient method?
Recommended answer
You can implement a histogram by first sorting the numbers and then doing a keyed reduction. After sorting, equal values sit next to each other, so each run of identical keys can be summed into a single count.
The most straightforward method is to use thrust::sort followed by thrust::reduce_by_key. It is also often much faster than ad hoc binning based on atomics.
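The sort-then-reduce approach could be sketched roughly as follows. This is a minimal illustration, not the answerer's own code: the helper name count_nonzero_occurrences is hypothetical, the input is assumed to start in host memory, and the thrust::remove step that discards zeros before sorting is an added assumption based on the question's remark that zeros need no count. Each surviving element is paired with a constant value of 1, so the keyed reduction sums the ones for every run of equal keys.

```cpp
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/reduce.h>
#include <thrust/remove.h>
#include <thrust/sort.h>

#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Thrust): returns (value, count) pairs for
// every nonzero value in the input, ordered by ascending value.
std::vector<std::pair<unsigned int, int>>
count_nonzero_occurrences(const std::vector<unsigned int>& host_input)
{
    // Copy the input to the GPU. In the question's setting the data already
    // lives on the device, so this copy would not be needed there.
    thrust::device_vector<unsigned int> data(host_input.begin(), host_input.end());

    // Assumption: zeros are not wanted, so drop them before sorting.
    auto new_end = thrust::remove(data.begin(), data.end(), 0u);
    data.erase(new_end, data.end());

    // Sorting places equal values next to each other.
    thrust::sort(data.begin(), data.end());

    // Keyed reduction: every element contributes the value 1, and runs of
    // equal keys are summed into one (key, count) pair.
    thrust::device_vector<unsigned int> keys(data.size());
    thrust::device_vector<int> counts(data.size());
    auto ends = thrust::reduce_by_key(data.begin(), data.end(),
                                      thrust::constant_iterator<int>(1),
                                      keys.begin(), counts.begin());
    const std::size_t n = ends.first - keys.begin();

    // Copy only the n distinct keys and their counts back to the host.
    thrust::host_vector<unsigned int> h_keys(keys.begin(), keys.begin() + n);
    thrust::host_vector<int> h_counts(counts.begin(), counts.begin() + n);

    std::vector<std::pair<unsigned int, int>> result;
    result.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        result.emplace_back(h_keys[i], h_counts[i]);
    }
    return result;
}
```

Compiled with nvcc as a .cu file, calling the helper on the input {0, 15, 0, 58458, 15, 0, 58458, 58458, 0} should give the pairs (15, 2) and (58458, 3). With roughly 9/10 of the elements being zero, removing them first leaves only about a tenth of the array to sort; whether that beats sorting everything and ignoring the zero key afterwards is worth measuring on the actual data.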