Finding max value in CUDA
Problem description
I am trying to write CUDA code that finds the maximum value in a given set of numbers.
Assume there are 20 numbers and the kernel runs on 2 blocks of 5 threads. Now assume the 10 threads compare the first 10 values at the same time, and thread 2 finds a new maximum, so it updates the max-value variable in global memory. While thread 2 is updating it, what happens to the remaining threads (1, 3-10), which are still comparing against the old value?
If I lock the global variable using atomicCAS(), will threads 1 and 3-10 still compare against the old max value? How can I overcome this problem?
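On the atomicCAS() part of the question: a compare-and-swap loop does not let a stale comparison win. A thread may read an old maximum, but atomicCAS() only stores its value if the variable still holds what the thread read; otherwise it returns the newer value and the thread re-compares against that. The sketch below is illustrative only: the names atomicMaxFloat, maxKernelAtomic and gpuMaxAtomic are hypothetical, it assumes float data without NaNs, and it omits CUDA error checking. For int data, CUDA's built-in atomicMax() does the same job directly.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical helper: atomic max for float, built on the int atomicCAS().
// Returns the last value observed at *addr. NaNs are not handled.
__device__ float atomicMaxFloat(float* addr, float value) {
    int* addrAsInt = reinterpret_cast<int*>(addr);
    int old = *addrAsInt;
    // Keep trying while the stored maximum is still smaller than our value.
    while (__int_as_float(old) < value) {
        int assumed = old;
        // The swap only happens if *addr still holds `assumed`.
        old = atomicCAS(addrAsInt, assumed, __float_as_int(value));
        if (old == assumed) break;  // swap succeeded
        // Otherwise another thread changed *addr first; `old` now holds the
        // newer value and the loop condition re-compares against it.
    }
    return __int_as_float(old);
}

// Each thread scans a strided slice, then publishes its local max once.
__global__ void maxKernelAtomic(const float* data, int n, float* result) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    if (tid >= n) return;
    float localMax = data[tid];
    for (int i = tid + stride; i < n; i += stride)
        localMax = fmaxf(localMax, data[i]);
    atomicMaxFloat(result, localMax);
}

// Hypothetical host wrapper; CUDA error checking omitted for brevity.
float gpuMaxAtomic(const float* h_data, int n) {
    assert(n > 0);
    float *d_data, *d_result;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMalloc(&d_result, sizeof(float));
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);
    // Seed the global maximum with the first element.
    cudaMemcpy(d_result, h_data, sizeof(float), cudaMemcpyHostToDevice);
    // 2 blocks of 5 threads, as in the question.
    maxKernelAtomic<<<2, 5>>>(d_data, n, d_result);
    float result;
    cudaMemcpy(&result, d_result, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_result);
    return result;
}
```

With the question's 20 numbers and 2 × 5 threads, each thread first takes the max of two elements locally, so only 10 atomic operations touch the global variable. With many threads those atomics serialize on one address, which is why a reduction is usually the better choice.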
Recommended answer
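This is purely a reduction problem. Here's a good presentation by NVIDIA on optimizing parallel reduction on GPUs. In a reduction, threads never compare against a shared global maximum: each block combines pairs of values in shared memory, and the per-block results are combined afterwards, so the stale-value problem does not arise. You can use the same technique to find the minimum, maximum, or sum of all elements.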
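The shared-memory reduction idea can be sketched as follows for a maximum. This is a minimal sketch, not the presentation's own code: the names reduceMax and gpuMaxReduce are hypothetical, and it assumes float data, a power-of-two block size of 256, no -inf values, and no CUDA error checking. Each block reduces its slice in shared memory with __syncthreads() between steps, writes one value per block, and the host relaunches the kernel on those per-block maxima until one value remains.

```cuda
#include <cassert>
#include <cfloat>
#include <utility>
#include <cuda_runtime.h>

// One pass of a shared-memory tree reduction (sequential addressing, with
// the first max taken while loading). Each block writes the max of up to
// 2 * blockDim.x input elements to out[blockIdx.x].
// blockDim.x must be a power of two.
__global__ void reduceMax(const float* in, float* out, unsigned int n) {
    extern __shared__ float sdata[];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * (blockDim.x * 2) + tid;

    // Padding for out-of-range slots; assumes the data holds no -inf.
    float v = -FLT_MAX;
    if (i < n) v = in[i];
    if (i + blockDim.x < n) v = fmaxf(v, in[i + blockDim.x]);
    sdata[tid] = v;
    __syncthreads();

    // Halve the active threads each step; __syncthreads() guarantees each
    // step reads the results of the previous one.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] = fmaxf(sdata[tid], sdata[tid + s]);
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = sdata[0];
}

// Hypothetical host wrapper: relaunches the kernel on the per-block maxima
// until one value remains. CUDA error checking omitted for brevity.
float gpuMaxReduce(const float* h_data, int n) {
    assert(n > 0);
    const int threads = 256;  // power of two
    int blocks = (n + threads * 2 - 1) / (threads * 2);
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, h_data, n * sizeof(float), cudaMemcpyHostToDevice);
    while (n > 1) {
        blocks = (n + threads * 2 - 1) / (threads * 2);
        reduceMax<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_out, n);
        std::swap(d_in, d_out);  // per-block maxima become the next input
        n = blocks;
    }
    float result;
    cudaMemcpy(&result, d_in, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Replacing fmaxf with fminf (and -FLT_MAX with FLT_MAX as padding) gives the minimum; replacing it with addition (and 0.0f as padding) gives the sum.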