Sorting many small arrays in CUDA
Question
I am implementing a median filter in CUDA. For a particular pixel, I extract the neighbors in a window around it, say an N x N (e.g. 3 x 3) window, and now have an array of N x N elements. I do not envision using a window of more than 10 x 10 elements for my application.
This array is now local to the kernel and already loaded into device memory. From previous SO posts that I have read, the most common sorting algorithms are implemented by Thrust. However, Thrust can only be called from the host (see the thread "Thrust inside user written kernels").
Is there a quick and efficient way to sort a small array of N x N elements inside the kernel?
Recommended answer
If the number of elements is fixed and small, you can use a sorting network (http://pages.ripco.net/~jgamble/nw.html). A sorting network sorts a fixed number of elements with a fixed sequence of compare/swap operations (e.g. 19 compare/swap operations for 8 elements). The sequence does not depend on the data, so every thread runs the same operations in the same order.
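To make the idea concrete, here is a minimal sketch of an 8-input sorting network with 19 compare/swap operations, matching the count quoted above. It uses Batcher's odd-even merge sort layout, which is one valid 19-comparator network and not necessarily the one the linked generator prints. The names `HD`, `compareExchange`, `sortNetwork8` and `sortMany8` are hypothetical, as is the assumption that the arrays are stored back to back in one buffer. A 3 x 3 median window has 9 elements and would need a 9-input network instead.

```cpp
// Builds as plain C++ for host-side testing and as CUDA under nvcc.
// HD is a hypothetical helper macro, not part of any library.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Compare-exchange: after the call, a <= b.
template <typename T>
HD inline void compareExchange(T &a, T &b) {
    if (b < a) { T t = a; a = b; b = t; }
}

// Sorts exactly 8 values in ascending order using 19 compare-exchanges
// (Batcher's odd-even merge sort network). All indices are compile-time
// constants, so a small local array can usually stay in registers.
template <typename T>
HD inline void sortNetwork8(T *v) {
    // Sort pairs.
    compareExchange(v[0], v[1]); compareExchange(v[2], v[3]);
    compareExchange(v[4], v[5]); compareExchange(v[6], v[7]);
    // Merge pairs into two sorted groups of four.
    compareExchange(v[0], v[2]); compareExchange(v[1], v[3]);
    compareExchange(v[4], v[6]); compareExchange(v[5], v[7]);
    compareExchange(v[1], v[2]); compareExchange(v[5], v[6]);
    // Merge the two groups of four.
    compareExchange(v[0], v[4]); compareExchange(v[1], v[5]);
    compareExchange(v[2], v[6]); compareExchange(v[3], v[7]);
    compareExchange(v[2], v[4]); compareExchange(v[3], v[5]);
    compareExchange(v[1], v[2]); compareExchange(v[3], v[4]);
    compareExchange(v[5], v[6]);
}

#ifdef __CUDACC__
// Hypothetical kernel: each thread sorts one 8-element array.
// Assumes `data` holds numArrays arrays stored contiguously.
__global__ void sortMany8(float *data, int numArrays) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= numArrays) return;

    float v[8];
    #pragma unroll
    for (int k = 0; k < 8; ++k) v[k] = data[idx * 8 + k];

    sortNetwork8(v);

    #pragma unroll
    for (int k = 0; k < 8; ++k) data[idx * 8 + k] = v[k];
}
#endif
```

For a median filter, the same pattern would apply inside the per-pixel thread: copy the window into a local array, run the network sized for the window, and read the middle element. A launch such as `sortMany8<<<(numArrays + 255) / 256, 256>>>(d_data, numArrays);` is one plausible configuration, with the block size of 256 chosen arbitrarily.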