CONFLICT_FREE_OFFSET macro used in the parallel prefix algorithm from GPU Gems 3

Views: 282. Posted: 2017/3/4 15:39:26. Tags: cuda, prefix-sum


Problem description

First of all, here is the link to the algorithm: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html

To avoid bank conflicts, one element of padding is added to the shared memory array for every NUM_BANKS elements (i.e., 32 for devices of compute capability 2.x). This is done as follows (as in Figure 39-5):

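int ai = offset*(2*thid+1)-1;
int bi = offset*(2*thid+2)-1;
ai += ai/NUM_BANKS;   // pad ai by one slot per NUM_BANKS elements
bi += bi/NUM_BANKS;   // pad bi the same way, from its own unpadded value
temp[bi] += temp[ai];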
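To see what the padding buys, here is a small host-side C++ sketch (not CUDA, and not taken from the article). It assumes a simplified model in which each array index is one 4-byte word and the bank of an index is `index % NUM_BANKS`. The helpers `padded` and `worst_conflict` are hypothetical names introduced only for this illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

const int NUM_BANKS = 32;  // assumption: 32 banks, as on compute capability 2.x

// Hypothetical helper: shift an index by one padding slot per NUM_BANKS elements.
int padded(int i) { return i + i / NUM_BANKS; }

// Hypothetical helper: for a given stride `offset`, count how many of the first
// `threads` threads have their `ai` index land in the same bank, and return
// the worst case (1 means no conflict in this simplified model).
int worst_conflict(int offset, int threads, bool use_padding) {
    assert(offset > 0 && threads > 0);
    std::array<int, NUM_BANKS> hits{};  // zero-initialized hit count per bank
    for (int thid = 0; thid < threads; ++thid) {
        int ai = offset * (2 * thid + 1) - 1;
        if (use_padding) ai = padded(ai);
        ++hits[ai % NUM_BANKS];
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

Under this model, `worst_conflict(16, 32, false)` puts all 32 threads in one bank, while `worst_conflict(16, 32, true)` spreads them over 32 distinct banks.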

I don't understand how ai/NUM_BANKS is equivalent to the macro:

#define NUM_BANKS 16
#define LOG_NUM_BANKS 4
#define CONFLICT_FREE_OFFSET(n) \
    ((n) >> NUM_BANKS + (n) >> (2 * LOG_NUM_BANKS))

Isn't it equal to

n >> LOG_NUM_BANKS

Any help is appreciated. Thanks.
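For reference, a host-side C++ sketch of how the macro body above groups under standard C/C++ operator precedence: `+` binds tighter than `>>`, and `>>` associates left to right. The function names are hypothetical. 64-bit unsigned values are used here only as an assumption to keep the shift counts well defined; with a 32-bit `int` and n >= 16 the first shift count would reach 32 or more, which is undefined behavior.

```cpp
#include <cassert>
#include <cstdint>

const std::uint64_t NUM_BANKS = 16;
const std::uint64_t LOG_NUM_BANKS = 4;

// The macro body with the implicit grouping written out explicitly:
// (n) >> NUM_BANKS + (n) >> (2 * LOG_NUM_BANKS)
//   ==  ((n) >> (NUM_BANKS + (n))) >> (2 * LOG_NUM_BANKS)
std::uint64_t offset_as_parsed(std::uint64_t n) {
    assert(NUM_BANKS + n < 64);  // keep the shift count valid for 64 bits
    return (n >> (NUM_BANKS + n)) >> (2 * LOG_NUM_BANKS);
}

// The expression the question expects: n / NUM_BANKS for a power-of-two bank count.
std::uint64_t offset_simple(std::uint64_t n) {
    return n >> LOG_NUM_BANKS;
}
```

For n = 40, `offset_as_parsed` yields 0 while `offset_simple` yields 2, so as written the macro does not compute ai/NUM_BANKS.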

Recommended answer

I wrote that code and co-wrote the article, and I request that you use the article only for learning about scan algorithms, and do not use the code in it. It was written when CUDA was new, and I was new to CUDA. If you use a modern implementation of scan in CUDA, you don't need any bank conflict avoidance.

If you want to do scans the easy way, use thrust::inclusive_scan or thrust::exclusive_scan.
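As a rough guide to what those two calls compute, here is a sequential host-side C++ sketch for integer addition. The function names `inclusive_scan_ref` and `exclusive_scan_ref` are hypothetical reference helpers, not part of Thrust. The Thrust versions take iterator ranges, e.g. `thrust::inclusive_scan(in.begin(), in.end(), out.begin())`, and can run on device vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive scan: out[i] is the sum of in[0..i], including in[i].
std::vector<int> inclusive_scan_ref(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        running += in[i];
        out[i] = running;
    }
    return out;
}

// Exclusive scan: out[i] is init plus the sum of in[0..i-1], excluding in[i].
std::vector<int> exclusive_scan_ref(const std::vector<int>& in, int init = 0) {
    std::vector<int> out(in.size());
    int running = init;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;
        running += in[i];
    }
    return out;
}
```

For the input {3, 1, 7, 0, 4}, the inclusive result is {3, 4, 11, 11, 15} and the exclusive result is {0, 3, 4, 11, 11}.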

If you really want to implement a scan, refer to more recent articles such as this one [1]. Or, for a real opus with faster code that will require a bit more study, this one [2]. Or read Sean Baxter's tutorial (though the latter doesn't include citations of the seminal work on the scan algorithm).

[1] Shubhabrata Sengupta, Mark Harris, Michael Garland, and John D. Owens. "Efficient Parallel Scan Algorithms for many-core GPUs". In Jakub Kurzak, David A. Bader, and Jack Dongarra, editors, Scientific Computing with Multicore and Accelerators, Chapman & Hall/CRC Computational Science, chapter 19, pages 413–442. Taylor & Francis, January 2011. http://www.idav.ucdavis.edu/publications/print_pub?pub_id=1041

[2] Merrill, D. and Grimshaw, A. "Parallel Scan for Stream Architectures". Technical Report CS2009-14, Department of Computer Science, University of Virginia. December 2009.
