Should CUDA Constant Memory be accessed warp-uniformly?


Question

My CUDA application has constant memory of less than 8KB. Since it will all be cached, do I need to worry about every thread accessing the same address for optimization?

If yes, how do I assure all threads are accessing the same address at the same time?

Answer

Since it will all be cached, do I need to worry about every thread accessing the same address for optimization?

Yes. The constant cache can broadcast only one 32-bit word per cycle to a warp; if the threads in a warp request different constant addresses, those requests are serialized into separate transactions.
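To make the contrast concrete, here is a minimal sketch of the two access patterns (kernel and array names are illustrative, not from the question):

```cuda
#include <cstdio>

__constant__ float coeff[32];

// Broadcast: the loop counter i is warp-uniform, so every thread in the
// warp reads the same coeff[i] and the constant cache serves the whole
// warp in a single transaction per iteration.
__global__ void broadcast_kernel(float *out) {
    float acc = 0.0f;
    for (int i = 0; i < 32; ++i)
        acc += coeff[i];
    out[threadIdx.x] = acc;
}

// Divergent: each thread reads a different constant element, so the
// warp's 32 requests to the constant cache are serialized.
__global__ void divergent_kernel(float *out) {
    out[threadIdx.x] = coeff[threadIdx.x];
}
```

Both kernels are correct; the divergent one simply pays the serialization cost, which is why per-thread data is usually better left in global memory.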

If yes, how do I assure all threads are accessing the same address at the same time?

Ensure that whatever indexing or addressing you use to reference an element in the constant memory area does not depend on any of the built-in thread variables, e.g. threadIdx.x, threadIdx.y, or threadIdx.z. Note that the actual requirement is less stringent than this: you achieve the necessary goal as long as the index evaluates to the same value for every thread in a given warp. Here are a few examples:

__constant__ int data[1024];
...
// assume 1D threadblock
int idx = threadIdx.x;
int bidx = blockIdx.x;
int a = data[idx];      // bad - every thread accesses a different element
int b = data[12];       // ok  - every thread accesses the same element
int c = data[b];        // ok  - b is a constant w.r.t threads
int d = data[b + idx];  // bad - index varies within the warp
int e = data[b + bidx]; // ok  - blockIdx.x is uniform across the block
int f = data[idx/32];   // ok - the same element is being accessed per warp
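A common real-world instance of the "ok" pattern is keeping filter coefficients in constant memory while the per-thread input stays in global memory. The sketch below assumes illustrative names (taps, conv1d) not taken from the question; the key point is that the loop counter k is warp-uniform, so each taps[k] read is a broadcast:

```cuda
__constant__ float taps[9];

// 1D convolution: every thread in a warp reads the same taps[k] in each
// loop iteration (warp-uniform broadcast), while the per-thread reads of
// in[] go through global memory, where per-thread addressing is expected
// and can be coalesced.
__global__ void conv1d(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n - 8) {
        float acc = 0.0f;
        for (int k = 0; k < 9; ++k)
            acc += taps[k] * in[i + k];
        out[i] = acc;
    }
}
```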

