CUDA global memory load and store
Question
I am trying to hide global memory latency. Take the following code:
for (int i = 0; i < N; i++) {
    x = global_memory[i];
    ... do some computation on x ...
    global_memory[i] = x;
}
I want to know whether loads from and stores to global memory are blocking, i.e., whether the thread does not run the next line until the load or store has finished. For example, take the following code:
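x_next = global_memory[0];
for (int i = 0; i < N; i++) {
    x = x_next;
    // On the last iteration this reads global_memory[N], so the array needs
    // at least N+1 elements or the load needs a bounds check.
    x_next = global_memory[i+1];
    ... do some computation on x ...
    global_memory[i] = x;
}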
In this code, x_next is not used until the next iteration, so does loading x_next overlap with the computation? In other words, which of the following figures will happen?
Recommended answer
"I wanted to know whether load and store from global memory is blocking, i.e., it doesn't run next line until load or store is finished."
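It is not blocking. A load operation does not stall a thread when it is issued. The thread stalls only when a later instruction needs the loaded value and the data has not yet arrived.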
Note that the compiler will often seek to unroll loops (and reorder activity) to enable what you are proposing to do "manually".
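As a rough illustration of the compiler-driven route, the sketch below adds an unroll hint to a grid-stride version of the loop. The names compute, unrolled_kernel and run_unrolled_demo, the separate in/out buffers marked __restrict__, and the launch configuration are all assumptions made for this example and are not part of the question. Whether the compiler actually batches the loads depends on the compiler version and flags; the generated SASS (for instance from cuobjdump -sass) is the place to check.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for "do some computation on x".
__device__ float compute(float x) { return 2.0f * x + 1.0f; }

// Grid-stride loop with an unroll hint. `in` and `out` are separate,
// non-aliasing buffers (an assumption of this sketch), so the compiler is
// allowed to hoist the loads of an unrolled trip ahead of the stores.
__global__ void unrolled_kernel(const float* __restrict__ in,
                                float* __restrict__ out, int n) {
    int stride = gridDim.x * blockDim.x;
    #pragma unroll 4
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        out[i] = compute(in[i]);
    }
}

// Runs the kernel on n elements. Returns the number of wrong results,
// or -1 if a CUDA call failed.
int run_unrolled_demo(int n) {
    assert(n > 0);
    std::vector<float> host(n);
    for (int i = 0; i < n; ++i) host[i] = static_cast<float>(i);

    float* d_in = nullptr;
    float* d_out = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return -1; }
    cudaError_t err = cudaMemcpy(d_in, host.data(), bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        unrolled_kernel<<<4, 64>>>(d_in, d_out, n);
        err = cudaGetLastError();                       // launch errors
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    if (err == cudaSuccess)
        err = cudaMemcpy(host.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (host[i] != 2.0f * i + 1.0f) ++bad;
    return bad;
}

int main() {
    std::printf("mismatches: %d\n", run_unrolled_demo(1 << 12));
    return 0;
}
```

Using separate input and output buffers is a deliberate simplification: with a single in-place pointer the compiler has to reason about whether a store from one iteration could alias a load from the next, which can limit how far it reorders.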
But in any event, your second implementation should allow the load of gm[1] to be issued and to proceed while the computation on gm[0] is in progress.
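A minimal sketch of that manual prefetch as a complete kernel might look like the following. It keeps the in-place update from the question but gives each thread a grid-strided slice and guards the look-ahead load so it never reads past the end. The names compute, prefetch_kernel and run_prefetch_demo, the arithmetic inside compute, and the launch configuration are hypothetical choices for the example.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for "do some computation on x".
__device__ float compute(float x) { return 2.0f * x + 1.0f; }

// Each thread updates elements i, i+stride, i+2*stride, ... in place.
// The load for the next element is issued before the computation on the
// current one, and its result is not consumed until the next trip.
__global__ void prefetch_kernel(float* data, int n) {
    int stride = gridDim.x * blockDim.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float x_next = data[i];                 // first load, before the loop
    for (; i < n; i += stride) {
        float x = x_next;
        int next = i + stride;
        if (next < n) x_next = data[next];  // guarded look-ahead load
        data[i] = compute(x);               // store does not wait for completion
    }
}

// Runs the kernel on n elements. Returns the number of wrong results,
// or -1 if a CUDA call failed.
int run_prefetch_demo(int n) {
    assert(n > 0);
    std::vector<float> host(n);
    for (int i = 0; i < n; ++i) host[i] = static_cast<float>(i);

    float* d_data = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc(&d_data, bytes) != cudaSuccess) return -1;
    cudaError_t err = cudaMemcpy(d_data, host.data(), bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        prefetch_kernel<<<4, 64>>>(d_data, n);
        err = cudaGetLastError();                       // launch errors
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    if (err == cudaSuccess)
        err = cudaMemcpy(host.data(), d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (host[i] != 2.0f * i + 1.0f) ++bad;
    return bad;
}

int main() {
    std::printf("mismatches: %d\n", run_prefetch_demo(1 << 12));
    return 0;
}
```

Because every thread touches a disjoint set of indices, the look-ahead load and the store in the same trip never refer to the same element, so the in-place update stays correct whatever order the memory operations complete in.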
Global memory stores are also "fire and forget", i.e. nonblocking.