Cuda global memory load and store


Problem description

So I am trying to hide global memory latency. Take the following code:

for(int i = 0; i < N; i++){
     x = global_memory[i];

     ... do some computation on x ...

     global_memory[i] = x;
}

I wanted to know whether loads and stores from global memory are blocking, i.e., the next line doesn't run until the load or store has finished. For example, take the following code:

x_next = global_memory[0];
for(int i = 0; i < N; i++){
     x = x_next;
     // guard the final prefetch so the last iteration does not read past the end
     if(i + 1 < N)
          x_next = global_memory[i+1];

     ... do some computation on x ...

     global_memory[i] = x;
}

In this code, x_next is not used until the next iteration, so does loading x_next overlap with the computation? In other words, which of the following figures will happen?

Recommended answer

I wanted to know whether loads and stores from global memory are blocking, i.e., the next line doesn't run until the load or store has finished.

It is not blocking. A load operation does not, by itself, stall the thread; the thread only stalls when it reaches an instruction that actually depends on the loaded value.
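For example, here is a minimal sketch of that behavior (the kernel and variable names are illustrative, not from the original question): the load is issued, independent arithmetic keeps executing while the data is in flight, and the stall, if any, happens at the first use of the loaded value.

__global__ void stall_at_use(const float* __restrict__ in, float* __restrict__ out,
                             float bias, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;

    // The load is issued here; the thread does not wait for the data to arrive.
    float x = in[idx];

    // Independent arithmetic can execute while the load is still in flight.
    float y = bias * bias + 1.0f;

    // The thread stalls here (if at all), at the first instruction that uses x.
    out[idx] = x + y;
}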

Note that the compiler will often seek to unroll loops (and reorder activity) to enable what you are proposing to do "manually".
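As a rough illustration of leaving that to the compiler (an assumption about how one might hint it, not code from the answer), an unroll pragma on the original loop gives the compiler room to issue several iterations' loads before the dependent computation:

__global__ void unrolled_loop(float* global_memory, int N)
{
    // Hypothetical hint: let the compiler unroll by 4, so it can batch
    // several iterations' loads ahead of the computation that uses them.
    #pragma unroll 4
    for (int i = 0; i < N; i++) {
        float x = global_memory[i];
        x = x * 2.0f + 1.0f;   // placeholder for "some computation on x"
        global_memory[i] = x;
    }
}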

But in any event, your 2nd realization should allow the load of gm[1] to be issued and to proceed while the computation on gm[0] is being done.
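Put together, a minimal kernel sketch of that second realization might look like the following (the kernel name and the placeholder computation are illustrative assumptions; the guard on the final prefetch avoids reading past the end of the array):

__global__ void prefetch_loop(float* global_memory, int N)
{
    if (N <= 0) return;

    // As in the question, a single sequential walk over the array;
    // a real kernel would normally spread the iterations across many threads.
    float x_next = global_memory[0];

    for (int i = 0; i < N; i++) {
        float x = x_next;

        // Issue the next load early; it overlaps with the computation below,
        // since the thread only stalls when x_next is read on the next iteration.
        if (i + 1 < N)
            x_next = global_memory[i + 1];

        x = x * 2.0f + 1.0f;   // placeholder for "some computation on x"

        // Stores are "fire and forget": the thread does not wait for them.
        global_memory[i] = x;
    }
}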


Global memory stores are also "fire and forget" -- nonblocking.
