Parallel execution of CUDA kernels

Question

Consider the following pseudo-code, where a and b are GPU arrays. Three CUDA kernel calls are made:

square<<<N,M>>>(a, length);
cube<<<N,M>>>(b, length);
add<<<N,M>>>(a, b, length);

  1. Square each element of a
  2. Cube each element of b
  3. Add the corresponding elements of a and b

Is it possible that, before the square and cube kernels have finished, the add kernel gets executed and reads the old values of a and b?
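
For concreteness, here is a minimal sketch of what the three kernels might look like; the float element type, the in-place updates, and storing the sum back into a are assumptions made for illustration, not details given in the question.

__global__ void square(float *a, int length)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < length)
        a[i] = a[i] * a[i];              // square each element of a in place
}

__global__ void cube(float *b, int length)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < length)
        b[i] = b[i] * b[i] * b[i];       // cube each element of b in place
}

__global__ void add(float *a, float *b, int length)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < length)
        a[i] = a[i] + b[i];              // assumed: the sum is stored back into a
}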

Answer

Is it possible that, before the square and cube kernels have finished, the add kernel gets executed and reads the old values of a and b?

Not as you have written it. In CUDA, activity flows in streams. Streams are ordered paths of execution. Basic stream semantics say that for 2 items issued into a stream (i.e. both issued into the same stream), those items will execute in issue order. Item 2, issued after item 1, will not begin execution until item 1 has completed execution. CUDA streams enforce this.

Another characteristic of streams is that even if you don't explicitly identify a stream, you are using the NULL (or default) stream, for all stream-able activity, which includes anything that can take a stream parameter. Your kernel launches can take a stream parameter. Since you have omitted this:

square<<<N,M>>>(a, length);
            ^
            no stream parameter

you are using the NULL stream (for all 3 of your launches) and CUDA stream semantics dictate that those kernels will be serialized.
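
For reference, a stream would go in the fourth launch-configuration argument; the stream name below is made up for illustration, and the third argument (0) is the dynamic shared memory size.

cudaStream_t myStream;
cudaStreamCreate(&myStream);
square<<<N, M, 0, myStream>>>(a, length);   // explicit stream instead of the NULL stream
cudaStreamDestroy(myStream);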

The Asynchronous Concurrent Execution section of the CUDA programming guide is useful reading for understanding concurrency, and you can get an idea of some of the requirements for witnessing kernel concurrency by studying the CUDA concurrentKernels sample code.
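
As a sketch of that idea applied here (the stream and event names are illustrative, and actual overlap still depends on hardware resources): square and cube are independent of each other, so they are candidates to run concurrently in separate streams, while add depends on both and must still be ordered after them, for example with an event.

cudaStream_t s1, s2;
cudaEvent_t cubeDone;
cudaStreamCreate(&s1);
cudaStreamCreate(&s2);
cudaEventCreate(&cubeDone);

// Independent kernels in different streams may overlap, hardware permitting.
square<<<N, M, 0, s1>>>(a, length);
cube<<<N, M, 0, s2>>>(b, length);
cudaEventRecord(cubeDone, s2);

// add needs the results of both kernels: stream order within s1 covers square,
// and the event makes s1 wait for cube before add is allowed to run.
cudaStreamWaitEvent(s1, cubeDone, 0);
add<<<N, M, 0, s1>>>(a, b, length);

cudaStreamSynchronize(s1);
cudaStreamDestroy(s1);
cudaStreamDestroy(s2);
cudaEventDestroy(cubeDone);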
