Inter-block synchronization in CUDA


Question

I've spent a month searching for a solution to this problem: I cannot synchronize blocks in CUDA.

I've read many posts about atomicAdd, cooperative groups, etc. I decided to use a global array in which each block writes to one element. After writing, one thread of the block waits (i.e. spins in a while loop) until all blocks have written to the global array.

When I use 3 blocks, my synchronization works well (because I have 3 SMs). But using 3 blocks gives me only 12% occupancy, so I need to use more blocks, and then they can't be synchronized. The problem is that a block on an SM waits for the other blocks, so the SM can't accept another block.

What can I do? How can I synchronize blocks when there are more blocks than SMs?

CUDA/GPU specification: CC 6.1, 3 SMs, Windows 10, VS2015, GeForce MX150 graphics card. Please help me with this problem. I have tried a lot of code, but none of it works.

Recommended answer

The CUDA programming model offers two ways to do inter-block synchronization:

  1. (implicit) Use the kernel launch itself. Before the kernel launch, or after it completes, all blocks (in the launched kernel) are synchronized to a known state. This is conceptually true whether the kernel is launched from host code or as part of a CUDA Dynamic Parallelism launch.

  2. (explicit) Use a grid sync in CUDA Cooperative Groups. This has a variety of requirements for support, which you are starting to explore in your other question. The simplest definition of support is whether the appropriate device property (cooperativeLaunch) is set. You can query the property programmatically, using cudaGetDeviceProperties().
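As a rough illustration of the explicit approach, here is a minimal sketch of a grid-wide barrier with Cooperative Groups. The kernel `twoPhase`, the helper `runTwoPhase`, the block size of 128 and the buffer names are hypothetical and chosen only for this example. The sketch assumes that a cooperative launch needs every block to be resident on the device at the same time, so it sizes the grid from `cudaOccupancyMaxActiveBlocksPerMultiprocessor()` multiplied by the SM count instead of picking an arbitrary block count. That lets the grid use more blocks than SMs without the barrier hanging.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

// Hypothetical two-phase kernel: every block publishes one value, the whole
// grid synchronizes, then every block reads the value of the next block.
__global__ void twoPhase(int *blockVals, int *out)
{
    cg::grid_group grid = cg::this_grid();

    if (threadIdx.x == 0)
        blockVals[blockIdx.x] = blockIdx.x + 1;   // phase 1: write

    grid.sync();                                  // barrier across all blocks

    if (threadIdx.x == 0) {
        int next = (blockIdx.x + 1) % gridDim.x;
        out[blockIdx.x] = blockVals[next];        // phase 2: read a neighbor
    }
}

// Hypothetical host helper: returns the number of blocks launched and fills
// hostOut, or returns -1 if cooperative launch is unsupported or a call fails.
int runTwoPhase(std::vector<int> &hostOut)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess || !prop.cooperativeLaunch)
        return -1;

    // Assumed block size; the grid is capped at what can be co-resident.
    const int threads = 128;
    int blocksPerSm = 0;
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSm, twoPhase,
                                                      threads, 0) != cudaSuccess)
        return -1;
    const int blocks = blocksPerSm * prop.multiProcessorCount;
    if (blocks <= 0)
        return -1;

    int *blockVals = nullptr;
    int *out = nullptr;
    if (cudaMalloc(&blockVals, blocks * sizeof(int)) != cudaSuccess)
        return -1;
    if (cudaMalloc(&out, blocks * sizeof(int)) != cudaSuccess) {
        cudaFree(blockVals);
        return -1;
    }

    // grid.sync() requires a cooperative launch rather than <<<...>>>.
    void *args[] = { &blockVals, &out };
    cudaError_t err = cudaLaunchCooperativeKernel((void *)twoPhase, dim3(blocks),
                                                  dim3(threads), args, 0, 0);
    if (err == cudaSuccess) {
        hostOut.resize(blocks);
        // A blocking copy on the default stream also waits for the kernel.
        err = cudaMemcpy(hostOut.data(), out, blocks * sizeof(int),
                         cudaMemcpyDeviceToHost);
    }

    cudaFree(blockVals);
    cudaFree(out);
    return err == cudaSuccess ? blocks : -1;
}
```

Calling `runTwoPhase(result)` from host code should leave `result[b]` equal to the value written by block `(b + 1) % blocks`, which only works if every block finished phase 1 before any block started phase 2. Depending on the toolkit version, the file may need to be compiled with relocatable device code enabled (`-rdc=true`) for `grid.sync()`; treat that as something to check against your own setup.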
