How to properly apply thread synchronization in CUDA app?


Problem Description

Generally I have used thread synchronization only occasionally in my applications, because I rarely needed the functionality. I'm not a really advanced C/C++ programmer, but I'm not a beginner either. I started to learn CUDA C, excited by the power of today's GPUs compared to CPUs, and I realized that CUDA programming is mostly about parallel thread execution and that proper thread synchronization is sometimes necessary. In fact, I don't even know how to apply thread synchronization in C or C++ yet. The last time I used synchronization was about 2 years ago, when I was writing simple Java apps like this:

synchronized returnType functionName(parameters)
{
    ...
}

which allows 'functionName' to be executed by only one thread at a time, i.e. the function is executed alternately by different threads. Now, coming back to CUDA C: if I have, for example, 200 threads in a block which run the code inside a while loop:

while(some_condition)
{
    ...
}

How can I make threads <0 - 99> synchronized with each other, and threads <100 - 199> synchronized with each other as well, but apply the synchronization so that threads <0 - 99> and <100 - 199> execute alternately (that is, the first 100 threads run the contents of the 'while' loop, then the next 100 threads run it, and so on)?

Recommended Answer

I think you may simply need to learn more about CUDA. You may be falling into the trap of thinking that a programming paradigm you learned previously should be applied here; I'm not sure that's the case.

But to answer your question, first let me point out that thread synchronization in CUDA is only possible within a threadblock. So my comments only apply there.

The principal synchronization mechanism in device code is __syncthreads(). To use it roughly along the lines you describe, I could write something like this:

__syncthreads();
if (threadIdx.x < 100) {
    // code in this block will only be executed by threads 0-99, all others do nothing
}
__syncthreads();
if ((threadIdx.x > 99) && (threadIdx.x < 200)) {
    // code in this block will only be executed by threads 100-199, all others do nothing
}
// all threads can begin executing at this point

Note that even the threads within a threadblock do not all execute in lockstep. The SM (the threadblock processing unit in a CUDA GPU) generally breaks threadblocks into groups of 32 threads called warps, and it is these warps that actually execute (more or less) in lockstep. However, the code listed above still has the effect I describe in terms of sequencing execution amongst groups of threads, if you want to do that for some reason.
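
For concreteness, here is a minimal, self-contained sketch that expands the skeleton above into a complete program (the kernel name alternating_groups, the order array, and the shared counter are my own illustrative choices, not part of the original answer). It launches one block of 200 threads; threads 0-99 claim output slots first, and only after the intervening __syncthreads() do threads 100-199 claim theirs:

#include <cstdio>

// Hypothetical kernel: threads 0-99 write first, then threads 100-199,
// with __syncthreads() enforcing the ordering between the two groups.
__global__ void alternating_groups(int *order)
{
    __shared__ int counter;                 // shared write position within the block
    if (threadIdx.x == 0) counter = 0;      // one thread initializes the counter
    __syncthreads();                        // make the initialized counter visible to all threads

    if (threadIdx.x < 100) {
        int slot = atomicAdd(&counter, 1);  // threads 0-99 claim slots 0-99
        order[slot] = threadIdx.x;
    }
    __syncthreads();                        // threads 100-199 wait here until threads 0-99 are done

    if ((threadIdx.x > 99) && (threadIdx.x < 200)) {
        int slot = atomicAdd(&counter, 1);  // threads 100-199 claim slots 100-199
        order[slot] = threadIdx.x;
    }
    __syncthreads();                        // all 200 threads are synchronized again here
}

int main()
{
    int h_order[200];
    int *d_order;
    cudaMalloc((void**)&d_order, 200 * sizeof(int));
    alternating_groups<<<1, 200>>>(d_order);
    cudaMemcpy(h_order, d_order, 200 * sizeof(int), cudaMemcpyDeviceToHost);

    // Slots 0-99 hold only thread IDs 0-99, slots 100-199 only thread IDs 100-199.
    printf("slot 0 -> thread %d, slot 199 -> thread %d\n", h_order[0], h_order[199]);
    cudaFree(d_order);
    return 0;
}

Compiled with nvcc, the first 100 slots can only contain IDs from the first group of threads and the last 100 slots only IDs from the second group, because the barrier between the two if blocks forces the second group to wait until the first has finished.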
