Why is only one of the warps executed by an SM in CUDA?


Problem description

I frequently come across the following statement in CUDA materials:

"At any time, only one of the warps is executed by an SM."

I don't quite understand this. Each SM can run hundreds to thousands of threads simultaneously, so why can only a single warp, i.e. 32 threads, be executed at any given point in time?

Thanks!

Solution

Details vary between generations of CUDA hardware, but in earlier generations, for example, each SM has 8 execution units, each of which executes 4 threads (one instruction from each thread every 4 cycles). Hence you get 4-way SMT, which gives 32 concurrent threads per SM. Those 32 threads are the single warp the quoted statement refers to.
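To make the "8 units, 4 threads each" arithmetic concrete, here is a rough Python sketch of how one 32-thread warp could be spread over 4 cycles on 8 execution units. The function name `issue_schedule` and the lane-to-cycle mapping (consecutive groups of 8 lanes per cycle) are assumptions made purely for illustration; the answer does not say how lanes are actually assigned to units.

```python
# Illustrative model only. The mapping of thread lanes to cycles below is an
# assumption for the sake of the example, not a documented hardware detail.
WARP_SIZE = 32        # threads per warp
EXECUTION_UNITS = 8   # execution units per SM, per the answer (early hardware)


def issue_schedule(warp_size=WARP_SIZE, units=EXECUTION_UNITS):
    """Return one list of thread lanes per cycle for a single warp instruction."""
    cycles = warp_size // units  # 32 / 8 = 4 cycles per warp instruction
    return [list(range(c * units, (c + 1) * units)) for c in range(cycles)]


schedule = issue_schedule()
for cycle, lanes in enumerate(schedule):
    print(f"cycle {cycle}: lanes {lanes[0]}-{lanes[-1]}")
```

Under this simplified model, every unit handles 4 different threads over the 4 cycles, which is where the 8 x 4 = 32 figure comes from.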

Of course there are multiple SMs per GPU, e.g. 30, which would mean 30 warps x 32 threads = 960 threads executing at any given instant. On top of this, warps can be switched in and out, so you can have many more than 960 "live" threads, even though only 960 of them are actually executing at any given time.
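The difference between executing and "live" threads can be worked through with a few lines of Python. This is only a back-of-the-envelope sketch: the 30 SMs and the warp size of 32 come from the answer, while `RESIDENT_WARPS_PER_SM = 32` is an assumed example value (the real limit depends on the hardware generation and on the kernel's resource usage), and both function names are made up for this example.

```python
def executing_threads(num_sms, warp_size=32):
    """Threads executing at one instant: one warp per SM."""
    return num_sms * warp_size


def resident_threads(num_sms, resident_warps_per_sm, warp_size=32):
    """Threads that are "live" on the GPU, i.e. resident and ready to be switched in."""
    return num_sms * resident_warps_per_sm * warp_size


NUM_SMS = 30                 # example value from the answer
RESIDENT_WARPS_PER_SM = 32   # assumption: illustrative limit, varies by hardware

executing = executing_threads(NUM_SMS)
live = resident_threads(NUM_SMS, RESIDENT_WARPS_PER_SM)

print(executing)  # 960
print(live)       # 30720
```

With these assumed numbers, only 960 of 30720 resident threads are executing at any instant; the rest are waiting in warps that the SM can switch to.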
