Questions about resident warps in CUDA

Question

I have been using CUDA for a month, and I am now trying to work out how many warps/blocks are needed to hide the latency of memory accesses. I think this is related to the maximum number of resident warps on a multiprocessor.

According to Table 13 in the CUDA C Programming Guide (v7.5), the maximum number of resident warps per multiprocessor is 64. My question is: what is a resident warp? Does it refer to warps whose data has already been read from GPU memory and that are ready to be processed by the SPs? Or does it refer to warps that can either read memory for data or are ready to be processed by the SPs, meaning that all warps beyond those 64 can neither read memory nor be processed by the SPs until some of the 64 resident warps are done?

Recommended answer

The maximum number of resident warps is the maximum number of warps that can be processed in parallel on the multiprocessor. A warp is active once it is scheduled by the warp scheduler and its registers have been allocated.

If you manage to have this many warps running in parallel, you have reached the theoretical maximum occupancy (100%, or 1:1). If not, the occupancy ratio is lower.

Other warps will have to wait.
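
To make the occupancy ratio concrete, here is a minimal sketch in plain host-side C++ (no GPU required). The 64-warp limit is the figure from the question; the block and register limits, the `SmLimits` struct, and the function names are assumptions made for illustration, and the model ignores shared memory and register allocation granularity.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-SM limits. Only max_warps comes from the question;
// the other values are assumptions for this sketch.
struct SmLimits {
    int max_warps = 64;     // max resident warps per SM (from the question)
    int max_blocks = 32;    // assumed max resident blocks per SM
    int registers = 65536;  // assumed number of 32-bit registers per SM
    int warp_size = 32;     // threads per warp
};

// Number of warps one block occupies, rounding up partial warps.
int warps_per_block(int threads_per_block, int warp_size) {
    assert(threads_per_block > 0 && warp_size > 0);
    return (threads_per_block + warp_size - 1) / warp_size;
}

// Warps that can be resident on one SM at once, limited by the warp-slot,
// block-slot, and register budgets (simplified model).
int resident_warps(const SmLimits& sm, int threads_per_block, int regs_per_thread) {
    assert(regs_per_thread > 0);
    const int wpb = warps_per_block(threads_per_block, sm.warp_size);
    const int by_warps = sm.max_warps / wpb;
    const int by_regs = sm.registers / (regs_per_thread * wpb * sm.warp_size);
    const int blocks = std::min({by_warps, by_regs, sm.max_blocks});
    return blocks * wpb;
}

// Theoretical occupancy: resident warps divided by the per-SM maximum.
double occupancy(const SmLimits& sm, int threads_per_block, int regs_per_thread) {
    return static_cast<double>(resident_warps(sm, threads_per_block, regs_per_thread)) /
           sm.max_warps;
}
```

Under these assumed limits, 256-thread blocks using 32 registers per thread fill all 64 warp slots (occupancy 1.0), while 64 registers per thread leave room for only 32 resident warps (occupancy 0.5). Any further blocks have to wait until a resident block finishes.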

Possibly related to this question on SO.

Edited answer for the additional questions:

  1. Warps

About the maximum number of warps that can be processed: each SM (streaming multiprocessor) has a limited number of processing cores, and the GPU has a limited number of SMs. Even though this webinar is not up to date with newer architectures, it gives some good examples:

SM: streaming multiprocessor with multiple processing cores

Each SM contains 32 processing cores

Execute in a Single Instruction Multiple Thread (SIMT) fashion

Up to 16 SMs on a card for a maximum of 512 compute cores

And then:

Fermi can have up to 48 active warps per SM (1536 threads)
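
The figures quoted above are tied together by simple arithmetic, sketched here in host-side C++. The function names are hypothetical; on a real device the limits could instead be read from the `maxThreadsPerMultiProcessor` and `warpSize` fields of `cudaDeviceProp`.

```cpp
#include <cassert>

// Threads per warp on the architectures discussed here.
const int kWarpSize = 32;

// Max resident warps per SM, derived from the SM's resident-thread limit.
int max_resident_warps(int max_threads_per_sm) {
    assert(max_threads_per_sm >= 0);
    return max_threads_per_sm / kWarpSize;
}

// Total compute cores on a card, given the SM count and cores per SM.
int total_cores(int sm_count, int cores_per_sm) {
    assert(sm_count >= 0 && cores_per_sm >= 0);
    return sm_count * cores_per_sm;
}
```

For example, `max_resident_warps(1536)` gives the 48 warps quoted for Fermi, `max_resident_warps(2048)` gives the 64 warps from the question, and `total_cores(16, 32)` gives the 512 cores from the webinar.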

  2. Processing warps

First, some of these terms are not always clearly official; see for example this topic from Nvidia DevTalk.

As explained in that topic, a given warp is active once it has been allocated on the SM along with its resources. It can then be:

  • eligible: it can issue an instruction
  • stalled: it cannot, because of a resource or data dependency

This is possible because we have a SIMT (Single Instruction, Multiple Threads) architecture. You will find plenty of reading on this topic, which can be very useful if you plan on tweaking occupancy.
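
The eligible/stalled distinction can be pictured with a toy model in host-side C++. This is only an illustrative sketch: the `Warp` struct, the state names, and the selection rule are assumptions, and real warp schedulers are considerably more involved.

```cpp
#include <cassert>
#include <vector>

// Toy states for a warp that is already active (resident) on an SM.
enum class WarpState { Eligible, Stalled };

struct Warp {
    int id;
    WarpState state;
};

// Return the id of the first eligible warp, or -1 if every resident warp
// is stalled, in which case the scheduler has nothing to issue.
int pick_eligible(const std::vector<Warp>& resident) {
    for (const Warp& w : resident) {
        if (w.state == WarpState::Eligible) return w.id;
    }
    return -1;
}

// Count how many resident warps could issue an instruction right now.
int count_eligible(const std::vector<Warp>& resident) {
    int n = 0;
    for (const Warp& w : resident) {
        if (w.state == WarpState::Eligible) ++n;
    }
    return n;
}

// Mark a stalled warp as eligible again, e.g. once its memory load returns.
void resolve_stall(Warp& w) {
    assert(w.state == WarpState::Stalled);
    w.state = WarpState::Eligible;
}
```

The more resident warps there are, the more likely it is that at least one is eligible while the others wait on memory, which is how a high occupancy helps hide memory latency.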
