How does a warp cause another warp to be in the idle state?


Question

As the title says, I want to know how one warp can cause another warp to go into the idle state. I have read many Q&As on SO but cannot find the answer. Can only one warp in a block run at any given time? If so, an idle state for a warp has no meaning. But if multiple warps can run at the same time, each warp can do its work independently of the other warps.

The paper says: irregular work-items lead to whole warps being in the idle state (e.g., warp0 w.r.t. warp1 in the following figure).

Recommended answer

The terms used by the Nsight VSE profiler for a warp's state are defined at http://docs.nvidia.com/gameworks/index.html#developertools/desktop/nsight/analysis/report/cudaexperiments/kernellevel/issueefficiency.htm. These terms are also used in numerous GTC presentations on performance analysis.

The compute work distributor (CWD) launches a thread block on an SM when all resources for the thread block are available. Resources include:

  • a thread block slot
  • warp slots (sufficient for the block)
  • registers for each warp
  • shared memory for the block
  • barriers for the block
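
The resource check above can be sketched as a small model. This is only an illustration: the class names, field names, and the numbers in the usage note are assumptions made for this example, not real hardware limits, and real register allocation has granularity rules that are ignored here.

```python
from dataclasses import dataclass

WARP_SIZE = 32  # threads per warp


@dataclass
class SMFree:
    """Free resources on one SM (hypothetical, illustrative fields)."""
    block_slots: int
    warp_slots: int
    registers: int
    shared_mem_bytes: int
    barriers: int


@dataclass
class BlockRequest:
    """What one thread block needs (hypothetical, illustrative fields)."""
    threads: int
    regs_per_thread: int
    shared_mem_bytes: int
    barriers: int = 1


def warps_needed(threads: int) -> int:
    # A partially filled warp still occupies a whole warp slot.
    return (threads + WARP_SIZE - 1) // WARP_SIZE


def can_launch(sm: SMFree, blk: BlockRequest) -> bool:
    """True only if every resource in the list is available at once."""
    warps = warps_needed(blk.threads)
    regs = warps * WARP_SIZE * blk.regs_per_thread  # simplification: no granularity
    return (sm.block_slots >= 1
            and sm.warp_slots >= warps
            and sm.registers >= regs
            and sm.shared_mem_bytes >= blk.shared_mem_bytes
            and sm.barriers >= blk.barriers)
```

For example, with `SMFree(16, 64, 65536, 49152, 16)`, a 256-thread block using 32 registers per thread fits, while a 1024-thread block using 128 registers per thread does not, because it would need more registers than the model SM has free.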

When an SM has sufficient resources, the thread block is launched on the SM. The thread block is rasterized into warps. Warps are assigned to warp schedulers. Resources are allocated to each warp. At this point a warp is in an active state, meaning that the warp can execute instructions.
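
As a rough sketch of the rasterization step, the snippet below splits a block into warps of 32 threads and hands them to schedulers. The function name `rasterize` and the round-robin assignment are assumptions for illustration; the answer does not say how the hardware actually assigns warps to schedulers.

```python
WARP_SIZE = 32  # threads per warp


def rasterize(block_threads: int, num_schedulers: int = 4) -> list[list[int]]:
    """Return, per scheduler, the list of warp ids assigned to it.

    Round-robin assignment is an assumption made for this sketch.
    """
    num_warps = (block_threads + WARP_SIZE - 1) // WARP_SIZE
    schedulers: list[list[int]] = [[] for _ in range(num_schedulers)]
    for warp_id in range(num_warps):
        schedulers[warp_id % num_schedulers].append(warp_id)
    return schedulers
```

Under this model a 1024-thread block becomes 32 warps, 8 per scheduler, and a 100-thread block becomes 4 warps, the last of which is only partially filled.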

On each cycle, each warp scheduler selects from a list of eligible warps (active, not stalled) and issues 1-2 instructions for the selected warp. A warp can become stalled for numerous reasons. See the documentation above.
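
A toy model of one scheduler cycle may make the states easier to see. Everything here is an assumption for illustration: the function name, the "lowest warp id wins" policy, and the stall reason string in the usage note are not taken from the hardware or the profiler.

```python
from typing import Optional


def schedule_cycle(stalled: dict[int, Optional[str]]):
    """Model one cycle of a single warp scheduler.

    `stalled` maps warp id -> stall reason, or None when the warp is eligible.
    Returns (selected warp id or None, per-warp state report).
    """
    eligible = sorted(w for w, reason in stalled.items() if reason is None)
    # Assumed policy: pick the lowest-numbered eligible warp.
    selected = eligible[0] if eligible else None
    report = {}
    for warp_id, reason in stalled.items():
        if reason is not None:
            report[warp_id] = "stalled: " + reason
        elif warp_id == selected:
            report[warp_id] = "selected"
        else:
            # Eligible, but the scheduler issued for another warp this cycle.
            report[warp_id] = "not selected"
    return selected, report
```

Calling `schedule_cycle({0: None, 1: None, 2: "memory dependency"})` selects warp 0, reports warp 1 as "not selected", and reports warp 2 as stalled. Warp 1 did nothing wrong; it simply lost the issue slot to warp 0 for that cycle.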

Kepler - Volta GPUs (except GP100) have 4 warp schedulers (subpartitions) per streaming multiprocessor (SM). All warps of a thread block must be on the same SM. Therefore, on any given cycle a thread block may issue instructions for up to 4 of its warps (one per subpartition).

Each warp scheduler can pick any of its eligible warps each cycle. The SM is pipelined, so all warps of a maximum-sized thread block (1024 threads == 32 warps) can have instructions in flight every cycle.

The only definitions of idle that I can determine without additional context are:

  • If a warp scheduler has 2 eligible warps and 1 is selected, then the other is stalled in a state called "not selected".
  • If warps in a thread block execute a barrier (__syncthreads), then the warps stall on the barrier (not eligible) until the requirements of the barrier are met.
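
The barrier case can be illustrated with a small calculation. This is a sketch under a strong simplifying assumption: each warp has a fixed number of cycles of work before the barrier, and the barrier releases when the slowest warp arrives. The function name and the cycle counts are hypothetical, and this is only one possible reading of what the paper means by "idle".

```python
def barrier_wait(work_cycles: list[int]) -> list[int]:
    """Cycles each warp spends stalled on the barrier.

    work_cycles[i] is the (assumed) work warp i does before the barrier.
    The barrier releases when the slowest warp reaches it.
    """
    release = max(work_cycles)
    return [release - cycles for cycles in work_cycles]


def stalled_fraction(work_cycles: list[int]) -> float:
    """Share of warp-cycles before the release that are spent waiting."""
    waits = barrier_wait(work_cycles)
    return sum(waits) / (max(work_cycles) * len(work_cycles))
```

With irregular work such as `barrier_wait([10, 100])`, warp 0 waits 90 cycles while warp 1 waits none, which matches the picture of warp0 sitting idle with respect to warp1.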
