Is CUDA warp scheduling deterministic?

Question

I am wondering whether the warp scheduling order of a CUDA application is deterministic.

Specifically, I am wondering whether the order of warp execution stays the same across multiple runs of the same kernel with the same input data on the same device. If not, is there anything that could force an ordering of warp execution (say, when debugging an order-dependent algorithm)?

Answer

The precise behavior of CUDA warp scheduling is not defined. Therefore you cannot depend on it being deterministic. In particular, if multiple warps are ready to be executed in a given issue slot, there is no description of which warp will be selected by the warp scheduler(s).

There is no external method to precisely control the order of warp execution.
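If you just want to see what order you get on your own device, a minimal sketch along these lines may help. Each warp's lane 0 takes a ticket from a global counter with atomicAdd, and the host prints the resulting order for a few launches. The names record_arrival, arrival_counter, arrival_order and run_once are made up for this illustration. Keep in mind that seeing the same order on several runs does not mean the order is guaranteed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N_WARPS 16
#define N_THREADS (32 * N_WARPS)

__device__ int arrival_counter;         // next free slot in arrival_order
__device__ int arrival_order[N_WARPS];  // warp IDs in the order they checked in

__global__ void record_arrival() {
  unsigned warp_id = threadIdx.x / 32;
  if (threadIdx.x % 32 == 0) {          // lane 0 checks in on behalf of its warp
    int slot = atomicAdd(&arrival_counter, 1);
    arrival_order[slot] = warp_id;
  }
}

// Hypothetical helper: reset the counter, launch once, copy the order to the host.
cudaError_t run_once(int *h_order) {
  int zero = 0;
  cudaError_t err = cudaMemcpyToSymbol(arrival_counter, &zero, sizeof(int));
  if (err != cudaSuccess) return err;
  record_arrival<<<1, N_THREADS>>>();
  err = cudaGetLastError();             // launch errors
  if (err != cudaSuccess) return err;
  err = cudaDeviceSynchronize();        // errors raised while the kernel ran
  if (err != cudaSuccess) return err;
  return cudaMemcpyFromSymbol(h_order, arrival_order, N_WARPS * sizeof(int));
}

int main() {
  int h_order[N_WARPS];
  for (int run = 0; run < 3; run++) {
    cudaError_t err = run_once(h_order);
    if (err != cudaSuccess) {
      printf("CUDA error: %s\n", cudaGetErrorString(err));
      return 1;
    }
    printf("run %d:", run);
    for (int i = 0; i < N_WARPS; i++) printf(" %d", h_order[i]);
    printf("\n");
  }
  return 0;
}
```

The only thing this guarantees is that every warp ID shows up exactly once per run; which slot each warp lands in is up to the scheduler.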

It's certainly possible to build code that determines the warp ID and forces warps to execute in a particular order. Something like this:

#include <stdio.h>

#define N_WARPS 16
#define nTPB (32*N_WARPS)

__device__ volatile int my_next = 0;
__device__ int warp_order[N_WARPS];

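__global__ void my_kernel(){

  __shared__ volatile int warp_num;  // ID of the warp currently allowed to run
  unsigned my_warpid = (threadIdx.x & 0x0FE0U)>>5;  // same as threadIdx.x / 32
  if (!threadIdx.x) warp_num = 0;
  __syncthreads();  // don't use syncthreads() after this point
  while (warp_num != my_warpid);  // spin until it is this warp's turn
  // warp specific code here
  __syncwarp();  // CUDA 9+: on Volta and later, wait until every lane has finished the warp specific code
  if ((threadIdx.x & 0x01F) == 0){  // lane 0 only
    warp_order[my_next++] = my_warpid;  // record the order of execution
    __threadfence();
    warp_num++; // release next warp
  } // could use syncthreads() after this point, if more code follows
}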


int main(){

  int h_warp_order[N_WARPS];
  for (int i = 0; i < N_WARPS; i++) h_warp_order[i] = -1;
  cudaMemcpyToSymbol(warp_order, h_warp_order, N_WARPS*sizeof(int));
  my_kernel<<<1,nTPB>>>();
  cudaDeviceSynchronize();
  cudaMemcpyFromSymbol(h_warp_order, warp_order, N_WARPS*sizeof(int));
  for (int i = 0; i < N_WARPS; i++) printf("index: %d, warp_id: %d\n", i, h_warp_order[i]);
  return 0;
}

Allowing only one warp to execute at a time will be very inefficient, of course.

In general, the best parallelizable algorithms have little or no order dependence.
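As a rough illustration of what "no order dependence" can look like, here is a small sketch (sum_thread_ids, total and run_sum are hypothetical names) in which every thread adds its index to a single integer with atomicAdd. Integer addition is commutative and associative, so the final value is the same whichever warp gets there first. The same trick with floating-point values would not be bitwise reproducible, because floating-point addition is not associative.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N_WARPS 16
#define N_THREADS (32 * N_WARPS)

__device__ int total;  // accumulated by all threads, in any order

__global__ void sum_thread_ids() {
  // The order in which warps reach this line does not affect the final sum.
  atomicAdd(&total, (int)threadIdx.x);
}

// Hypothetical helper: reset the total, launch once, copy the result to the host.
cudaError_t run_sum(int *h_total) {
  int zero = 0;
  cudaError_t err = cudaMemcpyToSymbol(total, &zero, sizeof(int));
  if (err != cudaSuccess) return err;
  sum_thread_ids<<<1, N_THREADS>>>();
  err = cudaGetLastError();
  if (err != cudaSuccess) return err;
  err = cudaDeviceSynchronize();
  if (err != cudaSuccess) return err;
  return cudaMemcpyFromSymbol(h_total, total, sizeof(int));
}

int main() {
  int h_total = -1;
  cudaError_t err = run_sum(&h_total);
  if (err != cudaSuccess) {
    printf("CUDA error: %s\n", cudaGetErrorString(err));
    return 1;
  }
  // Sum of 0 .. N_THREADS-1, computed on the host for comparison.
  int expected = N_THREADS * (N_THREADS - 1) / 2;
  printf("device sum = %d, host sum = %d\n", h_total, expected);
  return 0;
}
```

Nothing here needs the spin-wait from the ordered kernel, so all warps are free to run concurrently.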
