A question about the details about the distribution from blocks to SMs in CUDA


Problem Description

Let me take hardware with compute capability 1.3 as an example.

30 SMs are available, so at most 240 blocks (8 per SM on compute capability 1.3) can be resident at the same time. Given the limits on registers and shared memory, the actual number of resident blocks may be much lower. Blocks beyond the first 240 have to wait for hardware resources to become available.

My question is when the blocks beyond the first 240 are assigned to SMs: as soon as some of the first 240 blocks are completed, or only once all of the first 240 blocks have finished?

I wrote the following piece of code.

#include <stdio.h>
#include <cuda_runtime.h>

const int BLOCKNUM = 1024;
const int N = 240;

// Block 0 spins until block N has raised its flag; every block
// (via thread 0) raises its own flag when it finishes.
__global__ void kernel ( volatile int* mark ) {
    if ( blockIdx.x == 0 ) while ( mark[N] == 0 );
    if ( threadIdx.x == 0 ) mark[blockIdx.x] = 1;
}

int main() {
    int* mark;
    cudaMalloc ( ( void** ) &mark, sizeof ( int ) * BLOCKNUM );
    cudaMemset ( mark, 0, sizeof ( int ) * BLOCKNUM );
    kernel <<< BLOCKNUM, 1 >>> ( mark );
    cudaDeviceSynchronize(); // wait for the kernel; hangs here if it deadlocks
    cudaFree ( mark );
    return 0;
}

This code causes a deadlock and fails to terminate. But if I change N from 240 to 239, the code is able to terminate. (Block 239 belongs to the first wave of 240 resident blocks, so its flag gets set without any new block having to be scheduled; block 240 can only run once resources free up.) So I want to know some details about the scheduling of blocks.

Recommended Answer

On the GT200, it has been demonstrated through micro-benchmarking that new blocks are scheduled whenever an SM has retired all of the blocks it was currently running. So the answer is: when some blocks are finished, and the scheduling granularity is per-SM. There seems to be a consensus that Fermi GPUs have a finer scheduling granularity than previous generations of hardware.
