CUDA shared memory and block execution scheduling


Question

I would like to clear up how CUDA schedules blocks for execution based on the amount of shared memory each block uses.

I am targeting a GTX480 NVIDIA card, which has 48KB of shared memory per SM and 15 streaming multiprocessors. So, if I launch a kernel with 15 blocks, each using 48KB of shared memory, and no other limit is reached (registers, maximum threads per block, etc.), every block runs on one SM (of the 15) until it finishes. In this case only warps of the same block need to be scheduled.
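As a quick sanity check, the limits this reasoning relies on can be read from the CUDA runtime. This is a minimal sketch, not from the original question; on a GTX480 it should report 15 SMs and 49152 bytes per block:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Query the limits the question relies on. A block that uses the full
    // 48KB of shared memory occupies an SM by itself.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("SMs: %d\n", prop.multiProcessorCount);
    printf("Shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
    return 0;
}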

So, my misunderstanding scenario is:
I call a kernel with 30 blocks so that 2 blocks reside on each SM. Now the scheduler on each SM has to deal with warps from different blocks. But warps of the second block will only execute on the SM once the first block finishes, because the entire amount of shared memory (48KB per SM) is already in use. If this did not happen, and warps of different blocks were scheduled for execution on the same SM, the result could be wrong, because one block could read values that the other block loaded into shared memory. Am I right?

Answer

You don't need to worry about this. As you have correctly said, if only one block fits per SM because of the amount of shared memory used, only one block will be scheduled at any one time. So there is no chance of memory corruption caused by overcommitting shared memory.
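To illustrate why corruption cannot occur, here is a hypothetical kernel (the names and sizes are my own, not from the answer) in which each block fills a full 48KB shared buffer with its own block index. Because shared memory is private to a block, the final check never fires, no matter how the blocks are scheduled:

#include <cstdio>
#include <cuda_runtime.h>

constexpr int kInts = 48 * 1024 / sizeof(int);  // a full 48KB buffer, as ints

__global__ void fill_and_check(int *corrupt)
{
    __shared__ int buf[kInts];             // 48KB of shared memory per block
    // Tag every element with this block's index.
    for (int i = threadIdx.x; i < kInts; i += blockDim.x)
        buf[i] = blockIdx.x;
    __syncthreads();
    // Shared memory is private to the block, so no element can have been
    // overwritten by another block, regardless of scheduling.
    for (int i = threadIdx.x; i < kInts; i += blockDim.x)
        if (buf[i] != blockIdx.x)
            corrupt[blockIdx.x] = 1;       // never executed
}

int main()
{
    int *corrupt;
    cudaMalloc(&corrupt, 30 * sizeof(int));
    cudaMemset(corrupt, 0, 30 * sizeof(int));
    fill_and_check<<<30, 256>>>(corrupt);  // 30 blocks, as in the question
    cudaDeviceSynchronize();
    cudaFree(corrupt);
    return 0;
}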

BTW, for performance reasons it is usually better to have at least two blocks running per SM (see the occupancy sketch after this list), because:

  • during __syncthreads() the SM may idle unnecessarily, as fewer and fewer warps from the block may still be runnable.
  • warps of the same block tend to run tightly coupled, so there may be times when all warps wait for memory and other times when all warps perform computations. With more blocks this may even out, resulting in better resource utilization overall.
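The runtime's occupancy API can show how the shared-memory footprint translates into resident blocks per SM. Note this API arrived in CUDA 6.5, later than the GTX480 era, and the kernel and sizes here are illustrative assumptions:

#include <cstdio>
#include <cuda_runtime.h>

extern __shared__ char smem[];             // dynamic shared memory

__global__ void worker() { smem[threadIdx.x] = 0; }  // placeholder kernel

int main()
{
    int at24KB = 0, at48KB = 0;
    // How many blocks of `worker` (256 threads each) fit on one SM for a
    // given dynamic shared-memory footprint per block?
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&at24KB, worker, 256, 24 * 1024);
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&at48KB, worker, 256, 48 * 1024);
    printf("24KB per block -> %d resident block(s) per SM\n", at24KB);  // typically 2
    printf("48KB per block -> %d resident block(s) per SM\n", at48KB);  // typically 1
    return 0;
}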

Of course, there may be reasons why giving each block more shared memory yields a larger speedup than running multiple blocks per SM would.

