Shared memory optimization confusion

Views: 165 Posted: 2017/3/4 15:14:38 cuda memory-optimization

Problem description

I have written a CUDA application that uses 1 KB of shared memory per block. Since each SM has only 16 KB of shared memory, only 16 blocks can be accommodated overall (am I understanding this correctly?), although only 8 can be scheduled at a time. Now suppose some block is busy with a memory operation, so another block should be scheduled on the GPU, but all the shared memory is already in use by the 16 blocks that were scheduled there. Will CUDA then refuse to schedule more blocks on the same SM until the previously allocated blocks have completely finished? Or will it move some block's shared memory to global memory and allocate another block in its place (in which case, should we worry about global memory access latency)?
