Where does CUDA allocate the stack frame for kernels?

Views: 202 | Published: 2017/3/4 12:22:07 | Tags: cuda, stack


Question

My kernel call fails with "out of memory". The kernel makes heavy use of the stack frame, and I was wondering whether this is the reason for the failure.

When I invoke nvcc with --ptxas-options=-v, it prints the following profile information:

    150352 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 59 registers, 40 bytes cmem[0]

Hardware: GTX480, sm_20, 1.5 GB device memory, 48 KB shared memory per multiprocessor.

My question is: where is the stack frame allocated? In shared memory, global memory, constant memory, ..?

I tried with 1 thread per block as well as with 32 threads per block. Both give the same "out of memory" error.

Another issue: the number of threads resident on one multiprocessor can only be increased as long as the total number of registers they use does not exceed the number of registers available on that multiprocessor (32k for my card). Does something similar apply to the stack frame size?
