Where does CUDA allocate the stack frame for kernels?

Question

My kernel call fails with "out of memory". It makes heavy use of the stack frame, and I was wondering whether this is the reason for the failure.

When nvcc is invoked with --ptxas-options=-v, it prints the following resource usage information:

    150352 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 59 registers, 40 bytes cmem[0]

Hardware: GTX480, compute capability 2.0 (sm_20), 1.5GB device memory, 48KB shared memory per multiprocessor.

My question is where the stack frame is allocated: in shared memory, global memory, constant memory, or somewhere else?

I tried with 1 thread per block as well as with 32 threads per block. Both fail with the same "out of memory" error.

Another issue: the number of threads resident on one multiprocessor can only be increased as long as the total number of registers they use does not exceed the number of registers available on that multiprocessor (32K for my card). Does something similar apply to the stack frame size?

Recommended answer

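The stack is allocated in local memory, which is private to each thread but, despite its name, resides in device (global) memory. The allocation is made per physical thread; on the GTX480 that is 15 SMs * 1536 threads/SM = 23040 threads. You are requesting 150,352 bytes/thread, which comes to ~3.5 GB of stack space, more than twice the 1.5GB of device memory on the card. CUDA may reduce the maximum number of physical threads per launch when the size is that high. The CUDA language is not designed for a large per-thread stack.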
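The arithmetic behind that estimate can be sketched in plain host-side C++. This is only an illustration of the multiplication in the answer, not a model of how the driver actually reserves local memory. The function and constant names are hypothetical, and the code assumes that "1.5GB" means 1.5 GiB and that all of it would be available for stacks, which overstates what a real launch could use.

```cpp
#include <cstdint>

// Total local-memory bytes needed if every physical (resident) thread
// gets its own stack frame. Names are illustrative, not a CUDA API.
constexpr std::uint64_t total_stack_bytes(std::uint64_t sm_count,
                                          std::uint64_t threads_per_sm,
                                          std::uint64_t stack_bytes_per_thread) {
    return sm_count * threads_per_sm * stack_bytes_per_thread;
}

// Upper bound on how many per-thread stacks fit in a memory budget.
constexpr std::uint64_t max_threads_that_fit(std::uint64_t memory_bytes,
                                             std::uint64_t stack_bytes_per_thread) {
    return memory_bytes / stack_bytes_per_thread;
}

// GTX480 figures quoted in the question and the answer.
constexpr std::uint64_t kSmCount      = 15;
constexpr std::uint64_t kThreadsPerSm = 1536;
constexpr std::uint64_t kStackBytes   = 150352;
// Assumption: "1.5GB" is read as 1.5 GiB, all of it free for stacks.
constexpr std::uint64_t kDeviceBytes  = 1536ULL * 1024 * 1024;

// 15 * 1536 = 23040 physical threads.
static_assert(kSmCount * kThreadsPerSm == 23040, "resident thread count");
// 23040 * 150352 = 3,464,110,080 bytes, roughly 3.46 GB.
static_assert(total_stack_bytes(kSmCount, kThreadsPerSm, kStackBytes) == 3464110080ULL,
              "total stack demand");
// The demand is larger than the whole device memory.
static_assert(total_stack_bytes(kSmCount, kThreadsPerSm, kStackBytes) > kDeviceBytes,
              "stack demand exceeds device memory");
// Even with all memory given to stacks, fewer than half the threads fit.
static_assert(max_threads_that_fit(kDeviceBytes, kStackBytes) == 10712,
              "threads that fit");
```

Because the checks are `static_assert`s, compiling the file is enough to confirm the numbers. Under these assumptions at most 10,712 of the 23,040 physical threads could have a stack of this size, which is consistent with the launch failing no matter how few threads per block are requested.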

In terms of registers, the GTX480 is limited to 63 registers per thread and 32K registers per SM.
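As a rough illustration of the register limit, the sketch below divides the 32K registers per SM by the 59 registers per thread that ptxas reported. The helper names are hypothetical. The calculation ignores the hardware's register allocation granularity and every other occupancy limit, so it should be read as an upper bound and not as the occupancy the device would actually achieve.

```cpp
#include <cstdint>

// Upper bound on resident threads per SM imposed by the register file.
// Illustrative helper, not a CUDA API; ignores allocation granularity.
constexpr std::uint32_t threads_by_registers(std::uint32_t regs_per_sm,
                                             std::uint32_t regs_per_thread) {
    return regs_per_sm / regs_per_thread;
}

// Round a thread count down to a whole number of warps.
constexpr std::uint32_t whole_warp_threads(std::uint32_t threads,
                                           std::uint32_t warp_size) {
    return (threads / warp_size) * warp_size;
}

constexpr std::uint32_t kRegsPerSm     = 32 * 1024;  // "32K registers per SM"
constexpr std::uint32_t kRegsPerThread = 59;         // from the ptxas output
constexpr std::uint32_t kWarpSize      = 32;

// 32768 / 59 = 555 threads (59 * 555 = 32745 registers).
static_assert(threads_by_registers(kRegsPerSm, kRegsPerThread) == 555,
              "register-bound thread count");
// Rounded down to whole warps: 17 warps = 544 threads.
static_assert(whole_warp_threads(555, kWarpSize) == 544,
              "register-bound threads in whole warps");
// At the 63-register cap the bound is 520 threads, or 512 in whole warps.
static_assert(whole_warp_threads(threads_by_registers(kRegsPerSm, 63), kWarpSize) == 512,
              "bound at the per-thread register cap");
```

With 59 registers per thread, the register file alone limits an SM to well below its 1536-thread maximum in this estimate.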
