CUDA: Too many resources requested for launch

Views: 437 | Published: 2020/10/13 0:45:31 | Tags: c, cuda


Problem description

I have some problems running my code on a GTX 480 with Compute Capability 2.0.

I always get the following error if I launch the kernel with 1024 threads per block:

========= CUDA-MEMCHECK
========= Program hit cudaErrorLaunchOutOfResources (error 7) due to "too many resources requested for launch" on CUDA API call to cudaLaunch.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2ef613]
=========     Host Frame:/usr/local/cuda-6.5/lib64/libcudart.so.6.5 (cudaLaunch + 0x17e) [0x3686e]
=========     Host Frame:./bin/myProgram [0x3a50]
=========     Host Frame:./bin/myProgram [0x388a]
=========     Host Frame:./bin/myProgram [0x38e3]
=========     Host Frame:./bin/myProgram [0x2a99]
=========     Host Frame:./bin/myProgram [0x1410]
=========     Host Frame:./bin/myProgram [0x1da0]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xed) [0x2176d]
=========     Host Frame:./bin/myProgram [0x1139]
=========

I ran the program multiple times with different block and thread counts:

5 Blocks, 512 Threads per Block => Works
5 Blocks, 1024 Threads per Block => Error
10 Blocks, 512 Threads per Block => Works
10 Blocks, 1024 Threads per Block => Error
15 Blocks, 512 Threads per Block => Works
15 Blocks, 1024 Threads per Block => Error

I checked the register usage, and it seems to be OK. "function4", with 28 registers, is the kernel that is launched with this many threads. All other kernels use only <<<1, 32>>> per call.

ptxas info    : 0 bytes gmem
ptxas info    : Function properties for _Z7function1Py
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Compiling entry function '_Z13function2PyS_i' for 'sm_20'
ptxas info    : Function properties for _Z13function2PyS_i
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 22 registers, 52 bytes cmem[0]
ptxas info    : Compiling entry function '_Z6function3PyiS_' for 'sm_20'
ptxas info    : Function properties for _Z6function3PyiS_
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 22 registers, 56 bytes cmem[0]
ptxas info    : Compiling entry function '_Z17function4PyiiS_Phji' for 'sm_20'
ptxas info    : Function properties for _Z17function4PyiiS_Phji
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 28 registers, 72 bytes cmem[0]

I also ran this program on my GTX 660 with CC 3.0, and there it works with 1024 threads per block. I have no clue where the problem comes from. Does anyone have an idea?
