Interpretation of "too many resources for launch"

Problem description

Consider the following Python code:

from numpy import float64
from pycuda import compiler, gpuarray
import pycuda.autoinit

# N > 960 is crucial!
N = 961
code = """
__global__ void kern(double *v)
{
    double a = v[0]*v[2];
    double lmax = fmax(0.0, a), lmin = fmax(0.0, -a);
    double smax = sqrt(lmax),   smin = sqrt(lmin);

    if(smax > 0.2) {
        smax = fmin(smax, 0.2)/smax ;
        smin = (smin > 0.0) ? fmin(smin, 0.2)/smin : 0.0;
        smin = lmin + smin*a;

        v[0] = v[0]*smin + smax*lmax;
        v[2] = v[2]*smin + smax*lmax;
    }
}
"""
kernel_func = compiler.SourceModule(code).get_function("kern")
kernel_func(gpuarray.zeros(3, float64), block=(N,1,1))

Running this produces:

Traceback (most recent call last):
  File "test.py", line 25, in <module>
    kernel_func(gpuarray.zeros(3, float64), block=(N,1,1))
  File "/usr/lib/python3.5/site-packages/pycuda/driver.py", line 402, in function_call
    func._launch_kernel(grid, block, arg_buf, shared, None)
pycuda._driver.LaunchError: cuLaunchKernel failed: too many resources requested for launch

My setup: Python v3.5.2 with pycuda==2016.1.2 and numpy==1.11.1 on Ubuntu 16.04.1 (64-bit), kernel 4.4.0, nvcc V7.5.17. The graphics card is an Nvidia GeForce GTX 480.

Can you reproduce this on your machine? Do you have any idea what causes this error message?

Remark: I know that, in principle, there is a race condition because all threads try to change v[0] and v[2]. But the threads should never reach the inside of the if block anyway, since the input array is all zeros and smax is therefore 0. Moreover, I am able to reproduce the error without the race condition, but that example is much more complicated.

Recommended answer

It is almost certain that you are hitting a registers-per-block limit.

According to the relevant documentation, your device has a limit of 32K 32-bit registers per block. Threads are scheduled in whole warps of 32, so a block of 961 threads occupies 31 warps (992 thread slots) rather than 30. Once the block size exceeds 960 threads (30 warps), the kernel launch requires more registers than are available, and the launch fails. NVIDIA supplies an Excel spreadsheet and advice on how to determine the per-thread register requirement of your kernel and the largest block size you can use to launch it on your device.
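The arithmetic behind that limit can be sketched in plain Python. This is a simplified estimate, not the exact allocation rule of the hardware: it ignores register allocation granularity, and the value of 34 registers per thread is an assumption chosen for illustration, not a measurement of this kernel. The real per-thread count can be read from the ptxas verbose output (nvcc --ptxas-options=-v) or, in PyCUDA, from the num_regs attribute of the kernel function. The helper names below are hypothetical.

```python
WARP_SIZE = 32                 # threads per warp on NVIDIA GPUs
REGS_PER_BLOCK_LIMIT = 32 * 1024  # 32K 32-bit registers per block, as stated in the answer


def warps_per_block(threads_per_block, warp_size=WARP_SIZE):
    """Round a block size up to whole warps (ceiling division)."""
    return -(-threads_per_block // warp_size)


def registers_needed(threads_per_block, regs_per_thread, warp_size=WARP_SIZE):
    """Simplified estimate: every thread slot of every warp gets its registers."""
    return warps_per_block(threads_per_block, warp_size) * warp_size * regs_per_thread


def fits(threads_per_block, regs_per_thread, limit=REGS_PER_BLOCK_LIMIT):
    """True if the estimated register demand stays within the per-block limit."""
    return registers_needed(threads_per_block, regs_per_thread) <= limit


# Assumed for illustration only; query the real value for your kernel.
REGS_PER_THREAD = 34

for n in (960, 961):
    print(n, warps_per_block(n), registers_needed(n, REGS_PER_THREAD),
          fits(n, REGS_PER_THREAD))
```

Under these assumptions, 960 threads need 30 * 32 * 34 = 32640 registers, which fits within 32768, while 961 threads round up to 31 warps and need 33728, which does not. That matches the observed threshold at N = 960.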
