PyOpenCL, OpenCL: Can't build program on GPU

Problem description

I have a piece of kernel source that runs on the G970 in my PC but won't compile on my Early 2015 MacBook Pro with Iris 6100 1536MB graphics.

import pyopencl as cl

platform = cl.get_platforms()[0]
device   = platform.get_devices()[1] # Index 1 is the Iris GPU on this machine
ctx      = cl.Context([device])      # Tell CL to use GPU
queue    = cl.CommandQueue(ctx)      # Create a command queue for the target device.
# program  = cl.Program(ctx, kernelsource).build()
print(platform.get_devices())

This get_devices() shows that I have 'Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz' on 'Apple' at 0xffffffff and 'Intel(R) Iris(TM) Graphics 6100' on 'Apple' at 0x1024500.

The kernel runs correctly on the CPU, but when I build the program on the GPU, it returns:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-44-e2b6e1b931de> in <module>()
      3 ctx      = cl.Context([device])      # Tell CL to use GPU
      4 queue    = cl.CommandQueue(ctx)      # Create a command queue for the target device.
----> 5 program  = cl.Program(ctx, kernelsource).build()
      6 
      7 

/usr/local/lib/python2.7/site-packages/pyopencl-2015.2.4-py2.7-macosx-10.11-x86_64.egg/pyopencl/__init__.pyc in build(self, options, devices, cache_dir)
    393                         self._context, self._source, options, devices,
    394                         cache_dir=cache_dir),
--> 395                     options=options, source=self._source)
    396 
    397             del self._context

/usr/local/lib/python2.7/site-packages/pyopencl-2015.2.4-py2.7-macosx-10.11-x86_64.egg/pyopencl/__init__.pyc in _build_and_catch_errors(self, build_func, options, source)
    428         # Python 3.2 outputs the whole list of currently active exceptions
    429         # This serves to remove one (redundant) level from that nesting.
--> 430         raise err
    431 
    432     # }}}

RuntimeError: clbuildprogram failed: BUILD_PROGRAM_FAILURE - 

Build on <pyopencl.Device 'Intel(R) Iris(TM) Graphics 6100' on 'Apple' at 0x1024500>:

Cannot select: 0x7f94b30a5110: i64,ch = dynamic_stackalloc 0x7f94b152a290, 0x7f94b30a4f10, 0x7f94b3092c10 [ORD=7] [ID=54]
  0x7f94b30a4f10: i64 = and 0x7f94b30a4c10, 0x7f94b3092b10 [ORD=7] [ID=52]
    0x7f94b30a4c10: i64 = add 0x7f94b30a6610, 0x7f94b3092a10 [ORD=7] [ID=49]
      0x7f94b30a6610: i64 = shl 0x7f94b3092d10, 0x7f94b3092e10 [ID=46]
        0x7f94b3092d10: i64 = bitcast 0x7f94b30a4810 [ID=41]
          0x7f94b30a4810: v2i32 = IGILISD::MOVSWZ 0x7f94b3092710, 0x7f94b30a2810, 0x7f94b30a2810, 0x7f94b30a2810 [ID=32]
            0x7f94b3092710: i32,ch = CopyFromReg 0x7f94b152a290, 0x7f94b3092610 [ORD=5] [ID=22]
              0x7f94b3092610: i32 = Register %vreg60 [ORD=5] [ID=1]
            0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
            0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
            0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
        0x7f94b3092e10: i64 = bitcast 0x7f94b30a3f10 [ID=38]
          0x7f94b30a3f10: v2i32 = IGILISD::MOVSWZ 0x7f94b30a4510, 0x7f94b30a2810, 0x7f94b30a2810, 0x7f94b30a2810 [ID=29]
            0x7f94b30a4510: i32 = Constant<2> [ID=19]
            0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
            0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
            0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
      0x7f94b3092a10: i64 = bitcast 0x7f94b30a4b10 [ID=40]
        0x7f94b30a4b10: v2i32 = IGILISD::MOVSWZ 0x7f94b30a4e10, 0x7f94b30a2810, 0x7f94b30a2810, 0x7f94b30a2810 [ID=31]
          0x7f94b30a4e10: i32 = Constant<7> [ID=21]
          0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
          0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
          0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
    0x7f94b3092b10: i64 = bitcast 0x7f94b3092910 [ID=39]
      0x7f94b3092910: v2i32 = IGILISD::MOVSWZ 0x7f94b30a5010, 0x7f94b30a4210, 0x7f94b30a2810, 0x7f94b30a2810 [ID=30]
        0x7f94b30a5010: i32 = Constant<-8> [ID=20]
        0x7f94b30a4210: i32 = Constant<-1> [ORD=3] [ID=10]
        0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
        0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
  0x7f94b3092c10: i64 = bitcast 0x7f94b3092810 [ID=35]
    0x7f94b3092810: v2i32 = IGILISD::MOVSWZ 0x7f94b30a2810, 0x7f94b30a2810, 0x7f94b30a2810, 0x7f94b30a2810 [ID=27]
      0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
      0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
      0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
      0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
In function: trajectories
(options: -I /usr/local/lib/python2.7/site-packages/pyopencl-2015.2.4-py2.7-macosx-10.11-x86_64.egg/pyopencl/cl)
(source saved as /var/folders/p2/jd7m10gs5k1_q6hx5kvktkcc0000gn/T/tmpWQmCKr.cl)

Any suggestions as to why this won't run? I am running an Early 2015 MacBook Pro on macOS Sierra 10.12.5. print cl.version.VERSION returns 2015.2.4.

Here is the kernel code:

kernelsource = """
__kernel void trajectories(
    // TODO: adjust argtypes above if this is changed
    const int N,
    const int dim,
    __constant float* data,
    const int nrParticles,
    __global float* pos,
    __global float* vel,
    const int nrSteps,
    __global float* trj, 
    __global float* sigarr, 
    const float sigma, 
    const float mass, 
    const float alpha,  // alpha is resistance in reverse. 
    const float dt
){
    int i,k,step;
    float h, sigsum, hexp; 
    int pidx = get_global_id(0); // global ID used as particle index
    int ofs = pidx * nrSteps * dim;
    int accofs = ofs + (nrSteps-1) * dim; // use last trj point to tmp store acc vector
    float v[dim];
    float sigma2 = sigma*sigma;
    float m = mass / sigma2;
    float dt_over_m = dt /m;
    for(step=0; step<nrSteps; step++){
        for(k=0; k<dim; k++)
        {
            trj[accofs+k]=0;
        }  
        for(i=0; i<N; i++)
        {

            h=0;  // to store ||data[i]-x||**2
            for(k=0; k<dim; k++)
            { 
                v[k] = pos[pidx*dim+k] - data[i*dim + k];
                h += v[k]*v[k];     //h == force1p_sum
            };
            hexp = exp(-h/sigma2)/sigma2;

            for(k=0; k<dim; k++)
            { 
                trj[accofs+k] += -(hexp) * v[k]; 
            };         
        };
        sigsum = 0;
        for(k=0; k<dim; k++)
        { 
            vel[pidx*dim+k]     = alpha * vel[pidx*dim+k] + dt_over_m * trj[accofs+k];      // vel = alpha*vel + acc*dt 
            pos[pidx*dim+k]    += dt * vel[pidx*dim+k];                        // pos = pos + vel*dt
            sigsum             += vel[pidx*dim+k] * vel[pidx*dim+k]; // v^2 for kinetic energy
            trj[ofs+step*dim+k] = pos[pidx*dim+k];             // write to result vector

        };
        sigarr[pidx*nrSteps+step] = sigsum;                    // sig = | vel | 
    }
    for(step=0; step<nrSteps-2; step++)
    {
        sigarr[pidx*nrSteps+step] = sigarr[pidx*nrSteps+step+2] - sigarr[pidx*nrSteps+step+1];
    };
    sigarr[pidx*nrSteps+nrSteps-1] = sigarr[pidx*nrSteps+nrSteps-2] = 0;  

}
"""

Thanks,

嘉俊

Recommended answer

In such cases you should query the build log to see the actual compiler error. Another option for kernel-code errors like this one is an offline compiler; the major OpenCL vendors each provide one.
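
PyOpenCL already puts the compiler output into the exception text: everything after the "Build on <pyopencl.Device ...>:" line in the traceback above is that device's build log. The sketch below assumes the message keeps that "Build on <...>:" layout (it is not a documented format) and splits the text of a caught exception into one log per device; split_build_logs is a hypothetical helper name.

```python
import re

# Assumed layout, taken from the traceback in the question: one
# "Build on <device repr>:" header line per device, then that device's log.
_HEADER = re.compile(r"^Build on (<.*?>):[ \t]*$", re.MULTILINE)


def split_build_logs(message):
    """Hypothetical helper: return {device_repr: log_text} for each
    'Build on <...>:' section found in a PyOpenCL build error message."""
    headers = list(_HEADER.finditer(message))
    logs = {}
    for i, match in enumerate(headers):
        # A section runs until the next header or the end of the message.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(message)
        logs[match.group(1)] = message[match.end():end].strip()
    return logs
```

To use it, wrap the cl.Program(ctx, kernelsource).build() call in try/except cl.Error as err and print each entry of split_build_logs(str(err)); with several devices in one context, this keeps their logs apart.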

You can find Intel's OpenCL offline compiler here: https://software.intel.com/en-us/articles/programming-with-the-intel-sdk-for-opencl-applications-development-tools

AMD has a tool called CodeXL, in which you can also do offline compilation to see if your kernel code compiles.

Here is the ARM OpenCL offline compiler: https://developer.arm.com/products/software-development-tools/graphics-development-tools/mali-offline-compiler/downloads

Intel's offline compiler supports up to OpenCL 2.1, while ARM's supports up to OpenCL 1.1, so you can use either of them to compile your kernel code and find bugs or errors easily.
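
The offline compilers work on a source file rather than a Python string, so a first step is to write kernelsource out to a .cl file (PyOpenCL does something similar on failure, as the "source saved as ..." line in the traceback shows). A minimal sketch; dump_kernel_source and the temporary file name are hypothetical:

```python
import os
import tempfile


def dump_kernel_source(source, path=None):
    """Hypothetical helper: write an OpenCL C source string to a .cl file
    and return its path, so it can be opened in an offline compiler."""
    if path is None:
        # Create a uniquely named temporary file with a .cl extension.
        fd, path = tempfile.mkstemp(suffix=".cl")
        os.close(fd)
    with open(path, "w") as f:
        f.write(source)
    return path
```

For example, dump_kernel_source(kernelsource) returns the path of a file you can then load in the Intel, AMD or ARM tool.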

The problem in your kernel is the following line:

float v[dim];

The OpenCL C specification does not allow variable-length arrays, and the offline compiler reports the following error:

ERROR: <source>:22:12: error: variable length arrays are not supported in OpenCL

Fix that line (the array length must be a compile-time constant) to get past the error, and from now on you can check whether your kernel compiles by running it through the offline compiler.
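
One common way to fix the line, sketched here under the assumption that dim is known on the host before the program is built, is to make the array size a preprocessor constant: edit the kernel to declare float v[DIM]; and define DIM when building. specialize_kernel is a hypothetical helper; it prepends a #define, which has the same effect as the standard -D build option.

```python
def specialize_kernel(source, dim):
    """Hypothetical helper: prefix the kernel with '#define DIM <dim>'.

    Assumes the kernel was edited to declare 'float v[DIM];' instead of the
    variable-length 'float v[dim];', so the array size is a compile-time
    constant that OpenCL C accepts.
    """
    dim = int(dim)
    if dim < 1:
        raise ValueError("dim must be a positive integer, got %d" % dim)
    if "float v[dim];" in source:
        # The VLA declaration is still present; the build would fail again.
        raise ValueError("kernel still declares the variable-length array v[dim]")
    return "#define DIM %d\n%s" % (dim, source)
```

On the host you would then build with cl.Program(ctx, specialize_kernel(kernelsource, dim)).build() and keep passing the same dim as the runtime argument. Alternatively, keep the source as-is apart from the v[DIM] edit and pass options="-D DIM=%d" % dim to build(). Either way, each distinct dim needs its own build.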

Edit: the specification has a footnote stating that variable-length arrays are not supported. You can see it here:

https://www.khronos.org/registry/OpenCL/specs/opencl-2.0-openclc.pdf#page=31
