OpenCL timeout on beignet doesn't raise error?


Problem description

I run the following (simplified) code, which runs a simplified kernel for a few seconds and then checks the results. The first 400,000 or so results are correct, and all the ones after that are zero. The kernel should put the same value (4228) into each element of the 4.5-million-element output array. It looks as if something, somewhere, is timing out or not being synchronized, but I'm a bit puzzled, since I:

  • even call clFinish, just to make sure
  • am checking all errors, and no errors are being returned

The results are as follows:

user@pear:~/git/machinelearning/prototyping/build$ ./testcltimeout 
out[442496] != 4228: 0

What I expect to happen is: the code should just run to completion, with no errors.

Context: running on:

  • beignet, OpenCL 1.2
  • Intel HD 4000 integrated graphics

The kernel is:

kernel void test_read( const int one,  const int two, global int *out) {
    const int globalid = get_global_id(0);
    int sum = 0;
    int n = 0;
    while( n < 100000 ) {
        sum = (sum + one ) % 1357 * two;
        n++;
    }
    out[globalid] = sum;
}

Test code (I've simplified this as much as possible):

#include <cstdlib>   // exit
#include <cstring>   // strlen
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>
using namespace std;

#include "CL/cl.hpp"

template<typename T>
std::string toString(T val ) {
   std::ostringstream myostringstream;
   myostringstream << val;
   return myostringstream.str();
}

void checkError( cl_int error ) {
    if (error != CL_SUCCESS) {
       throw std::runtime_error( "Error: " + toString(error) );
    }
}

int main( int argc, char *argv[] ) {

     cl_int error;  

    cl_device_id *device_ids;

    cl_uint num_platforms;
    cl_uint num_devices;

    cl_platform_id platform_id;
    cl_device_id device;

    cl_context context;
    cl_command_queue queue;
    cl_program program;

    checkError( clGetPlatformIDs(1, &platform_id, &num_platforms) );
    checkError(  clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU, 1, &device, &num_devices) );
    device_ids = new cl_device_id[num_devices];
    checkError( clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU, num_devices, device_ids, &num_devices) );
    device = device_ids[0];
    context = clCreateContext(0, 1, &device, NULL, NULL, &error);
    checkError(error);
    queue = clCreateCommandQueue(context, device, 0, &error);
    checkError(error);

    string kernel_source = string( "kernel void test_read( const int one,  const int two, global int *out) {\n" ) +
    "    const int globalid = get_global_id(0);\n" +
    "    int sum = 0;\n" +
    "    int n = 0;\n" +
    "    while( n < 100000 ) {\n" +
    "        sum = (sum + one ) % 1357 * two;\n" +
    "        n++;\n" +
    "    }\n" +
    "    out[globalid] = sum;\n" +
    "}\n";
    const char *source_char = kernel_source.c_str();
    size_t src_size = strlen( source_char );
    program = clCreateProgramWithSource(context, 1, &source_char, &src_size, &error);
    checkError(error);

    checkError( clBuildProgram(program, 1, &device, 0, NULL, NULL) );

    cl_kernel kernel = clCreateKernel(program, "test_read", &error);
    checkError(error);

    const int N = 4500000;
    int *out = new int[N];
    if( out == 0 ) throw runtime_error("couldnt allocate array");

    int c1 = 3;
    int c2 = 7;
    checkError( clSetKernelArg(kernel, 0, sizeof(int), &c1 ) );
    checkError( clSetKernelArg(kernel, 1, sizeof(int), &c2 ) );
    cl_mem outbuffer = clCreateBuffer(context, CL_MEM_WRITE_ONLY, sizeof(int) * N, 0, &error);
    checkError(error);
    checkError( clSetKernelArg(kernel, 2, sizeof(cl_mem), &outbuffer) );

    size_t globalSize = N;
    size_t workgroupsize = 512;
    globalSize = ( ( globalSize + workgroupsize - 1 ) / workgroupsize ) * workgroupsize;
    checkError( clEnqueueNDRangeKernel( queue, kernel, 1, NULL, &globalSize, &workgroupsize, 0, NULL, NULL) );
    checkError( clFinish( queue ) );
    checkError( clEnqueueReadBuffer( queue, outbuffer, CL_TRUE, 0, sizeof(int) * N, out, 0, NULL, NULL) );    
    checkError( clFinish( queue ) );

    for( int i = 0; i < N; i++ ) {
       if( out[i] != 4228 ) {
           cout << "out[" << i << "] != 4228: " << out[i] << endl;
           exit(-1);
       }
    }

    return 0;
}

Recommended answer

Your kernel seems to run for a long time. I suspect you are hitting a TDR (timeout detection and recovery, i.e. the GPU is timing out) and that Linux (Beignet) handles this more silently than Windows. Hence, I have a few ideas.

  • Check dmesg for a TDR message. I haven't used Beignet, or any Linux OpenCL implementation for that matter, but the Beignet documentation page (under "known issues") indicates you can check this via dmesg.

To check whether GPU hang, you could execute dmesg and check whether it has the following message: [17909.175965] [drm:i915_hangcheck_hung] ERROR Hangcheck timer elapsed... If it does, there was a GPU hang. Usually, this means something wrong in the kernel, as it indicates the OCL kernel hasn't finished for about 6 seconds or even more.

The documentation goes on to say that you can disable the timeout check if you really know the kernel just takes longer to finish, but it warns that you risk a machine hang.
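As a rough sketch of the dmesg check, the helper below scans log text for the hangcheck message quoted above. The function name hasGpuHang is hypothetical, and the two search strings are simply copied from that quoted message; other kernel versions may word the message differently, so treat a negative result with caution.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Returns true if any line of a dmesg dump looks like the i915 hangcheck
// message quoted from the Beignet documentation. The substrings are an
// assumption based on that one quoted line, not an exhaustive list.
bool hasGpuHang(std::istream &log) {
    std::string line;
    while (std::getline(log, line)) {
        if (line.find("i915_hangcheck") != std::string::npos ||
            line.find("Hangcheck timer elapsed") != std::string::npos) {
            return true;
        }
    }
    return false;
}

// Convenience overload for log text that is already held in memory.
bool hasGpuHang(const std::string &text) {
    std::istringstream in(text);
    return hasGpuHang(in);
}
```

One way to use it would be to call hasGpuHang(std::cin) from main and pipe the output of dmesg into the program right after the test run.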

  • Try this on Intel HD 4000 Graphics on Windows. If the kernel takes longer than a few seconds, it will time out and the driver will actually crash (but restart automatically).

  • Try the kernel with the Intel OpenCL CPU implementation (or any other implementation without a TDR limit). Check for correctness and for how long it runs (10 seconds? 10 minutes?). I don't think the CPU implementation will time out.
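For the correctness and run-time checks, a minimal host-side sketch might look like the following. referenceSum repeats the kernel's loop for a single work-item, so its result can be compared with the 4228 expected in the question; secondsTaken is a generic wall-clock timer. Both names are hypothetical helpers, not part of OpenCL or Beignet.

```cpp
#include <chrono>
#include <iostream>

// Host-side copy of the loop in the test_read kernel, for one work-item.
// Intermediate values stay below 1357 * two, so int does not overflow
// for the arguments used in the question (one = 3, two = 7).
int referenceSum(int one, int two, int iterations) {
    int sum = 0;
    for (int n = 0; n < iterations; n++) {
        sum = (sum + one) % 1357 * two;
    }
    return sum;
}

// Runs `work` once and returns the elapsed wall-clock time in seconds.
template <typename F>
double secondsTaken(F work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Possible use inside the question's main(), timing the launch until
// clFinish returns (shown as a comment so this block builds without OpenCL):
//
//   double t = secondsTaken([&] {
//       checkError(clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
//                                         &globalSize, &workgroupsize,
//                                         0, NULL, NULL));
//       checkError(clFinish(queue));
//   });
//   std::cout << "kernel took " << t << " s" << std::endl;
```

With the question's arguments, referenceSum(3, 7, 100000) gives 4228, which matches the value the test loop checks for. A GPU timing close to or above the roughly six seconds mentioned in the Beignet documentation would be consistent with the hangcheck explanation, though it would not prove it.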
