OpenCL Intel Iris Integrated Graphics exits with Abort Trap 6: Timeout Issue

Problem description

I am attempting to write a program that executes Monte Carlo simulations using OpenCL, and I have run into an issue involving exponentials. When the variable steps becomes large (approximately 20000), the calculation of the exponential fails unexpectedly and the program quits with "Abort Trap: 6". This seems bizarre, since steps should not affect memory allocation. Setting normal, alpha, and beta to 0 does not resolve the problem; however, commenting out the exponential and replacing it with the constant 1 seems to fix it. The same code runs without any issues on an AWS GPU instance. Does anybody have an idea why this might be a problem on an integrated graphics card?

Solution

Execute the kernel multiple times over smaller ranges, so that each kernel execution stays under 5 seconds.
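
Below is a minimal host-side sketch of that idea. It assumes a 1-D NDRange. The hypothetical helper `plan_launches` splits the global range into slices of at most `max_per_launch` work-items. Each slice would then be passed to `clEnqueueNDRangeKernel` as its `global_work_offset` and `global_work_size`. Since OpenCL 1.1, `get_global_id(0)` includes the offset, so the kernel's per-item seed `(n, seed)` stays distinct across slices.

The slice size is an assumption you would tune until each launch fits the time limit. Note that this only helps when the launch is long because of the sheer number of work-items. If a single work-item's loop over steps is itself too long, it is the steps that need splitting.

```c
#include <assert.h>
#include <stddef.h>

/* One kernel launch covering a slice of the 1-D global range. */
typedef struct {
    size_t offset; /* would be passed as global_work_offset */
    size_t size;   /* would be passed as global_work_size   */
} launch_range;

/* Split `total` work-items into consecutive launches of at most
 * `max_per_launch` items each. Writes at most `cap` ranges into `out`
 * and returns the number of launches required. */
static size_t plan_launches(size_t total, size_t max_per_launch,
                            launch_range *out, size_t cap) {
    assert(max_per_launch > 0);
    size_t n = 0;
    for (size_t off = 0; off < total; off += max_per_launch, ++n) {
        size_t remaining = total - off;
        if (n < cap) {
            out[n].offset = off;
            out[n].size = remaining < max_per_launch ? remaining : max_per_launch;
        }
    }
    return n;
}

/* Per slice, the host would then do something like (not compiled here):
 *   clEnqueueNDRangeKernel(queue, kernel, 1, &r.offset, &r.size,
 *                          NULL, 0, NULL, NULL);
 *   clFinish(queue);  // assumption: waiting keeps each GPU submission short
 */
```

For example, 100000 work-items with slices of 32768 give four launches, and the last one covers 1696 items.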

Code snippet

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

static uint MWC64X(uint2 *state) {
  enum { A = 4294883355U };
  uint x = (*state).x, c = (*state).y; 
  uint res = x ^ c;                    
  uint hi = mul_hi(x, A);              
  x = x * A + c;
  c = hi + (x < c);
  *state = (uint2)(x, c); 
  return res;          
}

__kernel void discreteMonteCarloKernel(...) {

  float cumulativeWalk = stockPrice;
  float currentValue = stockPrice;

  ...

  uint n = get_global_id(0);
  uint2 seed2 = (uint2)(n, seed);
  uint random1 = MWC64X(&seed2);
  uint2 seed3 = (uint2)(random1, seed);
  uint random2 = MWC64X(&seed3);

  // Float literals: integrated GPUs may not support double (cl_khr_fp64)
  float alpha = (interestRate - 0.5f * sigma * sigma) * dt;
  float beta = sigma * sqrt(dt);

  float u1;
  float u2;
  float a;
  float b;
  float normal;
  float exponent;  // used below but was never declared

  // Each work-item runs `steps` iterations, so run time grows linearly with steps
  for (int j = 0; j < steps; j++) {

    random1 = MWC64X(&seed2);

    // Redraw once to avoid log(0) in the Box-Muller transform
    if (random1 == 0) {
      random1 = MWC64X(&seed2);
    }

    random2 = MWC64X(&seed3);

    // Map the 32-bit outputs to uniforms in [0, 1]
    u1 = (float)random1 / (float)0xffffffff;
    u2 = (float)random2 / (float)0xffffffff;

    // Box-Muller: one standard normal sample
    a = sqrt(-2.0f * log(u1));
    b = 2.0f * M_PI_F * u2;

    normal = a * sin(b);

    // Geometric Brownian motion step
    exponent = exp(alpha + beta * normal);
    currentValue = currentValue * exponent;
    cumulativeWalk += currentValue;

    ...
  }
}

Crash report

Exception Type:        EXC_CRASH (SIGABRT)
Exception Codes:       0x0000000000000000, 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

Application Specific Information:
abort() called

Application Specific Signatures:
Graphics hardware encountered an error and was reset: 0x00000813


Thread 0 Crashed:: Dispatch queue: opencl_runtime
0   libsystem_kernel.dylib          0x00007fffb14bad42 __pthread_kill + 10
1   libsystem_pthread.dylib         0x00007fffb15a85bf pthread_kill + 90
2   libsystem_c.dylib               0x00007fffb1420420 abort + 129
3   libGPUSupportMercury.dylib      0x00007fffa98e6fbf gpusGenerateCrashLog + 158
4   com.apple.driver.AppleIntelHD5000GraphicsGLDriver   0x000000010915f13b gpusKillClientExt + 9
5   libGPUSupportMercury.dylib      0x00007fffa98e7983 gpusQueueSubmitDataBuffers + 168
6   com.apple.driver.AppleIntelHD5000GraphicsGLDriver   0x00000001091aa031 IntelCLCommandBuffer::getNew(GLDQueueRec*) + 31
7   com.apple.driver.AppleIntelHD5000GraphicsGLDriver   0x00000001091a9f99 intelSubmitCLCommands(GLDQueueRec*, unsigned int) + 65
8   com.apple.driver.AppleIntelHD5000GraphicsGLDriver   0x00000001091b00a1 CHAL_INTEL::ChalContext::ChalFlush() + 83
9   com.apple.driver.AppleIntelHD5000GraphicsGLDriver   0x00000001091aa2c3 gldFinishQueue + 43
10  com.apple.opencl                0x00007fff9ffeeb37 0x7fff9ffed000 + 6967
11  com.apple.opencl                0x00007fff9ffef000 0x7fff9ffed000 + 8192
12  com.apple.opencl                0x00007fffa000ccca 0x7fff9ffed000 + 130250
13  com.apple.opencl                0x00007fffa001029d 0x7fff9ffed000 + 144029
14  libdispatch.dylib               0x00007fffb13568fc _dispatch_client_callout + 8
15  libdispatch.dylib               0x00007fffb1357536 _dispatch_barrier_sync_f_invoke + 83
16  com.apple.opencl                0x00007fffa001011d 0x7fff9ffed000 + 143645
17  com.apple.opencl                0x00007fffa000bda6 0x7fff9ffed000 + 126374
18  com.apple.opencl                0x00007fffa00011df clEnqueueReadBuffer + 813
19  simplisticComparison            0x0000000107b953cf BinomialMultiplication::execute(int) + 1791
20  simplisticComparison            0x0000000107b9ec7f main + 767
21  libdyld.dylib                   0x00007fffb138c235 start + 1

Thread 1:
0   libsystem_pthread.dylib         0x00007fffb15a50e4 start_wqthread + 0
1   ???                             0x000070000eed6b30 0 + 123145552751408

Thread 2:
0   libsystem_pthread.dylib         0x00007fffb15a50e4 start_wqthread + 0

Thread 3:
0   libsystem_pthread.dylib         0x00007fffb15a50e4 start_wqthread + 0
1   ???                             0x007865646e496d65 0 + 33888479226719589

Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x0000000000000000  rbx: 0x0000000000000006  rcx: 0x00007fff58074078  rdx: 0x0000000000000000
  rdi: 0x0000000000000307  rsi: 0x0000000000000006  rbp: 0x00007fff580740a0  rsp: 0x00007fff58074078
   r8: 0x0000000000000000   r9: 0x00007fffb140ba50  r10: 0x0000000008000000  r11: 0x0000000000000206
  r12: 0x00007f92de80a7e0  r13: 0x00007f92e0008c00  r14: 0x00007fffba29e3c0  r15: 0x00007f92de801a00
  rip: 0x00007fffb14bad42  rfl: 0x0000000000000206  cr2: 0x00007fffba280128

Logical CPU:     0
Error Code:      0x02000148
Trap Number:     133

Recommended answer

I have a guess. The driver can crash in two ways:

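  1. We referenced a bad buffer address. This is probably not your case.
  2. We timed out (exceeded the TDR, Timeout Detection and Recovery). The kernel takes several seconds to complete.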

My money is on #2, which also fits the crash report's "Graphics hardware encountered an error and was reset". If a larger value of steps makes the GPU run too long, the system will kill the work.
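
One way to act on this is to time a short trial launch and scale from it. The sketch below assumes run time grows roughly linearly with steps, since each work-item's loop is O(steps). The function `steps_per_launch` and its `safety` factor are hypothetical. The budget could be the 5 seconds mentioned in the solution, though the actual limit enforced by Apple's driver may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Estimate how many loop steps fit in one launch, from a timed trial launch
 * of trial_steps steps that took trial_seconds. Assumes run time scales
 * linearly with steps; safety in (0, 1] leaves headroom for timing noise. */
static size_t steps_per_launch(size_t trial_steps, double trial_seconds,
                               double budget_seconds, double safety) {
    assert(trial_steps > 0 && trial_seconds > 0.0);
    assert(safety > 0.0 && safety <= 1.0);
    double seconds_per_step = trial_seconds / (double)trial_steps;
    double fit = budget_seconds * safety / seconds_per_step;
    return fit < 1.0 ? 1 : (size_t)fit;  /* always make progress */
}
```

For example, if 1024 steps took 0.5 s, a 4 s budget with a safety factor of 0.5 allows 4096 steps per launch.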

I am not familiar with the internals of Apple's Intel driver, but there is typically a way to disable the TDR in extreme cases. For the general idea, see e.g. the Windows documentation on TDR. (Linux drivers have a way to disable this too.)

Normally we want to avoid running kernels that take a very long time, so it is a good idea to decompose the workload so that you never hit this kill switch. For example, split the steps loop into smaller chunks across several launches, saving each work-item's state at the end of a chunk and passing it back in for the next one (for whatever you can't recompute).
