可能导致“未定义行为”的原因在此并行GPU代码中? [英] What might cause "Undefined Behaviour" in this parallel GPU code?

查看:145
本文介绍了可能导致“未定义行为”的原因在此并行GPU代码中?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

让我们假设core1和core2尝试将变量a和b写入相同的内存位置。



在这里如何解释UB?




  • 我们不知道a或b是否被写入该内存位置(作为最后一个动作)。

  • 我们甚至不知道什么在那里写(一个垃圾)

  • 即使目标内存地址也可能被错误计算(segfault?)。

  • 某些逻辑门会产生错误的电流和CPU禁用自身

  • CPU的频率信息损坏并超频(并破坏自身)



我可以假设只有第一个选项对所有CPU(和GPU)供应商都有效吗?



我刚刚将下面的代码转换为并行GPU代码,并且似乎可以正常工作



通用代码:

 为(j = 0; j< YRES / CELL; j ++)//为此(i = 0; i< XRES / CELL; i ++)//为此被并行化
{
r = fir e_r [j] [i];
g = fire_g [j] [i];
b = fire_b [j] [i];如果(r || g || b)(y = -CELL; y< 2 * CELL; y ++)
(x = -CELL; x< 2 * CELL; x ++)
addpixel(i * CELL + x,j * CELL + y,r,g,b,fire_alpha [y + CELL] [x + CELL]);
// addpixel访问邻居单元的信息并在其上写入
//使UB
r * = 8;
g * = 8;
b * = 8;
对于(y = -1; y< 2; y ++)
对于(x = -1; x< 2; x ++)
如果((x || y)& i + x> = 0&& j + y> = 0&& i + x< XRES / CELL&& j + y< YRES / CELL)
{
r + = fire_r [j + y] [i + x];
g + = fire_g [j + y] [i + x];
b + = fire_b [j + y] [i + x];
}
r / = 16;
g / = 16;
b / = 16;
fire_r [j] [i] = r> 4? r-4:0; // UB
fire_g [j] [i] = g> 4吗? g-4:0; // UB
fire_b [j] [i] = b> 4吗? b-4:0;
}

Opencl:

  int i = get_global_id(0); int j = get_global_id(1); 
int VIDXRES = + std :: to_string(kkVIDXRES)+;
int VIDYRES = + std :: to_string(kkVIDYRES)+;
int XRES = + std :: to_string(kkXRES)+;
int CELL = + std :: to_string(kkCELL)+;
int YRES = + std :: to_string(kkYRES)+;

int x = 0,y = 0,r = 0,g = 0,b = 0,nx = 0,ny = 0;

r = fire_r [j *(XRES / CELL)+ i];
g = fire_g [j *(XRES / CELL)+ i];
b = fire_b [j *(XRES / CELL)+ i];

int counterx = 0;
if(r || g || b)
表示(y = -CELL; y <2 * CELL; y ++){
表示(x = -CELL; x< 2 * CELL; x ++){
addpixel(i * CELL + x,j * CELL + y,r,g,b,fire_alpha [(y + CELL)*(3 * CELL)+( x + CELL)],vid,vido);
}}

r * = 8;
g * = 8;
b * = 8;
for(y = -1; y< 2; y ++){
for(x = -1; x< 2; x ++){
if((x | y)& i + x> = 0&& j + y> = 0 =& i + x< XRES / CELL&& amp; j + y< YRES / CELL)
{
r + = fire_r [(j + y)*(XRES / CELL)+(i + x)];
g + = fire_g [(j + y)*(XRES / CELL)+(i + x)];
b + = fire_b [(j + y)*(XRES / CELL)+(i + x)];
}}}
r / = 16;
g / = 16;
b / = 16;
fire_r [j *(XRES / CELL)+ i] =(r> 4?r-4:0);
fire_g [j *(XRES / CELL)+ i] =(g> 4?g-4:0);
fire_b [j *(XRES / CELL)+ i] =(b> 4?b-4:0);

以下是2D NDrangeKernel局部边界UB的一些罕见伪像的图片。这些可以杀死我的GPU吗?



解决方案

在xf86和xf86_64体系结构上,这表示我们不知道a或b是否写入该内存位置(作为最后一个操作),因为32位(两者)或64位(仅适用于xf86_64)内存对齐的数据类型的加载/存储操作都是原子操作。



通常在其他架构上,我们甚至不知道在那里写了什么(垃圾)是一个有效的答案-当然,在RISC架构上,我目前不知道在GPU上。 / p>

请注意,代码有效的事实并不意味着它是正确的,并且在99%的时间中,它是诸如存在编译器错误,直到以前的版本都可以使用该代码或该代码可在开发机上使用。选择用于生产的服务器已损坏:)



编辑:



在NVidia GPU上,弱序记忆模型。在《 Cuda C编程指南》中的说明中没有明确说明存储操作是原子的。写操作来自同一线程,因此并不意味着加载/存储操作是原子的。


Lets assume core1 and core2 try writing their variables a and b to same memory location.

How can UB be explained here?

  • We dont know if a or b is written to that memory location(as a last action).
  • We dont even know what is written there (a garbage)
  • Even the target memory address can be miscalculated(segfault?).
  • Some logical gates make wrong currents and CPU disables itself
  • CPU's frequency information becomes corrupt and goes high overclock(and break itself)

Can I assume only the first option is valid for all vendors of CPU( and GPU)?

I just converted below code into a parallel GPU code and it seems to be working fine.

Generic code:

for (j=0; j<YRES/CELL; j++) // this is parallelized
        for (i=0; i<XRES/CELL; i++) // this is parallelized
        {
            r = fire_r[j][i];
            g = fire_g[j][i];
            b = fire_b[j][i];
            if (r || g || b)
                for (y=-CELL; y<2*CELL; y++)
                    for (x=-CELL; x<2*CELL; x++)
                        addpixel(i*CELL+x, j*CELL+y, r, g, b, fire_alpha[y+CELL][x+CELL]);
   //addpixel accesses neighbour cells' informations and writes on them
   //and makes UB
            r *= 8;
            g *= 8;
            b *= 8;
            for (y=-1; y<2; y++)
                for (x=-1; x<2; x++)
                    if ((x || y) && i+x>=0 && j+y>=0 && i+x<XRES/CELL && j+y<YRES/CELL)
                    {
                        r += fire_r[j+y][i+x];
                        g += fire_g[j+y][i+x];
                        b += fire_b[j+y][i+x];
                    }
            r /= 16;
            g /= 16;
            b /= 16;
            fire_r[j][i] = r>4 ? r-4 : 0; // UB
            fire_g[j][i] = g>4 ? g-4 : 0; // UB
            fire_b[j][i] = b>4 ? b-4 : 0;
        }

Opencl:

"   int i=get_global_id(0); int j=get_global_id(1);"
"   int VIDXRES="+std::to_string(kkVIDXRES)+";"
                        "   int VIDYRES="+std::to_string(kkVIDYRES)+";"
                        "   int XRES="+std::to_string(kkXRES)+";"
                        "   int CELL="+std::to_string(kkCELL)+";"
                        "   int YRES="+std::to_string(kkYRES)+";"

                        "   int x=0,y=0,r=0,g=0,b=0,nx=0,ny=0;"

                        "       r = fire_r[j*(XRES/CELL)+i];"
                        "       g = fire_g[j*(XRES/CELL)+i];"
                        "       b = fire_b[j*(XRES/CELL)+i];"

                        "       int counterx=0;"
                        "       if (r || g || b)"
                        "       for (y=-CELL; y<2*CELL; y++){"
                        "       for (x=-CELL; x<2*CELL; x++){"
                        "       addpixel(i*CELL+x, j*CELL+y, r, g, b, fire_alpha[(y+CELL)*(3*CELL)+(x+CELL)],vid,vido);"
                        "       }}"

                        "       r *= 8;"
                        "       g *= 8;"
                        "       b *= 8;"
                        "       for (y=-1; y<2; y++){"
                        "       for (x=-1; x<2; x++){"
                        "       if ((x || y) && i+x>=0 && j+y>=0 && i+x<XRES/CELL && j+y<YRES/CELL)"
                        "       {"
                        "           r += fire_r[(j+y)*(XRES/CELL)+(i+x)];"
                        "           g += fire_g[(j+y)*(XRES/CELL)+(i+x)];"
                        "           b += fire_b[(j+y)*(XRES/CELL)+(i+x)];"
                        "       }}}"
                        "       r /= 16;"
                        "       g /= 16;"
                        "       b /= 16;"
                        "       fire_r[j*(XRES/CELL)+i] = (r>4 ? r-4 : 0);"
                        "       fire_g[j*(XRES/CELL)+i] = (g>4 ? g-4 : 0);"
                        "       fire_b[j*(XRES/CELL)+i] = (b>4 ? b-4 : 0);"

Here is picture of some rare artifacts of a 2D NDrangeKernel 's local boundary UB. Can these kill my GPU?

解决方案

On xf86 and xf86_64 architectures it means We dont know if a or b is written to that memory location(as a last action), because load/store operations of 32 (for both) or 64 bit (xf86_64 only) memory aligned datatypes are atomic.

On other architectures usually We dont even know what is written there (a garbage) is a valid answer - for sure on RISC architectures, I currently don't know on GPU's.

Note that The fact the code works doesn't imply that it is correct and in the 99% of the times it's the source of sentences like "there's a compiler bug, the code was working until the previous version" or "the code works on the development machine. The server selected for production is broken" :)

EDIT:

On NVidia GPUs we have weakly-ordered memory model. In the description on the Cuda C Programming guide it's not explicitly stated that store operations are atomic. The write operations come from the same thread, so it does not mean that load/store operations are atomic.

这篇关于可能导致“未定义行为”的原因在此并行GPU代码中?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆