可能导致“未定义行为”的原因在此并行GPU代码中？ [英] What might cause "Undefined Behaviour" in this parallel GPU code?

查看：145 发布时间：2020/10/13 1:37:31 cuda undefined-behavior

本文介绍了可能导致“未定义行为”的原因在此并行GPU代码中？的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

让我们假设core1和core2尝试将变量a和b写入相同的内存位置。

在这里如何解释UB？

我们不知道a或b是否被写入该内存位置（作为最后一个动作）。

我们甚至不知道什么在那里写（一个垃圾）

即使目标内存地址也可能被错误计算（segfault？）。

某些逻辑门会产生错误的电流和CPU禁用自身

CPU的频率信息损坏并超频（并破坏自身）

我可以假设只有第一个选项对所有CPU（和GPU）供应商都有效吗？

我刚刚将下面的代码转换为并行GPU代码，并且似乎可以正常工作

通用代码：

 为（j = 0; j< YRES / CELL; j ++）//为此（i = 0; i< XRES / CELL; i ++）//为此被并行化
 {
r = fir e_r [j] [i]; 
 g = fire_g [j] [i]; 
 b = fire_b [j] [i];如果（r || g || b）（y = -CELL; y< 2 * CELL; y ++）
（x = -CELL; x< 2 * CELL; x ++）
 addpixel（i * CELL + x，j * CELL + y，r，g，b，fire_alpha [y + CELL] [x + CELL]）; 
 // addpixel访问邻居单元的信息并在其上写入
 //使UB 
 r * = 8; 
 g * = 8; 
 b * = 8; 
对于（y = -1; y< 2; y ++）
对于（x = -1; x< 2; x ++）
如果（（x || y）& i + x> = 0&& j + y> = 0&& i + x< XRES / CELL&& j + y< YRES / CELL）
 {
r + = fire_r [j + y] [i + x]; 
 g + = fire_g [j + y] [i + x]; 
 b + = fire_b [j + y] [i + x]; 
} 
 r / = 16; 
 g / = 16; 
 b / = 16; 
 fire_r [j] [i] = r> 4？ r-4：0； // UB 
 fire_g [j] [i] = g> 4吗？ g-4：0； // UB 
 fire_b [j] [i] = b> 4吗？ b-4：0； 
}

Opencl：

  int i = get_global_id（0）; int j = get_global_id（1）; 
 int VIDXRES = + std :: to_string（kkVIDXRES）+; 
 int VIDYRES = + std :: to_string（kkVIDYRES）+; 
 int XRES = + std :: to_string（kkXRES）+; 
 int CELL = + std :: to_string（kkCELL）+; 
 int YRES = + std :: to_string（kkYRES）+; 
 
 int x = 0，y = 0，r = 0，g = 0，b = 0，nx = 0，ny = 0; 
 
 r = fire_r [j *（XRES / CELL）+ i]; 
 g = fire_g [j *（XRES / CELL）+ i]; 
 b = fire_b [j *（XRES / CELL）+ i]; 
 
 int counterx = 0; 
 if（r || g || b） 
表示（y = -CELL; y <2 * CELL; y ++）{ 
表示（x = -CELL; x< 2 * CELL; x ++）{ 
 addpixel（i * CELL + x，j * CELL + y，r，g，b，fire_alpha [（y + CELL）*（3 * CELL）+（ x + CELL）]，vid，vido）; 
}} 
 
 r * = 8; 
 g * = 8; 
 b * = 8; 
 for（y = -1; y< 2; y ++）{ 
 for（x = -1; x< 2; x ++）{ 
 if（（x | y）& i + x> = 0&& j + y> = 0 =& i + x< XRES / CELL&& amp; j + y< YRES / CELL） 
 { 
 r + = fire_r [（j + y）*（XRES / CELL）+（i + x）]; 
 g + = fire_g [（j + y）*（XRES / CELL）+（i + x）]; 
 b + = fire_b [（j + y）*（XRES / CELL）+（i + x）]; 
}}} 
 r / = 16; 
 g / = 16; 
 b / = 16; 
 fire_r [j *（XRES / CELL）+ i] =（r> 4？r-4：0）; 
 fire_g [j *（XRES / CELL）+ i] =（g> 4？g-4：0）; 
 fire_b [j *（XRES / CELL）+ i] =（b> 4？b-4：0）;

以下是2D NDrangeKernel局部边界UB的一些罕见伪像的图片。这些可以杀死我的GPU吗？

解决方案

在xf86和xf86_64体系结构上，这表示我们不知道a或b是否写入该内存位置（作为最后一个操作），因为32位（两者）或64位（仅适用于xf86_64）内存对齐的数据类型的加载/存储操作都是原子操作。

通常在其他架构上，我们甚至不知道在那里写了什么（垃圾）是一个有效的答案-当然，在RISC架构上，我目前不知道在GPU上。 / p>

请注意，代码有效的事实并不意味着它是正确的，并且在99％的时间中，它是诸如存在编译器错误，直到以前的版本都可以使用该代码或该代码可在开发机上使用。选择用于生产的服务器已损坏：）

编辑：

在NVidia GPU上，弱序记忆模型。在《 Cuda C编程指南》中的说明中没有明确说明存储操作是原子的。写操作来自同一线程，因此并不意味着加载/存储操作是原子的。

Lets assume core1 and core2 try writing their variables a and b to same memory location.

How can UB be explained here?

We dont know if a or b is written to that memory location(as a last action).
We dont even know what is written there (a garbage)
Even the target memory address can be miscalculated(segfault?).
Some logical gates make wrong currents and CPU disables itself
CPU's frequency information becomes corrupt and goes high overclock(and break itself)

Can I assume only the first option is valid for all vendors of CPU( and GPU)?

I just converted below code into a parallel GPU code and it seems to be working fine.

Generic code:

for (j=0; j<YRES/CELL; j++) // this is parallelized
        for (i=0; i<XRES/CELL; i++) // this is parallelized
        {
            r = fire_r[j][i];
            g = fire_g[j][i];
            b = fire_b[j][i];
            if (r || g || b)
                for (y=-CELL; y<2*CELL; y++)
                    for (x=-CELL; x<2*CELL; x++)
                        addpixel(i*CELL+x, j*CELL+y, r, g, b, fire_alpha[y+CELL][x+CELL]);
   //addpixel accesses neighbour cells' informations and writes on them
   //and makes UB
            r *= 8;
            g *= 8;
            b *= 8;
            for (y=-1; y<2; y++)
                for (x=-1; x<2; x++)
                    if ((x || y) && i+x>=0 && j+y>=0 && i+x<XRES/CELL && j+y<YRES/CELL)
                    {
                        r += fire_r[j+y][i+x];
                        g += fire_g[j+y][i+x];
                        b += fire_b[j+y][i+x];
                    }
            r /= 16;
            g /= 16;
            b /= 16;
            fire_r[j][i] = r>4 ? r-4 : 0; // UB
            fire_g[j][i] = g>4 ? g-4 : 0; // UB
            fire_b[j][i] = b>4 ? b-4 : 0;
        }

Opencl:

"   int i=get_global_id(0); int j=get_global_id(1);"
"   int VIDXRES="+std::to_string(kkVIDXRES)+";"
                        "   int VIDYRES="+std::to_string(kkVIDYRES)+";"
                        "   int XRES="+std::to_string(kkXRES)+";"
                        "   int CELL="+std::to_string(kkCELL)+";"
                        "   int YRES="+std::to_string(kkYRES)+";"

                        "   int x=0,y=0,r=0,g=0,b=0,nx=0,ny=0;"

                        "       r = fire_r[j*(XRES/CELL)+i];"
                        "       g = fire_g[j*(XRES/CELL)+i];"
                        "       b = fire_b[j*(XRES/CELL)+i];"

                        "       int counterx=0;"
                        "       if (r || g || b)"
                        "       for (y=-CELL; y<2*CELL; y++){"
                        "       for (x=-CELL; x<2*CELL; x++){"
                        "       addpixel(i*CELL+x, j*CELL+y, r, g, b, fire_alpha[(y+CELL)*(3*CELL)+(x+CELL)],vid,vido);"
                        "       }}"

                        "       r *= 8;"
                        "       g *= 8;"
                        "       b *= 8;"
                        "       for (y=-1; y<2; y++){"
                        "       for (x=-1; x<2; x++){"
                        "       if ((x || y) && i+x>=0 && j+y>=0 && i+x<XRES/CELL && j+y<YRES/CELL)"
                        "       {"
                        "           r += fire_r[(j+y)*(XRES/CELL)+(i+x)];"
                        "           g += fire_g[(j+y)*(XRES/CELL)+(i+x)];"
                        "           b += fire_b[(j+y)*(XRES/CELL)+(i+x)];"
                        "       }}}"
                        "       r /= 16;"
                        "       g /= 16;"
                        "       b /= 16;"
                        "       fire_r[j*(XRES/CELL)+i] = (r>4 ? r-4 : 0);"
                        "       fire_g[j*(XRES/CELL)+i] = (g>4 ? g-4 : 0);"
                        "       fire_b[j*(XRES/CELL)+i] = (b>4 ? b-4 : 0);"

Here is picture of some rare artifacts of a 2D NDrangeKernel 's local boundary UB. Can these kill my GPU?

解决方案

On xf86 and xf86_64 architectures it means We dont know if a or b is written to that memory location(as a last action), because load/store operations of 32 (for both) or 64 bit (xf86_64 only) memory aligned datatypes are atomic.

On other architectures usually We dont even know what is written there (a garbage) is a valid answer - for sure on RISC architectures, I currently don't know on GPU's.

Note that The fact the code works doesn't imply that it is correct and in the 99% of the times it's the source of sentences like "there's a compiler bug, the code was working until the previous version" or "the code works on the development machine. The server selected for production is broken" :)

EDIT:

On NVidia GPUs we have weakly-ordered memory model. In the description on the Cuda C Programming guide it's not explicitly stated that store operations are atomic. The write operations come from the same thread, so it does not mean that load/store operations are atomic.

这篇关于可能导致“未定义行为”的原因在此并行GPU代码中？的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

可能导致“未定义行为”的原因在此并行GPU代码中？ [英] What might cause "Undefined Behaviour" in this parallel GPU code?

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

可能导致“未定义行为”的原因在此并行GPU代码中？ [英] What might cause &quot;Undefined Behaviour&quot; in this parallel GPU code?

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

可能导致“未定义行为”的原因在此并行GPU代码中？ [英] What might cause "Undefined Behaviour" in this parallel GPU code?

登录关闭