OpenACC: passing a list of structs

Problem description

I have a C program that checks whether two sets of polygons overlap. The user supplies two sets of polygons (each set contains several thousand polygons), and the program determines which polygons in set 1 overlap which polygons in set 2.

I have two structs like this:

struct gpc_vertex  /* Polygon vertex */
{
    double          x;
    double          y;
};

struct gpc_vertex_list  /* Polygon contour */
{
    int pid;    // polygon id
    int             num_vertices;
    double *mbr;   // minimum bounding rectangle of the polygon, so always 4 elements

};

I have the following code segment:

#pragma acc kernels copy(listOfPolygons1[0:polygonCount1], listOfPolygons2[0:polygonCount2], listOfBoolean[0:dump])
for (i=0; i<polygonCount1; i++){
    polygon1 = listOfPolygons1[i];

    for (j=0; j<polygonCount2; j++){

        polygon2 = listOfPolygons2[j];
        idx = polygonCount2 * i + j;

        listOfBoolean[idx] = isRectOverlap(polygon1.mbr, polygon2.mbr);  // line 115

    }
}

listOfPolygons1 and listOfPolygons2 are, as the names imply, arrays of gpc_vertex_list. listOfBoolean is an array of int.
The mbr of each pair of polygons is checked for overlap: the function "isRectOverlap" returns 1 if they overlap and 0 if they do not, and the value is stored in listOfBoolean.

Problem
The code compiles but does not run. It fails with the following error:

call to cuEventSynchronize returned error 700: Illegal address during kernel execution

My observation
The program compiles and runs if I change line 115 to this:

isRectOverlap(polygon1.mbr, polygon2.mbr); // without assigning value to listOfBoolean

or this:

listOfBoolean[idx] = 5; // assigning an arbitrary value

(The result is wrong, but at least it runs.)

Question
Neither "isRectOverlap" nor "listOfBoolean" seems to cause the problem as long as the return value of "isRectOverlap" is not stored in "listOfBoolean".
Does anyone know why the program fails to run when I assign the return value of "isRectOverlap" to "listOfBoolean"?

The isRectOverlap function looks like this:

int isRectOverlap(double *shape1, double *shape2){

    if (shape1[0] > shape2[2] || shape2[0] > shape1[2]){
        return 0;
    }

    if (shape1[1] < shape2[3] || shape2[1] < shape1[3]){
        return 0;
    }

    return 1;

}

The program runs without problems when OpenACC is not used.

Thanks for your help.

Recommended answer

When aggregate data types are used in an OpenACC data clause, a shallow copy of the type is performed. What is most likely happening here is that when the listOfPolygons arrays are copied to the device, each "mbr" member still contains a host address. The program therefore fails with an illegal address error as soon as "mbr" is dereferenced on the device. This would also be consistent with your observation: when the return value of "isRectOverlap" is unused, the compiler is presumably free to drop the call, so "mbr" is never dereferenced.
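To make the term "shallow copy" concrete, here is a small host-only sketch. It is not OpenACC code: it uses memcpy to mimic a byte-wise copy of one struct, and the helper name shallowCopySharesMbr is hypothetical. The point is only that the copy carries the pointer value, not the four doubles it points to.

```c
#include <string.h>

struct gpc_vertex_list {
    int pid;
    int num_vertices;
    double *mbr;   /* points to 4 doubles allocated separately on the host */
};

/* Hypothetical helper: copies one struct byte for byte, which is what a
   shallow copy amounts to. Returns 1 if the copy's mbr still points at
   the original host buffer. */
static int shallowCopySharesMbr(const struct gpc_vertex_list *src)
{
    struct gpc_vertex_list dst;
    memcpy(&dst, src, sizeof dst);   /* copies the pointer value, not the 4 doubles */
    return dst.mbr == src->mbr;
}
```

On the device the same pointer value would refer to host memory, which is why dereferencing it there is illegal.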

Given that the comment says "mbr" always has 4 elements, the simplest fix is to make "mbr" a fixed-size array of 4 doubles inside the struct.
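A minimal sketch of the fixed-size variant, assuming the rest of the program can be adapted to fill mbr in place. The wrapper findOverlaps, the "routine seq" directive, and the choice of copyin/copyout instead of copy are my assumptions and are not taken from the question. The mbr layout {xmin, ymax, xmax, ymin} is inferred from the comparisons in isRectOverlap.

```c
/* Polygon contour with the bounding rectangle stored inline. */
struct gpc_vertex_list {
    int pid;            /* polygon id */
    int num_vertices;
    double mbr[4];      /* lives inside the struct, so a shallow copy carries it */
};

/* Same test as in the question; marked so it can be called from device code. */
#pragma acc routine seq
static int isRectOverlap(const double *shape1, const double *shape2)
{
    if (shape1[0] > shape2[2] || shape2[0] > shape1[2]) return 0;
    if (shape1[1] < shape2[3] || shape2[1] < shape1[3]) return 0;
    return 1;
}

/* Hypothetical wrapper: fills out[n2 * i + j] with 1 where list1[i] and
   list2[j] have overlapping bounding rectangles, 0 otherwise. */
static void findOverlaps(const struct gpc_vertex_list *list1, int n1,
                         const struct gpc_vertex_list *list2, int n2,
                         int *out)
{
    #pragma acc parallel loop collapse(2) \
        copyin(list1[0:n1], list2[0:n2]) copyout(out[0:n1*n2])
    for (int i = 0; i < n1; i++)
        for (int j = 0; j < n2; j++)
            out[n2 * i + j] = isRectOverlap(list1[i].mbr, list2[j].mbr);
}
```

Because the inputs are only read and the output is only written, copyin and copyout avoid transferring data that is never used on the other side.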

Assuming you're using the PGI compilers with an NVIDIA device, a second method is to use CUDA Unified Memory by compiling with "-ta=tesla:managed". All dynamically allocated memory is then managed by the CUDA runtime, which allows host addresses to be accessed on the device. The caveats are that it only applies to dynamically allocated data, your whole program can only use as much memory as is available on the device, and it may slow down your program. http://www.pgroup.com/lit/articles/insider/v6n2a4.htm

A third option is to perform a deep copy of the aggregate type to the device. I can post an example if you decide to go this route. I also talk about the subject as part of a presentation I gave at GTC2015: https://www.youtube.com/watch?v=rWLmZt_u5u4
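For reference, one common way to write a manual deep copy is with unstructured data directives: copy the array of structs first, then each "mbr" buffer, so that the device copies of the pointers can be attached to device memory. The sketch below is an illustration under those assumptions, not the example offered above. The helper names polygonsToDevice, polygonsFromDevice, and findOverlapsDeep are hypothetical, and the pointer attachment behaviour should be checked against your compiler version.

```c
struct gpc_vertex_list {
    int pid;
    int num_vertices;
    double *mbr;   /* 4 doubles allocated separately on the host */
};

#pragma acc routine seq
static int isRectOverlap(const double *shape1, const double *shape2)
{
    if (shape1[0] > shape2[2] || shape2[0] > shape1[2]) return 0;
    if (shape1[1] < shape2[3] || shape2[1] < shape1[3]) return 0;
    return 1;
}

/* Hypothetical helper: parent array first, then every child buffer. */
static void polygonsToDevice(struct gpc_vertex_list *list, int n)
{
    #pragma acc enter data copyin(list[0:n])
    for (int i = 0; i < n; i++) {
        #pragma acc enter data copyin(list[i].mbr[0:4])
    }
}

/* Hypothetical helper: release in the reverse order, children first. */
static void polygonsFromDevice(struct gpc_vertex_list *list, int n)
{
    for (int i = 0; i < n; i++) {
        #pragma acc exit data delete(list[i].mbr[0:4])
    }
    #pragma acc exit data delete(list[0:n])
}

/* Assumes both lists were already placed on the device with polygonsToDevice. */
static void findOverlapsDeep(struct gpc_vertex_list *list1, int n1,
                             struct gpc_vertex_list *list2, int n2,
                             int *out)
{
    #pragma acc parallel loop collapse(2) \
        present(list1[0:n1], list2[0:n2]) copyout(out[0:n1*n2])
    for (int i = 0; i < n1; i++)
        for (int j = 0; j < n2; j++)
            out[n2 * i + j] = isRectOverlap(list1[i].mbr, list2[j].mbr);
}
```

A typical call sequence would be polygonsToDevice on both lists, then findOverlapsDeep, then polygonsFromDevice on both lists. With several thousand polygons this issues one small transfer per polygon, which is one reason the fixed-size array is the simpler choice here.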
