Copying global on-device pointer address back and forth between device and host

Views: 92. Posted: 2020/5/20 19:03:59. Tag: opencl

Problem description

I created a buffer on the OpenCL device (a GPU), and from the host I need to know its global on-device pointer address. I want to store that address in a second buffer, so that the kernel can read the address from the second buffer and then access the contents of the first.

If that's confusing, here's what I'm trying to do: I create a generic buffer of floats representing a 2D image, then from the host I create a todo list of everything my kernel needs to draw: which lines, which circles, which images... From that list the kernel has to know where to find each image. The reference to an image cannot be passed as a kernel argument, because the kernel might draw no image or a thousand different images depending on what the list says, so it has to be referenced in the buffer that serves as the kernel's todo list.

To do so, I wrote a function that runs a small kernel right after the image buffer is created. The kernel takes the buffer and writes its global on-device address as a ulong into another buffer, and the host then stores that value in a 64-bit integer, like this:

uint64_t get_clmem_device_address(clctx_t *clctx, cl_mem buf)
{
    const char kernel_source[] =
"kernel void get_global_ptr_address(global void *ptr, global ulong *devaddr)        \n"
"{                                          \n"
"   *devaddr = (ulong) ptr;                             \n"
"}                                          \n";

    int32_t i;
    cl_int ret;
    static int init=1;
    static cl_program program;
    static cl_kernel kernel;
    size_t global_work_size[1];
    static cl_mem ret_buffer;
    uint64_t devaddr;

    if (init)
    {
        init=0;
        ret = build_cl_program(clctx, &program, kernel_source);
        ret = create_cl_kernel(clctx, program, &kernel, "get_global_ptr_address");
        ret_buffer = clCreateBuffer(clctx->context, CL_MEM_WRITE_ONLY, 1*sizeof(uint64_t), NULL, &ret);
    }
    if (kernel==NULL)
        return 0;       // kernel creation failed, no address available

    // Run the kernel
    ret = clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
    ret = clSetKernelArg(kernel, 1, sizeof(cl_mem), &ret_buffer);

    global_work_size[0] = 1;
    ret = clEnqueueNDRangeKernel(clctx->command_queue, kernel, 1, NULL, global_work_size, NULL, 0, NULL, NULL);         // enqueue the kernel
    ret = clEnqueueReadBuffer(clctx->command_queue, ret_buffer, CL_FALSE, 0, 1*sizeof(uint64_t), &devaddr, 0, NULL, NULL);      // copy the value
    ret = clFlush(clctx->command_queue);
    clFinish(clctx->command_queue);

    return devaddr;
}

Apparently this works (it does return a number, although it's hard to know whether it's correct). I then put this devaddr (a 64-bit integer on the host) in the todo list buffer that the kernel uses to know what to do. When the list calls for it, the kernel calls the function below, where le is a pointer to the relevant entry in the todo list and the 64-bit address is the first element:

float4 blit_sprite(global uint *le, float4 pv)
{
    const int2 p = (int2) (get_global_id(0), get_global_id(1));
    ulong devaddr;
    global float4 *im;
    int2 im_dim;

    devaddr = ((global ulong *) le)[0];     // global address for the start of the image as a ulong
    im_dim.x = le[2];
    im_dim.y = le[3];

    im = (global float4 *) devaddr;     // ulong is turned into a proper global pointer

    if (p.x < im_dim.x)
        if (p.y < im_dim.y)
            pv += im[p.y * im_dim.x + p.x];     // this gives me a CL_OUT_OF_RESOURCES error, even when changing it to im[0]

    return pv;
}
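
For reference, the entry layout that blit_sprite() reads could be filled in on the host roughly as follows. This is only a sketch: the names pack_image_entry, unpack_image_address and ENTRY_WORDS are hypothetical, and the layout (address in words 0-1, width in word 2, height in word 3) is inferred from the kernel code above rather than taken from the actual host code. It also assumes that host and device use the same byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of one todo-list entry, inferred from blit_sprite():
   words 0-1 hold the 64-bit device address, word 2 the image width,
   word 3 the image height. */
enum { ENTRY_WORDS = 4 };

/* Write one image entry into the host-side copy of the todo list. */
static void pack_image_entry(uint32_t *entry, uint64_t devaddr,
                             uint32_t width, uint32_t height)
{
    assert(entry != NULL);
    /* memcpy avoids casting uint32_t* to uint64_t* on the host.
       The bytes are stored in host byte order, so this assumes the
       device reads them with the same endianness. */
    memcpy(&entry[0], &devaddr, sizeof devaddr);
    entry[2] = width;
    entry[3] = height;
}

/* Read the address back, mirroring ((global ulong *) le)[0] in the kernel. */
static uint64_t unpack_image_address(const uint32_t *entry)
{
    uint64_t devaddr;
    memcpy(&devaddr, &entry[0], sizeof devaddr);
    return devaddr;
}
```

Because the kernel casts le to a global ulong pointer, each entry presumably has to start at an 8-byte-aligned offset in the todo list buffer, i.e. at an even word index.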

~~but big surprise, this doesn't work: it gives me a CL_OUT_OF_RESOURCES error, which I assume means my im pointer isn't valid.~~ Actually it works; it only failed when I was using two different contexts. But it's still pretty unwieldy.

Is there a less awkward way to do what I want to do?
