Assignment of function pointer with the unified memory in CUDA

Question

I am trying to implement dynamic binding of functions in CUDA using the convenient unified memory model. Here we have a struct Parameters whose only member is a function pointer, void (*p_func)().

#include <cstdio>

struct Parameters {
    void (*p_func)();
};

The struct is allocated in unified (managed) memory, and we assign the actual function func_A to p_func.

__host__ __device__
void func_A() {
    printf("func_A is correctly invoked!\n");
    return;
}

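When we run the following code, the problem arises. If only assignment 1 runs, i.e., para->p_func = func_A executed in host code, p_func holds the host address of func_A, and that same value is what both host and device code see. In contrast, if assignment 2 runs (the same assignment performed inside a kernel), p_func holds the device address of func_A, and again both sides see that one value. Either way, one of the two sides ends up calling through an address that is not valid in its own address space.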

__global__ void assign_func_pointer(Parameters* para) {
    para->p_func = func_A;
}

__global__ void run_on_device(Parameters* para) {
    printf("run on device with address %p\n", para->p_func);
    para->p_func();
}

void run_on_host(Parameters* para) {
    printf("run on host with address %p\n", para->p_func);
    para->p_func();
}

int main(int argc, char* argv[]) {

    Parameters* para;
    cudaMallocManaged(&para, sizeof(Parameters));

    // assignment 1 (host code): p_func receives the host address of func_A
    para->p_func = func_A;
    printf("addr@host: %p\n", para->p_func);

    // assignment 2 (device code): p_func receives the device address of func_A,
    // overwriting assignment 1; enable only one of the two sections at a time
    assign_func_pointer<<<1,1>>>(para);
    cudaDeviceSynchronize();
    printf("addr@device: %p\n", para->p_func);

    run_on_device<<<1,1>>>(para);
    cudaDeviceSynchronize();

    run_on_host(para);

    cudaFree(para);
    return 0;
}
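To isolate the core of the problem, the sketch below records both addresses of the same __host__ __device__ function side by side: the host address taken in host code and the device address written by a kernel into managed memory. The helper name addresses_differ and the typedef fp_t are hypothetical, introduced only for illustration; the sketch assumes a device and toolkit that support cudaMallocManaged.

```cuda
#include <cstdio>

// Same __host__ __device__ function as in the question.
__host__ __device__ void func_A() {
    printf("func_A is correctly invoked!\n");
}

typedef void (*fp_t)();  // hypothetical alias for "pointer to void()"

// In device code, the name func_A decays to its *device* address.
__global__ void get_device_address(fp_t* out) {
    *out = func_A;
}

// Hypothetical helper: returns true if the host-side and device-side
// addresses of func_A differ (returns false on any CUDA error as well).
bool addresses_differ() {
    fp_t* slot = nullptr;
    if (cudaMallocManaged(&slot, sizeof(fp_t)) != cudaSuccess) return false;

    get_device_address<<<1, 1>>>(slot);
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(slot);
        return false;
    }

    fp_t host_addr = func_A;  // in host code, func_A decays to its *host* address
    fp_t dev_addr = *slot;    // device address, read back through managed memory
    printf("host: %p  device: %p\n", (void*)host_addr, (void*)dev_addr);

    cudaFree(slot);
    return host_addr != dev_addr;
}
```

Calling addresses_differ() from main should print two different values, which is why a single stored pointer cannot be valid on both sides.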

The question now is: under the unified memory model, is it possible for the function pointers on the device and on the host to each point to the correct function address?

Answer

With some modifications to the struct definition, something like this may be possible:

$ cat t1288.cu
#include <cstdio>

struct Parameters {
    void (*p_hfunc)();
    void (*p_dfunc)();
    __host__ __device__
    void p_func(){
      #ifdef __CUDA_ARCH__
      (*p_dfunc)();
      #else
      (*p_hfunc)();
      #endif
      }
};

__host__ __device__
void func_A() {
    printf("func_A is correctly invoked!\n");
    return;
}

__global__ void assign_func_pointer(Parameters* para) {
    para->p_dfunc = func_A;
}

__global__ void run_on_device(Parameters* para) {
    printf("run on device\n");
    para->p_func();
}

void run_on_host(Parameters* para) {
    printf("run on host\n");
    para->p_func();
}

int main(int argc, char* argv[]) {

    Parameters* para;
    cudaMallocManaged(&para, sizeof(Parameters));

    // host-side assignment: p_hfunc receives the host address of func_A
    para->p_hfunc = func_A;
    printf("addr@host: %p\n", para->p_hfunc);

    // device-side assignment: p_dfunc receives the device address of func_A
    assign_func_pointer<<<1,1>>>(para);
    cudaDeviceSynchronize();
    printf("addr@device: %p\n", para->p_dfunc);

    run_on_device<<<1,1>>>(para);
    cudaDeviceSynchronize();
    run_on_host(para);

    cudaFree(para);
    return 0;
}
$ nvcc -arch=sm_35 -o t1288 t1288.cu
$ cuda-memcheck ./t1288
========= CUDA-MEMCHECK
addr@host: 0x402add
addr@device: 0x8
run on device
func_A is correctly invoked!
run on host
func_A is correctly invoked!
========= ERROR SUMMARY: 0 errors
$
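In the code above, p_dfunc is filled by launching a kernel. A widely used alternative is to statically initialize a __device__ function-pointer variable and copy its value to the host with cudaMemcpyFromSymbol, so both members can be set from host code. This is a sketch of that variant, not the answer's own method; the names d_func_A_ptr and setup_and_run are hypothetical, and the device address is first copied into an ordinary host variable before being stored in the managed struct.

```cuda
#include <cstdio>

__host__ __device__ void func_A() {
    printf("func_A is correctly invoked!\n");
}

// Same two-pointer layout and __CUDA_ARCH__ dispatch as the answer's struct.
struct Parameters {
    void (*p_hfunc)();
    void (*p_dfunc)();
    __host__ __device__ void p_func() {
#ifdef __CUDA_ARCH__
        (*p_dfunc)();  // device compilation pass: use the device address
#else
        (*p_hfunc)();  // host compilation pass: use the host address
#endif
    }
};

// Hypothetical device variable, statically initialized with the
// device-side address of func_A.
__device__ void (*d_func_A_ptr)() = func_A;

__global__ void run_on_device(Parameters* para) {
    para->p_func();
}

// Hypothetical helper: sets both pointers from host code, then calls
// through the struct on the device and on the host.
bool setup_and_run() {
    Parameters* para = nullptr;
    if (cudaMallocManaged(&para, sizeof(Parameters)) != cudaSuccess) return false;

    para->p_hfunc = func_A;  // host address

    // Copy the device address out of d_func_A_ptr (no assignment kernel needed).
    void (*tmp)() = nullptr;
    cudaError_t err = cudaMemcpyFromSymbol(&tmp, d_func_A_ptr, sizeof(tmp));
    para->p_dfunc = tmp;

    if (err == cudaSuccess) {
        run_on_device<<<1, 1>>>(para);
        err = cudaDeviceSynchronize();
    }
    if (err == cudaSuccess) {
        para->p_func();  // host path
    }
    cudaFree(para);
    return err == cudaSuccess;
}
```

The struct still stores two separate addresses; this only changes how the device one is obtained.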

I concur with the other answer that, even with managed memory, it is currently not possible to have a single numerical function pointer value that works correctly in both host code and device code.
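If what is needed is one value in managed memory that means the same thing on both sides, a different technique (not a function pointer at all) is to store an integer tag and dispatch with a switch inside a __host__ __device__ member. The switch is compiled separately for host and device, so each side calls its own copy of the function. This is a sketch under that assumption; FuncId, func_B and run_tagged are hypothetical names added for illustration.

```cuda
#include <cstdio>

__host__ __device__ void func_A() { printf("func_A is correctly invoked!\n"); }
__host__ __device__ void func_B() { printf("func_B is correctly invoked!\n"); }

// Hypothetical tag type: a plain integer has the same meaning in host and
// device code, unlike a function address.
enum FuncId : int { FUNC_A = 0, FUNC_B = 1 };

struct Parameters {
    FuncId id;
    // Compiled once for the host and once for the device; each version
    // calls the matching host or device copy of func_A / func_B.
    __host__ __device__ void p_func() const {
        switch (id) {
            case FUNC_A: func_A(); break;
            case FUNC_B: func_B(); break;
            default: break;
        }
    }
};

__global__ void run_on_device(Parameters* para) {
    para->p_func();
}

// Hypothetical helper: stores the tag in managed memory and dispatches
// on both the device and the host.
bool run_tagged(FuncId id) {
    Parameters* para = nullptr;
    if (cudaMallocManaged(&para, sizeof(Parameters)) != cudaSuccess) return false;

    para->id = id;
    run_on_device<<<1, 1>>>(para);
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess) {
        para->p_func();  // host path, same tag value
    }
    cudaFree(para);
    return err == cudaSuccess;
}
```

The trade-off is that the set of callable functions must be known at compile time, whereas the two-pointer struct can bind any function whose host and device addresses are both available.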
