Invalid Argument error when copying data from device to host

Question

I am having problems copying data from my device back to the host. My data are arranged in a struct:

typedef struct Array2D {
    double* arr;        
    int rows;       
    int cols;       
} Array2D;

arr is a 'flat' array; rows and cols describe its dimensions.

The code below shows how I am trying to copy the data back to the host:

h_output = (Array2D*) malloc(sizeof(Array2D));
cudaMemcpy(h_output, d_output, sizeof(Array2D), cudaMemcpyDeviceToHost);
double* h_arr = (double*) malloc(h_output->cols*h_output->rows*sizeof(double));
cudaMemcpy(h_arr, h_output->arr, h_output->cols*h_output->rows*sizeof(double), cudaMemcpyDeviceToHost);
h_output->arr = h_arr;

However, the fourth line fails with CUDA error 11 (invalid argument). I cannot see why this is happening. The size of the array is correct, and I can access both h_output and h_arr from the host; both have 'real' addresses.

EDIT: Sorry for the late response to the request for more information (= more code).

I have tested that d_output->arr is a device pointer by trying to dereference it on the host. As expected, that was not allowed, which leads me to believe that d_output->arr is in fact a valid device pointer.

The code's objective is to solve Thiele's differential equation using the fourth order Runge-Kutta method.

class CalculationSpecification
{

    /* FUNCTIONS OMITTED */

public:
    __device__ void RK4_n(CalculationSpecification* cs, CalcData data, Array2D* d_output)
    {
        double* rk4data = (double*)malloc((data.pdata->endYear - data.pdata->startYear + 1)*data.pdata->states*sizeof(double));

        /* CALCULATION STUFF HAPPENS HERE */

        // We know that rows = 51, cols = 1 and that rk4data contains 51 values as it should.
        // This was confirmed by using printf directly in this function.
        d_output->arr = rk4data;
        d_output->rows = data.pdata->endYear - data.pdata->startYear + 1;
        d_output->cols = data.pdata->states;
    }
};


class PureEndowment : CalculationSpecification
{
    /* FUNCTIONS OMITTED */

public:
    __device__ void Compute(Array2D *result, CalcData data)
    {
        RK4_n(this, data, result);
    }
};


__global__ void kernel2(Array2D *d_output)
{
    /* Other code that initializes 'cd'. */
    PureEndowment pe;
    pe.Compute(d_output,cd);
}


void prepareOutputSet(Array2D* h_output, Array2D* d_output, int count)
{
    h_output = (Array2D*) malloc(sizeof(Array2D));
    cudaMemcpy(h_output, d_output, sizeof(Array2D), cudaMemcpyDeviceToHost); // After this call I can read the correct values of row, col as well as the address of the pointer.
    double* h_arr = (double*) malloc(h_output->cols*h_output->rows*sizeof(double));
    cudaMemcpy(h_arr, h_output->arr, h_output->cols*h_output->rows*sizeof(double), cudaMemcpyDeviceToHost);
    h_output->arr = h_arr;
}

int main()
{
    Array2D *h_output, *d_output;
    cudaMalloc((void**)&d_output, sizeof(Array2D));

    kernel2<<<1,1>>>(d_output);
    cudaDeviceSynchronize();

    prepareOutputSet(h_output, d_output, 1);

    getchar();
    return 0;
}

EDIT2

Additionally, I have now verified that the value of d_output->arr when running on the device is identical to the value of h_output->arr after the first cudaMemcpy call in prepareOutputSet.

Recommended answer

This (copying device-allocated memory using cudaMemcpy) is a known limitation in CUDA 4.1. A fix is in the works and will be released in a future version of the CUDA runtime.
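
A possible workaround, sketched here under stated assumptions rather than taken from the original post, is to avoid in-kernel malloc for any buffer the host needs to read. The host allocates the result buffer with cudaMalloc, passes it to the kernel, and the kernel stores that pointer in the struct. The names fillKernel, copyBack and runDemo are hypothetical, the kernel body is a placeholder for the RK4 computation, and the dimensions (51 rows, 1 column) are the ones mentioned in the question.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

typedef struct Array2D {
    double* arr;
    int rows;
    int cols;
} Array2D;

// Hypothetical stand-in for the RK4 kernel: it writes into a buffer that the
// host allocated with cudaMalloc instead of calling malloc on the device.
__global__ void fillKernel(Array2D* d_output, double* d_buf, int rows, int cols)
{
    for (int i = 0; i < rows * cols; ++i) {
        d_buf[i] = 0.5 * i;  // placeholder values, not the real computation
    }
    d_output->arr  = d_buf;  // pointer obtained from cudaMalloc on the host
    d_output->rows = rows;
    d_output->cols = cols;
}

// Copies the struct, then the array it points to, back to the host.
// Returns 0 on success, 1 on failure. On success h_output->arr is a host
// pointer that the caller must release with free().
int copyBack(const Array2D* d_output, Array2D* h_output)
{
    cudaError_t err = cudaMemcpy(h_output, d_output, sizeof(Array2D),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "struct copy failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    size_t bytes = (size_t)h_output->rows * h_output->cols * sizeof(double);
    double* h_arr = (double*)malloc(bytes);
    if (h_arr == NULL) return 1;

    // h_output->arr still holds the device address at this point.
    err = cudaMemcpy(h_arr, h_output->arr, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "array copy failed: %s\n", cudaGetErrorString(err));
        free(h_arr);
        return 1;
    }
    h_output->arr = h_arr;
    return 0;
}

// Runs the whole round trip once. Returns 0 if every step succeeded.
int runDemo()
{
    const int rows = 51, cols = 1;  // dimensions quoted in the question
    Array2D* d_output = NULL;
    double*  d_buf    = NULL;
    if (cudaMalloc((void**)&d_output, sizeof(Array2D)) != cudaSuccess) return 1;
    if (cudaMalloc((void**)&d_buf, rows * cols * sizeof(double)) != cudaSuccess) {
        cudaFree(d_output);
        return 1;
    }

    fillKernel<<<1, 1>>>(d_output, d_buf, rows, cols);
    int status = (cudaDeviceSynchronize() == cudaSuccess) ? 0 : 1;

    Array2D h_output;
    if (status == 0) status = copyBack(d_output, &h_output);
    if (status == 0) {
        printf("rows=%d cols=%d last=%f\n", h_output.rows, h_output.cols,
               h_output.arr[rows * cols - 1]);
        free(h_output.arr);
    }
    cudaFree(d_buf);
    cudaFree(d_output);
    return status;
}
```

Calling runDemo() from main exercises the allocation, kernel launch and both copies. If the kernel really does need in-kernel malloc for scratch space, another option is to copy the final values from that scratch buffer into a cudaMalloc'ed buffer inside the kernel before it returns, and to release the scratch buffer with free on the device.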
