Invalid Argument error when copying data from device to host
Problem description
I am having problems copying data from my device back to the host. My data are arranged in a struct:
typedef struct Array2D {
    double* arr;
    int rows;
    int cols;
} Array2D;
arr is a 'flat' array. rows and cols describe its dimensions.
The code below shows how I am trying to copy the data back to the host:
h_output = (Array2D*) malloc(sizeof(Array2D));
cudaMemcpy(h_output, d_output, sizeof(Array2D), cudaMemcpyDeviceToHost);
double* h_arr = (double*) malloc(h_output->cols*h_output->rows*sizeof(double));
cudaMemcpy(h_arr, h_output->arr, h_output->cols*h_output->rows*sizeof(double), cudaMemcpyDeviceToHost);
h_output->arr = h_arr;
However, execution fails on the fourth line with CUDA error 11 (invalid argument). I cannot see why this is happening. The size of the array is correct, and I can access both h_output and h_arr from the host; both have 'real' addresses.
EDIT: Sorry for the late response to the request for more information (= more code).
I have tested that the pointer d_output->arr is a device pointer by trying to access its value on the host. As expected, I was not allowed to do that, which leads me to believe that d_output->arr is in fact a valid device pointer.
The code's objective is to solve Thiele's differential equation using the fourth order Runge-Kutta method.
class CalculationSpecification
{
    /* FUNCTIONS OMITTED */
public:
    __device__ void RK4_n(CalculationSpecification* cs, CalcData data, Array2D* d_output)
    {
        double* rk4data = (double*)malloc((data.pdata->endYear - data.pdata->startYear + 1)*data.pdata->states*sizeof(double));
        /* CALCULATION STUFF HAPPENS HERE */
        // We know that rows = 51, cols = 1 and that rk4data contains 51 values as it should.
        // This was confirmed by using printf directly in this function.
        d_output->arr = rk4data;
        d_output->rows = data.pdata->endYear - data.pdata->startYear + 1;
        d_output->cols = data.pdata->states;
    }
};

class PureEndowment : CalculationSpecification
{
    /* FUNCTIONS OMITTED */
public:
    __device__ void Compute(Array2D *result, CalcData data)
    {
        RK4_n(this, data, result);
    }
};

__global__ void kernel2(Array2D *d_output)
{
    /* Other code that initializes 'cd'. */
    PureEndowment pe;
    pe.Compute(d_output,cd);
}

void prepareOutputSet(Array2D* h_output, Array2D* d_output, int count)
{
    h_output = (Array2D*) malloc(sizeof(Array2D));
    cudaMemcpy(h_output, d_output, sizeof(Array2D), cudaMemcpyDeviceToHost); // After this call I can read the correct values of row, col as well as the address of the pointer.
    double* h_arr = (double*) malloc(h_output->cols*h_output->rows*sizeof(double));
    cudaMemcpy(h_arr, h_output->arr, h_output->cols*h_output->rows*sizeof(double), cudaMemcpyDeviceToHost);
    h_output->arr = h_arr;
}

int main()
{
    Array2D *h_output, *d_output;
    cudaMalloc((void**)&d_output, sizeof(Array2D));
    kernel2<<<1,1>>>(d_output);
    cudaDeviceSynchronize();
    prepareOutputSet(h_output, d_output, 1);
    getchar();
    return 0;
}
EDIT2: Additionally, I have now tested that the value of d_output->arr when running on the device is identical to the value of h_output->arr after the first cudaMemcpy call in prepareOutputSet.
Recommended answer
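This (copying device-allocated memory using cudaMemcpy) is a known limitation in CUDA 4.1. In the question's code, rk4data is allocated with malloc inside a __device__ function, so the pointer stored in d_output->arr refers to that device-side allocation, and the second cudaMemcpy rejects it. A fix is in the works and will be released in a future version of the CUDA runtime.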
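Until then, one possible workaround is to allocate the data buffer from the host with cudaMalloc and hand it to the kernel, so that the pointer later passed to cudaMemcpy comes from the runtime API rather than from in-kernel malloc. The sketch below is only an illustration under that assumption: fillKernel and its placeholder values are hypothetical stand-ins for the omitted RK4 code, and the sizes (51 rows, 1 column) are taken from the comment in the question.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

typedef struct Array2D {
    double* arr;
    int rows;
    int cols;
} Array2D;

// Hypothetical stand-in for the RK4 kernel: it writes into a buffer that
// the host allocated with cudaMalloc instead of calling malloc on the device.
__global__ void fillKernel(Array2D* d_output, double* d_arr, int rows, int cols)
{
    for (int i = 0; i < rows * cols; ++i)
        d_arr[i] = 0.5 * i;  // placeholder values, not a real RK4 result
    d_output->arr  = d_arr;
    d_output->rows = rows;
    d_output->cols = cols;
}

int main()
{
    const int rows = 51, cols = 1;  // sizes assumed from the question's comment
    Array2D* d_output = NULL;
    double*  d_arr    = NULL;

    // Both the struct and the flat array are allocated through the runtime API.
    cudaMalloc((void**)&d_output, sizeof(Array2D));
    cudaMalloc((void**)&d_arr, rows * cols * sizeof(double));

    fillKernel<<<1, 1>>>(d_output, d_arr, rows, cols);
    cudaDeviceSynchronize();

    // First copy: the struct itself (dimensions plus the device pointer).
    Array2D h_output;
    cudaMemcpy(&h_output, d_output, sizeof(Array2D), cudaMemcpyDeviceToHost);

    // Second copy: the flat array, using the device pointer read back above.
    size_t bytes = (size_t)h_output.rows * h_output.cols * sizeof(double);
    double* h_arr = (double*)malloc(bytes);
    cudaError_t err = cudaMemcpy(h_arr, h_output.arr, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess)
        fprintf(stderr, "cudaMemcpy failed: %s\n", cudaGetErrorString(err));
    else
        printf("last value: %f\n", h_arr[rows * cols - 1]);

    free(h_arr);
    cudaFree(d_arr);
    cudaFree(d_output);
    return err == cudaSuccess ? 0 : 1;
}
```

The trade-off is that the host has to know the output size before the launch. In the question's code that size depends on data.pdata, so it would need to be computed on the host or upper-bounded in advance.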