cudaMallocPitch and cudaMemcpy2D

Views: 117 | Published: 2020/10/13 1:05:00 | Tags: c++, cuda

Question

I get an error when transferring a C++ 2D array into a CUDA 1D array. Here is my source code.

int main(void)
{
      float h_arr[1024][256];
      float *d_arr;

      // --- Some codes to populate h_arr

      // --- cudaMallocPitch
      size_t pitch;
      cudaMallocPitch((void**)&d_arr, &pitch, 256, 1024);

      // --- Copy array to device
      cudaMemcpy2D(d_arr, pitch, h_arr, 256, 256, 1024, cudaMemcpyHostToDevice);
}

I tried to run the code, but it reports an error.

How do I use cudaMallocPitch() and cudaMemcpy2D() properly?

Recommended answer

The cudaMallocPitch call you have written looks OK apart from its width argument, which is also expressed in bytes and so should be 256 * sizeof(float) for a row of 256 floats, but this:

  cudaMemcpy2D(d_arr, pitch, h_arr, 256, 256, 1024, cudaMemcpyHostToDevice);

is incorrect. Quoting from the documentation:

Copies a matrix (height rows of width bytes each) from the memory area pointed to by src to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. dpitch and spitch are the widths in memory in bytes of the 2D arrays pointed to by dst and src, including any padding added to the end of each row. The memory areas may not overlap. width must not exceed either dpitch or spitch. Calling cudaMemcpy2D() with dst and src pointers that do not match the direction of the copy results in an undefined behavior. cudaMemcpy2D() returns an error if dpitch or spitch exceeds the maximum allowed.

So the source pitch and the width to copy must be specified in bytes. Your host matrix has a pitch of sizeof(float) * 256 bytes, and because the source pitch and the width you are copying are the same, your cudaMemcpy2D call should look like:

 cudaMemcpy2D(d_arr, pitch, h_arr, 256*sizeof(float), 
                256*sizeof(float), 1024, cudaMemcpyHostToDevice);
