In CUDA9, is "cudaMemcpyAsync()" both a device and a host function?

Question

According to the official CUDA doc, we have

__host__ __device__ cudaError_t cudaMemcpyAsync ( void* dst, const void* src, size_t count, cudaMemcpyKind kind, cudaStream_t stream = 0 )

which implies it is both a host and a device function. However, in the actual installation on my local Linux box, I am seeing in /usr/local/cuda/include/cuda_runtime_api.h:

/** CUDA Runtime API Version */
#define CUDART_VERSION  9000
// Many lines away...
extern __host__ __cudart_builtin__ cudaError_t CUDARTAPI cudaMemcpyAsync(void *dst, const void *src, size_t count, enum cudaMemcpyKind kind, cudaStream_t stream __dv(0));

which seems to imply it is strictly a host function.

I tried to compile a simple kernel that calls cudaMemcpyAsync(), and got the error


streaming.cu(338): error: calling a __host__ function("cudaMemcpyAsync") from a __global__ function("loopy_plus_one") is not allowed

which is further evidence.

So I'm really confused: is the doc incorrect, or is my CUDA installation out of date?

update - if I change my compilation command to explicitly specify sm_60, i.e., nvcc -arch=sm_60 -o out ./src.cu, then the compilation error is gone, but a new one pops out:


ptxas fatal : Unresolved extern function 'cudaMemcpyAsync'


Recommended answer

There is a device implementation of cudaMemcpyAsync in the CUDA device runtime API, which you can see documented in the Programming Guide here. There, within the introductory section on Dynamic Parallelism it notes


Dynamic Parallelism is only supported by devices of compute capability 3.5 and higher

and within the documentation it also notes usage of the device runtime API memory functions:


Notes about all memcpy/memset functions:


  • Only async memcpy/set functions are supported
  • Only device-to-device memcpy is permitted
  • May not pass in local or shared memory pointers

You can also find exact instructions for how you must compile and link code which uses the device runtime API:


CUDA programs are automatically linked with the host runtime library when compiled with nvcc, but the device runtime is shipped as a static library which must explicitly be linked with a program which wishes to use it.

The device runtime is offered as a static library (cudadevrt.lib on Windows, libcudadevrt.a under Linux and MacOS), against which a GPU application that uses the device runtime must be linked. Linking of device libraries can be accomplished through nvcc and/or nvlink.

So to make this work you must do exactly three things:


  1. Choose a physical target architecture which is at least compute capability 3.5 when you are compiling
  2. Use separate compilation for device code when you are compiling
  3. Link the CUDA device runtime library

It is for these three reasons (i.e. not doing any of them) that you have seen the compilation and linking errors when trying to use cudaMemcpyAsync inside kernel code.
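
As a rough illustration of the three requirements, here is a minimal sketch of a kernel that calls cudaMemcpyAsync through the device runtime. It was written with the CUDA 9-era toolchain from the question in mind. The file name cdp_copy.cu, the kernel name copy_in_kernel and the helper run_device_copy_demo are made up for this example, and sm_60 is simply the architecture mentioned in the question. The build line in the first comment uses -arch for the target architecture, -rdc=true for separate compilation of device code, and -lcudadevrt to link the device runtime library.

```cuda
// Assumed build line (the file name is hypothetical):
//   nvcc -arch=sm_60 -rdc=true -o cdp_copy cdp_copy.cu -lcudadevrt
#include <cstdio>
#include <cuda_runtime.h>

// Kernel that issues a device-to-device copy through the device runtime.
__global__ void copy_in_kernel(int *dst, const int *src, size_t count)
{
    // One thread issues the copy. Both pointers refer to global memory,
    // because local or shared memory pointers may not be passed.
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        cudaMemcpyAsync(dst, src, count * sizeof(int),
                        cudaMemcpyDeviceToDevice, 0);
    }
}

// Returns 0 if the in-kernel copy produced the expected data, 1 otherwise.
int run_device_copy_demo()
{
    const int n = 16;
    int h_src[n], h_dst[n];
    for (int i = 0; i < n; ++i) {
        h_src[i] = i;
        h_dst[i] = -1;
    }

    int *d_src = nullptr;
    int *d_dst = nullptr;
    if (cudaMalloc(&d_src, n * sizeof(int)) != cudaSuccess) return 1;
    if (cudaMalloc(&d_dst, n * sizeof(int)) != cudaSuccess) {
        cudaFree(d_src);
        return 1;
    }
    cudaMemcpy(d_src, h_src, n * sizeof(int), cudaMemcpyHostToDevice);

    copy_in_kernel<<<1, 32>>>(d_dst, d_src, n);

    // The kernel is not complete until the copy it issued has finished,
    // so synchronizing on the host is enough before reading the result.
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess) {
        err = cudaMemcpy(h_dst, d_dst, n * sizeof(int),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(d_src);
    cudaFree(d_dst);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < n; ++i) {
        if (h_dst[i] != h_src[i]) return 1;
    }
    return 0;
}

int main()
{
    int status = run_device_copy_demo();
    std::printf("%s\n", status == 0 ? "copy OK" : "copy FAILED");
    return status;
}
```

Dropping -rdc=true or -lcudadevrt from the build line should bring back errors of the kind shown in the question, which is one way to confirm which step was missing.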
