CUDA Driver API vs. CUDA runtime

Question

When writing CUDA applications, you can either work at the driver level or at the runtime level, as illustrated in the figure below (the libraries are CUFFT and CUBLAS for advanced math):

[Figure not reproduced: the CUDA software stack — application code on top of the libraries (CUFFT, CUBLAS), the runtime API, and the driver API.]

I assume the trade-off between the two is improved performance for the low-level API, at the cost of increased code complexity. What are the concrete differences, and are there any significant things you cannot do with the high-level API?

I am using CUDA.net for interop with C#, and it is built as a copy of the driver API. This encourages writing a lot of rather complex code in C#, while the C++ equivalent would be simpler using the runtime API. Is there anything to be gained by doing it this way? The one benefit I can see is that it is easier to integrate intelligent error handling with the rest of the C# code.

Solution

The CUDA runtime makes it possible to compile and link your CUDA kernels into executables. This means that you don't have to distribute cubin files with your application, or deal with loading them through the driver API. As you have noted, it is generally easier to use.
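For illustration, here is a minimal sketch of that workflow, assuming a made-up kernel named scale: nvcc compiles the device code and embeds it in the executable, so no cubin file has to be shipped or loaded by hand.

    // runtime_example.cu — build with: nvcc runtime_example.cu -o runtime_example
    #include <cstdio>
    #include <cuda_runtime.h>

    // Hypothetical kernel: multiplies n floats by a factor.
    __global__ void scale(float *data, float factor, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] *= factor;
    }

    int main()
    {
        const int n = 1024;
        float host[n];
        for (int i = 0; i < n; ++i) host[i] = 1.0f;

        float *dev = NULL;
        cudaMalloc(&dev, n * sizeof(float));            // no explicit cuInit/context code
        cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

        scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);  // execution configuration syntax

        cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dev);
        printf("host[0] = %f\n", host[0]);              // expect 2.000000
        return 0;
    }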

In contrast, the driver API is harder to program but provides more control over how CUDA is used. The programmer has to deal directly with initialization, module loading, etc.
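As a rough sketch of those explicit steps (error checking omitted; "kernels.cubin" and "scale" are hypothetical names for a module built with nvcc -cubin and the kernel inside it):

    // driver_example.c — build with: gcc driver_example.c -lcuda -o driver_example
    #include <stdio.h>
    #include <cuda.h>

    int main(void)
    {
        CUdevice   dev;
        CUcontext  ctx;
        CUmodule   mod;
        CUfunction fn;

        cuInit(0);                               // explicit initialization
        cuDeviceGet(&dev, 0);                    // pick device 0
        cuCtxCreate(&ctx, 0, dev);               // explicit context management
        cuModuleLoad(&mod, "kernels.cubin");     // explicit module loading
        cuModuleGetFunction(&fn, mod, "scale");  // look the kernel up by name

        /* ... allocate buffers, set arguments, launch (sketched further below) ... */

        cuModuleUnload(mod);
        cuCtxDestroy(ctx);
        return 0;
    }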

Apparently more detailed device information can be queried through the driver API than through the runtime API. For instance, the free memory available on the device can be queried only through the driver API.
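The call in question is cuMemGetInfo; a minimal sketch follows (note that later CUDA releases added an equivalent cudaMemGetInfo to the runtime API, which was not available when this answer was written):

    // meminfo.c — build with: gcc meminfo.c -lcuda -o meminfo
    #include <stdio.h>
    #include <cuda.h>

    int main(void)
    {
        CUdevice  dev;
        CUcontext ctx;
        size_t    freeBytes, totalBytes;

        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);              // the query needs a current context
        cuMemGetInfo(&freeBytes, &totalBytes);  // driver-API free/total memory query

        printf("free: %zu of %zu bytes\n", freeBytes, totalBytes);
        cuCtxDestroy(ctx);
        return 0;
    }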

From the CUDA Programmer's Guide:

It is composed of two APIs:

  • A low-level API called the CUDA driver API,
  • A higher-level API called the CUDA runtime API that is implemented on top of the CUDA driver API.

These APIs are mutually exclusive: An application should use either one or the other.

The CUDA runtime eases device code management by providing implicit initialization, context management, and module management. The C host code generated by nvcc is based on the CUDA runtime (see Section 4.2.5), so applications that link to this code must use the CUDA runtime API.

In contrast, the CUDA driver API requires more code, is harder to program and debug, but offers a better level of control and is language-independent since it only deals with cubin objects (see Section 4.2.5). In particular, it is more difficult to configure and launch kernels using the CUDA driver API, since the execution configuration and kernel parameters must be specified with explicit function calls instead of the execution configuration syntax described in Section 4.2.3. Also, device emulation (see Section 4.5.2.9) does not work with the CUDA driver API.
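To make that contrast concrete, here is how the launch step in the driver-API sketch above might be filled in, using the modern cuLaunchKernel entry point (releases contemporary with this answer used cuFuncSetBlockShape/cuParamSet*/cuLaunchGrid instead); everything the <<<...>>> syntax implies must be spelled out by hand:

    /* Continuing the driver-API sketch above, in place of its "..." comment: */
    CUdeviceptr dptr;
    int   n      = 1024;
    float factor = 2.0f;

    cuMemAlloc(&dptr, n * sizeof(float));     // explicit allocation
    /* ... cuMemcpyHtoD() to fill the buffer ... */

    void *args[] = { &dptr, &factor, &n };    // pointers to each kernel argument

    cuLaunchKernel(fn,                        // CUfunction from cuModuleGetFunction
                   (n + 255) / 256, 1, 1,     // grid dimensions
                   256, 1, 1,                 // block dimensions
                   0, NULL,                   // shared memory bytes, stream
                   args, NULL);               // kernel arguments, no extra options

    cuMemFree(dptr);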

There is no noticeable performance difference between the APIs. How your kernels use memory and how they are laid out on the GPU (in warps and blocks) will have a much more pronounced effect.
