How to run "host" functions on GPU with CUDA?

Question

I want to run a function such as strcmp on the GPU, but I get:

error: calling a host function("strcmp") from a __device__/__global__ function("myKernel") is not allowed

printf may well not work, since the GPU has no stdout, but I would expect functions like strcmp to work. Should I copy the library's implementation of strcmp into my code with a __device__ prefix, or is there another way?

Recommended answer

CUDA has a standard library, documented in the CUDA programming guide. It includes printf() for devices that support it (Compute Capability 2.0 and higher), as well as assert(). However, it does not include a complete string or stdio library at this point.
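As a minimal sketch of the two supported calls, the snippet below uses printf() and assert() from device code. The names HD, checked_get and report are hypothetical and chosen only for illustration. The HD macro is an assumption that lets the same file compile with nvcc or with a plain C++ compiler.

```cpp
#include <cassert>
#include <cstdio>

// Assumption: HD expands to the CUDA qualifiers only when compiled by nvcc,
// so the helper below can also be built and tested as ordinary host C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper: bounds-checked read using assert(), which CUDA
// supports in device code on Compute Capability 2.0 and higher.
HD inline int checked_get(const int* data, int n, int i) {
    assert(i >= 0 && i < n);
    return data[i];
}

#ifdef __CUDACC__
// Hypothetical kernel: each thread prints the element it is responsible for.
__global__ void report(const int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        printf("thread %d sees %d\n", i, checked_get(data, n, i));
    }
}
#endif
```

A launch might look like `report<<<1, 32>>>(d_data, n);` followed by `cudaDeviceSynchronize();`, where d_data is an assumed device pointer. Device-side printf output is buffered and only appears on the host at synchronization points.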

Implementing your own standard library, as Jason R. Mick suggests, may be possible, but it is not necessarily advisable. In some cases it is unsafe to naively port functions from the sequential standard library to CUDA, not least because some of those implementations are not meant to be thread-safe (rand() on Windows, for example). Even when a port is safe, it may not be efficient, and it may not be what you really need.
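For reference, a hand-written port of strcmp is one of the simpler cases, because the function has no hidden state. The sketch below is not the library's implementation. device_strcmp, compare_all and the HD macro are hypothetical names, and the fixed-width string layout in the kernel is an assumption made only to keep the example short.

```cpp
#include <cstddef>

// Assumption: HD expands to the CUDA qualifiers only when compiled by nvcc,
// so the function can also be built and tested as ordinary host C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper with strcmp semantics: returns a negative, zero, or
// positive value by comparing the strings byte by byte as unsigned chars.
HD inline int device_strcmp(const char* a, const char* b) {
    while (*a != '\0' && *a == *b) {
        ++a;
        ++b;
    }
    return static_cast<int>(static_cast<unsigned char>(*a)) -
           static_cast<int>(static_cast<unsigned char>(*b));
}

#ifdef __CUDACC__
// Hypothetical kernel. Assumed layout: `count` NUL-terminated strings, each
// stored in its own slot of `stride` bytes; thread i compares string i to key.
__global__ void compare_all(const char* strings, int stride, int count,
                            const char* key, int* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        out[i] = device_strcmp(strings + static_cast<std::size_t>(i) * stride, key);
    }
}
#endif
```

This only pays off when there really are many independent comparisons. Each thread still walks its string sequentially, and neighbouring threads read memory `stride` bytes apart, so the access pattern is unlikely to be efficient.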

In my opinion, you are better off avoiding standard library functions that CUDA does not officially support. If you need the behavior of a standard library function in your parallel code, first consider whether you really need it:
* Are you really going to do thousands of strcmp operations in parallel?
* If not, do you have strings to compare that are many thousands of characters long? If so, consider a parallel string comparison algorithm instead.
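One possible shape for a parallel comparison of two long buffers is sketched below. It is one approach among several, not a recommended or official algorithm. It assumes both buffers have the same known length n (memcmp-style rather than NUL-terminated), and first_mismatch_strided, find_first_mismatch and HD are hypothetical names. Each thread scans a strided slice, and atomicMin keeps the smallest mismatching index.

```cpp
// Assumption: HD expands to the CUDA qualifiers only when compiled by nvcc,
// so the helper below can also be built and tested as ordinary host C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper: scan positions start, start+step, start+2*step, ...
// below n and return the first index where a and b differ, or n if this
// strided slice contains no mismatch.
HD inline int first_mismatch_strided(const char* a, const char* b, int n,
                                     int start, int step) {
    for (int i = start; i < n; i += step) {
        if (a[i] != b[i]) return i;
    }
    return n;
}

#ifdef __CUDACC__
// Hypothetical kernel. *first must be initialised to n before the launch.
// Afterwards it holds the smallest mismatching index, or n if the buffers
// are equal.
__global__ void find_first_mismatch(const char* a, const char* b, int n,
                                    int* first) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int step = gridDim.x * blockDim.x;
    int mine = first_mismatch_strided(a, b, n, tid, step);
    if (mine < n) atomicMin(first, mine);
}
#endif
```

Once the kernel has finished, the ordering of the two buffers follows from the bytes at the reported index. If the index equals n, the buffers are equal.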

If you determine that you really do need the behavior of the standard library function in your parallel CUDA code, then consider how you might implement it in parallel, safely and efficiently.
