Thrust inside user written kernels


Question

I am a newbie to Thrust. I see that all Thrust presentations and examples only show host code.

I would like to know if I can pass a device_vector to my own kernel. How? If so, what operations are permitted on it inside kernel/device code?

Answer

As it was originally written, Thrust is purely a host-side abstraction. It cannot be used inside kernels. You can pass the device memory encapsulated inside a thrust::device_vector to your own kernel like this:

thrust::device_vector< Foo > fooVector;
// Do something thrust-y with fooVector

Foo* fooArray = thrust::raw_pointer_cast( fooVector.data() );

// Pass raw array and its size to kernel
someKernelCall<<< x, y >>>( fooArray, fooVector.size() );
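The snippet above leaves someKernelCall undefined. A minimal sketch of what it might look like follows; the struct member bar is purely illustrative (nothing in the original defines Foo's contents). Inside the kernel, the raw pointer is ordinary device memory and can be indexed directly:

```cuda
// Hypothetical definition of someKernelCall: each thread touches one
// element of the raw array extracted from the device_vector.
// Foo is assumed to be a POD struct with a numeric member "bar".
__global__ void someKernelCall(Foo* fooArray, size_t size)
{
    size_t idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < size) {
        fooArray[idx].bar += 1;  // plain CUDA device code; no Thrust involved
    }
}
```

Because the kernel only ever sees a raw pointer, no Thrust headers or types are needed in the device code itself.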

You can also use device memory not allocated by Thrust within Thrust algorithms by instantiating a thrust::device_ptr from the bare CUDA device memory pointer.
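As a sketch of that direction (raw allocation into Thrust, rather than Thrust allocation into a kernel), a pointer from cudaMalloc can be wrapped with thrust::device_pointer_cast and then handed to any host-side Thrust algorithm; the sizes and fill value here are arbitrary:

```cuda
#include <thrust/device_ptr.h>
#include <thrust/fill.h>
#include <cuda_runtime.h>

int main()
{
    const size_t N = 1024;
    int* raw = nullptr;
    cudaMalloc(&raw, N * sizeof(int));  // memory not allocated by Thrust

    // Wrap the bare pointer; Thrust now treats it as device memory.
    thrust::device_ptr<int> dev_ptr = thrust::device_pointer_cast(raw);
    thrust::fill(dev_ptr, dev_ptr + N, 42);

    cudaFree(raw);
    return 0;
}
```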

Edited four and a half years later to add that, as per @JackOLantern's answer, Thrust 1.8 adds a sequential execution policy, which means you can run single-threaded versions of Thrust's algorithms on the device. Note that it still isn't possible to pass a thrust::device_vector directly to a kernel, and device vectors can't be used directly in device code.
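A minimal sketch of that sequential policy in use, assuming Thrust 1.8 or later: each thread runs its own single-threaded thrust::sort over a private slice of a device array (the slicing scheme here is invented for illustration):

```cuda
#include <thrust/execution_policy.h>
#include <thrust/sort.h>

// Each thread sequentially sorts its own contiguous slice of "data"
// using thrust::seq (available from Thrust 1.8 onward).
__global__ void sortSlices(int* data, int sliceLen, int numSlices)
{
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (s < numSlices) {
        thrust::sort(thrust::seq,
                     data + s * sliceLen,
                     data + (s + 1) * sliceLen);
    }
}
```

The kernel still receives a raw pointer, not a device_vector; thrust::seq only changes which execution policy the algorithm runs under.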

Note that in some cases it is also possible to use the thrust::device execution policy to have parallel Thrust execution launched by a kernel as a child grid. This requires separate compilation/device linkage and hardware that supports dynamic parallelism. I am not certain whether this is supported in all Thrust algorithms, but it certainly works with some.
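A hedged sketch of that child-grid pattern, assuming a Thrust version that supports device-side thrust::device launches: the build flags and compute-capability requirement below come from the dynamic-parallelism prerequisites the paragraph mentions.

```cuda
#include <thrust/execution_policy.h>
#include <thrust/sort.h>

// Sketch only: one thread launches a parallel Thrust sort as a child
// grid via thrust::device. Requires relocatable device code
// (nvcc -rdc=true, link against cudadevrt) and hardware with dynamic
// parallelism (compute capability 3.5+).
__global__ void parentKernel(int* data, int n)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        thrust::sort(thrust::device, data, data + n);
    }
}
```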
