Fastest way to access device vector elements directly on host


Question

I refer you to the following page: http://code.google.com/p/thrust/wiki/QuickStartGuide#Vectors. Please see the second paragraph, where it says:


Also note that individual elements of a device_vector can be accessed using the standard bracket notation. However, because each of these accesses requires a call to cudaMemcpy, they should be used sparingly. We'll look at some more efficient techniques later.

I searched all over the document but could not find the more efficient techniques. Does anyone know the fastest way to do this, i.e. how to access a device vector or device pointer on the host as quickly as possible?

Recommended answer

The "more efficient techniques" the guide alludes to are the Thrust algorithms. It is more efficient to access (or copy across the PCI-E bus) millions of elements at once than to access them one at a time, because the fixed cost of CPU/GPU communication is amortized over the whole transfer.
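As a rough illustration of that amortization, the sketch below contrasts two bulk patterns with per-element bracket access: copying a whole device_vector to the host in one thrust::copy call, and running thrust::reduce on the device so that only the scalar result comes back. The helper names copy_to_host and sum_on_device are hypothetical and chosen only for this example; they are not part of Thrust.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/reduce.h>

// Hypothetical helper: move the entire vector to the host with one bulk copy,
// instead of reading d_vec[i] element by element.
thrust::host_vector<int> copy_to_host(const thrust::device_vector<int>& d_vec) {
    thrust::host_vector<int> h_vec(d_vec.size());
    thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());
    return h_vec;
}

// Hypothetical helper: sum the elements with a Thrust algorithm that runs on
// the device, so only the final int is returned to the host.
int sum_on_device(const thrust::device_vector<int>& d_vec) {
    return thrust::reduce(d_vec.begin(), d_vec.end(), 0);
}
```

Once the data is in a host_vector, indexing it is an ordinary memory read. If only an aggregate is needed, the second helper avoids transferring the elements at all.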

There's no faster way to copy data from the GPU to the CPU than by calling cudaMemcpy, because it is the most primitive operation available to a CUDA programmer for this task.
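For completeness, here is a minimal sketch of calling cudaMemcpy directly on the storage behind a device_vector, assuming the goal is to fetch a contiguous range of elements in a single transfer. The function name copy_range_to_host and its offset and count parameters are hypothetical; thrust::raw_pointer_cast is used to obtain the raw device pointer.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_vector.h>

#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: copy `count` ints starting at index `offset` from the
// device vector into a host buffer using one cudaMemcpy call.
std::vector<int> copy_range_to_host(const thrust::device_vector<int>& d_vec,
                                    std::size_t offset, std::size_t count) {
    if (offset > d_vec.size() || count > d_vec.size() - offset) {
        throw std::out_of_range("requested range exceeds device vector size");
    }
    std::vector<int> h_buf(count);
    // Raw device pointer to the first requested element.
    const int* d_ptr = thrust::raw_pointer_cast(d_vec.data()) + offset;
    cudaError_t err = cudaMemcpy(h_buf.data(), d_ptr, count * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        throw std::runtime_error(cudaGetErrorString(err));
    }
    return h_buf;
}
```

The cost of the call is dominated by the fixed per-transfer overhead when count is small, which is why fetching one range of many elements tends to beat many single-element reads.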
