CUDA - Implementing Device Hash Map?


Question




Does anyone have any experience implementing a hash map on a CUDA Device? Specifically, I'm wondering how one might go about allocating memory on the Device and copying the result back to the Host, or whether there are any useful libraries that can facilitate this task.

It seems like I would need to know the maximum size of the hash map a priori in order to allocate Device memory. All my previous CUDA endeavors have used arrays and memcpys and therefore been fairly straightforward.
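For reference, the fixed-size pattern described above can be sketched as follows. This is a minimal illustration, not from any particular library; the `Entry` layout and `capacity` are assumptions chosen for the example.

```cuda
// Sketch: pre-allocate a capacity-bounded table of key/value slots on
// the device, fill it in kernels, and copy the whole array back.
#include <vector>
#include <cuda_runtime.h>

struct Entry {
    int key;
    int value;
};

int main() {
    const size_t capacity = 1024;  // must be chosen a priori
    Entry *d_table = nullptr;
    cudaMalloc(&d_table, capacity * sizeof(Entry));
    // Mark all slots "empty" (here: all bits set) before any inserts.
    cudaMemset(d_table, 0xff, capacity * sizeof(Entry));

    // ... launch kernels that hash keys into slots of d_table ...

    // Copy the entire fixed-size table back and scan it on the host.
    std::vector<Entry> h_table(capacity);
    cudaMemcpy(h_table.data(), d_table, capacity * sizeof(Entry),
               cudaMemcpyDeviceToHost);
    cudaFree(d_table);
    return 0;
}
```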

Any insight into this problem is appreciated. Thanks.

Solution

There is a GPU hash table implementation presented in "CUDA by Example" by Jason Sanders and Edward Kandrot.

Fortunately, you can get information on the book and download the example source code freely from this page:
http://developer.nvidia.com/object/cuda-by-example.html

In this implementation, the table is pre-allocated on the CPU, and safe multithreaded access is ensured by a lock function built on the atomic function atomicCAS (compare and swap).
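A minimal sketch of such an atomicCAS-based lock, in the spirit of the book's implementation (not its exact code), might look like this:

```cuda
// 0 = unlocked, 1 = locked; the mutex array is allocated with cudaMalloc.
struct Lock {
    int *mutex;

    __device__ void lock() {
        // Spin until we swap 0 -> 1, i.e. until this thread owns the mutex.
        while (atomicCAS(mutex, 0, 1) != 0) { }
    }

    __device__ void unlock() {
        atomicExch(mutex, 0);  // release so other threads can acquire
    }
};

// Usage inside a kernel: one lock per hash bucket.
// __global__ void add_to_table(..., Lock *locks) {
//     size_t bucket = my_hash(key) % NUM_BUCKETS;  // my_hash is illustrative
//     locks[bucket].lock();
//     /* link the new entry into table[bucket] */
//     locks[bucket].unlock();
// }
```

One caveat: on hardware of that era, threads in the same warp spinning on the same lock can deadlock due to lockstep execution, so per-bucket locks like this rely on contending threads usually targeting different buckets.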

Moreover, newer hardware generations (compute capability 2.0 and later) combined with CUDA >= 4.0 are supposed to support the new/delete operators directly on the GPU ( http://developer.nvidia.com/object/cuda_4_0_RC_downloads.html?utm_source=http://forums.nvidia.com&utm_medium=http://forums.nvidia.com&utm_term=Developers&utm_content=Developers&utm_campaign=CUDA4 ), which could serve your implementation. I haven't tested these features yet.
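As a sketch of what device-side allocation could look like (the `Node` layout and kernel name are illustrative assumptions):

```cuda
// Sketch: device-side new/delete (compute capability >= 2.0, CUDA >= 4.0).
// Allocations come from a separate device heap, whose size can be raised
// with cudaDeviceSetLimit(cudaLimitMallocHeapSize, bytes) before launch.
struct Node {
    int key, value;
    Node *next;
};

__global__ void insert_demo(int key, int value, Node **head) {
    Node *n = new Node;   // allocated on the GPU, from the device heap
    n->key = key;
    n->value = value;
    n->next = *head;      // single-thread demo; concurrent inserts would
    *head = n;            // need atomics or a lock as shown above
}
```

Note that memory obtained from device-side new/malloc cannot be used with host-side copy APIs such as cudaMemcpy, so results would still need to be staged into cudaMalloc'd buffers before copying back to the host.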
