CUDA optimization techniques
Question
I have written CUDA code to solve an NP-complete problem, but the performance is not what I expected.
I know about "some" optimization techniques (using shared memory, textures, zero-copy...)
What are the most important optimization techniques CUDA programmers should know about?
Recommended answer
You should read NVIDIA's CUDA Programming Best Practices guide: http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/NVIDIA_CUDA_BestPracticesGuide.pdf
This has multiple different performance tips with associated "priorities". Here are some of the top priority tips:
- Use the effective bandwidth of your device to work out what the upper bound on performance ought to be for your kernel
- Minimize memory transfers between host and device, even if that means running some calculations on the device that are not efficient there
- Coalesce all global memory accesses
- Prefer shared memory access to global memory access
- Avoid divergent branches within a single warp, since the threads on each branch path are then executed one path after the other
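To make the first and third tips concrete, here is a rough sketch that times a coalesced copy kernel against a strided one and converts the result to effective bandwidth, using the formula from the guide: (bytes read + bytes written) / 10^9 / seconds. The names `copy_coalesced`, `copy_strided` and `effective_bandwidth_gbps` are made up for this illustration, and the stride value you pass in is arbitrary. The sketch assumes `n` is a power of two and `stride` is odd so that both kernels move exactly `n` elements.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Coalesced: consecutive threads in a warp touch consecutive elements.
__global__ void copy_coalesced(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided: consecutive threads touch elements `stride` apart, so one warp's
// accesses are spread over many memory segments.
__global__ void copy_strided(const float* in, float* out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int j = (int)(((long long)i * stride) % n);
        out[j] = in[j];
    }
}

// Hypothetical helper: returns effective bandwidth in GB/s, or -1.0 on error.
double effective_bandwidth_gbps(bool coalesced, int n, int stride) {
    // With n a power of two and stride odd, i -> (i * stride) % n is a
    // permutation, so the strided kernel also copies every element once.
    assert(n > 0 && (n & (n - 1)) == 0 && stride % 2 == 1);

    size_t bytes = (size_t)n * sizeof(float);
    float *in = nullptr, *out = nullptr;
    if (cudaMalloc(&in, bytes) != cudaSuccess) return -1.0;
    if (cudaMalloc(&out, bytes) != cudaSuccess) { cudaFree(in); return -1.0; }
    cudaMemset(in, 0, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    float ms = 0.0f;
    // The first pass is a warm-up; its timing is overwritten by the second.
    for (int rep = 0; rep < 2; ++rep) {
        cudaEventRecord(start);
        if (coalesced) copy_coalesced<<<blocks, threads>>>(in, out, n);
        else           copy_strided<<<blocks, threads>>>(in, out, n, stride);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        cudaEventElapsedTime(&ms, start, stop);
    }
    bool ok = (cudaGetLastError() == cudaSuccess);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(in);
    cudaFree(out);
    if (!ok || ms <= 0.0f) return -1.0;

    // Each element is read once and written once, hence the factor of 2.
    double gigabytes = 2.0 * (double)bytes / 1e9;
    return gigabytes / ((double)ms / 1e3);
}
```

Calling `effective_bandwidth_gbps(true, 1 << 24, 33)` and `effective_bandwidth_gbps(false, 1 << 24, 33)` from your own `main` and comparing both numbers with the theoretical bandwidth of your card should give a feel for how far a kernel is from the upper bound. The actual figures depend on the GPU, so none are quoted here.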
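The shared memory and branching tips can be illustrated with a block-level sum, sketched below under some assumptions. `block_sum`, `gpu_sum` and `kThreads` are hypothetical names, and the block size is assumed to be a power of two. Each thread does one coalesced read from global memory, the rest of the work happens in shared memory, and the loop uses "sequential addressing" so that the active threads always form a contiguous range. Only one partial sum per block is copied back to the host, which also keeps the host/device traffic small.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // threads per block; must be a power of two

// Each block sums up to kThreads inputs and writes one partial sum.
// Must be launched with blockDim.x == kThreads.
__global__ void block_sum(const float* in, float* partial, int n) {
    __shared__ float tile[kThreads];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    // One coalesced global read per thread; out-of-range threads add 0.
    tile[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Sequential addressing: the active threads are tid < s, a contiguous
    // range, so whole warps go idle together. Only once s drops below the
    // warp size does the last active warp contain both taken and untaken paths.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) tile[tid] += tile[tid + s];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = tile[0];
}

// Hypothetical host wrapper: returns false if a CUDA call fails.
bool gpu_sum(const std::vector<float>& h_in, float* result) {
    *result = 0.0f;
    int n = (int)h_in.size();
    if (n == 0) return true;
    int blocks = (n + kThreads - 1) / kThreads;

    float *d_in = nullptr, *d_partial = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&d_partial, blocks * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    block_sum<<<blocks, kThreads>>>(d_in, d_partial, n);

    // Only `blocks` floats travel back to the host, not all n inputs.
    std::vector<float> h_partial(blocks);
    cudaError_t err = cudaMemcpy(h_partial.data(), d_partial,
                                 blocks * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_partial);
    if (err != cudaSuccess) return false;

    for (float p : h_partial) *result += p;
    return true;
}
```

For large inputs you would probably reduce the partial sums on the device as well instead of finishing on the host. That step is left out to keep the sketch short.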