Cuda优化技术 [英] Cuda optimization techniques

查看:167
本文介绍了Cuda优化技术的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经写了一个CUDA代码来解决NP-Complete问题,但是性能却不是我所怀疑的。

I have written a CUDA code to solve an NP-Complete problem, but the performance was not as I suspected.

我了解一些优化技术(使用共享的内存,纹理,零复制...)

I know about "some" optimization techniques (using shared memroy, textures, zerocopy...)

CUDA程序员应该了解哪些最重要的优化技术?

What are the most important optimization techniques CUDA programmers should know about?

推荐答案

您应该阅读NVIDIA的CUDA编程最佳实践指南: http://developer.download.nvidia.com/compute/cuda /3_0/toolkit/docs/NVIDIA_CUDA_BestPracticesGuide.pdf

You should read NVIDIA's CUDA Programming Best Practices guide: http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/NVIDIA_CUDA_BestPracticesGuide.pdf

这具有多个不同的性能提示以及相关的优先级。以下是一些最重要的优先提示:

This has multiple different performance tips with associated "priorities". Here are some of the top priority tips:


  1. 使用设备的有效带宽来确定性能的上限为您的内核

  2. 最小化主机与设备之间的内存传输-即使这意味着在设备上进行效率低下的计算

  3. 对所有内存进行计算访问

  4. 优先选择共享内存访问,而不是全局内存访问

  5. 避免在单个线程束内执行代码分支,因为这会序列化线程

  1. Use the effective bandwidth of your device to work out what the upper bound on performance ought to be for your kernel
  2. Minimize memory transfers between host and device - even if that means doing calculations on the device which are not efficient there
  3. Coalesce all memory accesses
  4. Prefer shared memory access to global memory access
  5. Avoid code execution branching within a single warp as this serializes the threads

这篇关于Cuda优化技术的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆