What is the difference between "-arch sm_13" and "-arch sm_20"?


Problem description

I need double-precision calculations in my application. According to what I found on Google, I should add the flag "-arch sm_13" or "-arch sm_20".

Q1: What is the difference between "-arch sm_13" and "-arch sm_20" ?

Q2: Is there a difference in performance between "-arch sm_13" and "-arch sm_20" ?

My GPU: GTX 570.

Thanks.

Solution

SM stands for Streaming Multiprocessor, and the number is the compute capability, which indicates the features supported by the architecture. You can find a good description in the CUDA Programming Guide, sections 3.1.2-3.1.4, and the features associated with each architecture are listed in the table in Appendix F.

From the NVCC manual (also included in the Toolkit):

In order to allow for architectural evolution, NVIDIA GPUs are released in different generations. New generations introduce major improvements in functionality and/or chip architecture, while GPU models within the same generation show minor configuration differences that "moderately" affect functionality, performance, or both.

Your GPU has Compute Capability 2.0, so you should use sm_20 to let the compiler use features that are not available in older architectures. If you want backward compatibility, you could also target sm_13 (or sm_1x). Check the documents above for how to use nvcc's -gencode option to target multiple architectures in a single nvcc invocation.
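As a rough illustration of the advice above, here is a minimal sketch of a double-precision kernel together with build commands for one and for two target architectures. The file name axpy.cu and the kernel name daxpy are made up for this example, and the commands assume a CUDA Toolkit old enough to still accept the sm_13 and sm_20 targets (recent toolkits no longer do).

```cuda
// axpy.cu (hypothetical file name): double-precision y = a*x + y.
//
// Build for Compute Capability 2.0 only:
//   nvcc -arch=sm_20 -o axpy axpy.cu
// Build one binary containing code for both 1.3 and 2.0:
//   nvcc -gencode arch=compute_13,code=sm_13 \
//        -gencode arch=compute_20,code=sm_20 -o axpy axpy.cu
#include <cstdio>
#include <cuda_runtime.h>

__global__ void daxpy(int n, double a, const double *x, double *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = a * x[i] + y[i];
    }
}

int main()
{
    // Report the compute capability of device 0 so the right -arch can be chosen.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no usable CUDA device\n");
        return 1;
    }
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);

    const int n = 1024;
    const size_t bytes = n * sizeof(double);
    double hx[n], hy[n];
    for (int i = 0; i < n; ++i) {
        hx[i] = 1.0;
        hy[i] = 2.0;
    }

    double *dx = NULL, *dy = NULL;
    cudaMalloc((void **)&dx, bytes);
    cudaMalloc((void **)&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    daxpy<<<(n + 255) / 256, 256>>>(n, 3.0, dx, dy);

    // A launch built for an architecture the device cannot run fails here.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", hy[0]);

    cudaFree(dx);
    cudaFree(dy);
    return 0;
}
```

With the two -gencode clauses, the runtime picks the embedded code that best matches the device it finds, so the same executable can serve both a 1.3 and a 2.0 card.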

Regarding performance, one thing to look out for is that single-precision arithmetic on sm_1x was not fully IEEE 754 compliant (for example, denormals are flushed to zero, and division and square root are not correctly rounded). So if you target sm_13 and run on a device with Compute Capability 2.0 or later, you may find that floating-point code runs faster, since it uses the less accurate path. You can also force the less accurate path with sm_20 or later by using the -ftz=true -prec-div=false -prec-sqrt=false options; see section 5.4.1 in the CUDA Programming Guide for more information on this.
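To see what those options change, one can build the same small probe twice and compare the printed values. This is only a sketch: the file name prec.cu and the kernel name probe are invented for the example, and the exact digits printed will depend on the device and toolkit.

```cuda
// prec.cu (hypothetical file name): probe single-precision behaviour.
//
// IEEE-compliant path (the defaults when targeting sm_20 or later):
//   nvcc -arch=sm_20 -o prec prec.cu
// Faster, less accurate path:
//   nvcc -arch=sm_20 -ftz=true -prec-div=false -prec-sqrt=false -o prec_fast prec.cu
#include <cstdio>
#include <cuda_runtime.h>

__global__ void probe(float a, float b, float tiny, float *out)
{
    out[0] = a / b;        // affected by -prec-div
    out[1] = sqrtf(a);     // affected by -prec-sqrt
    out[2] = tiny * 0.5f;  // denormal result, affected by -ftz
}

int main()
{
    float *d_out = NULL;
    float h_out[3] = {0.0f, 0.0f, 0.0f};
    cudaMalloc((void **)&d_out, sizeof(h_out));

    // 1e-38f is just below the smallest normal float, so halving it
    // yields a denormal value when denormals are supported.
    probe<<<1, 1>>>(2.0f, 3.0f, 1e-38f, d_out);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    printf("a / b      = %.9g\n", h_out[0]);
    printf("sqrtf(a)   = %.9g\n", h_out[1]);
    printf("tiny * 0.5 = %.9g\n", h_out[2]);

    cudaFree(d_out);
    return 0;
}
```

If the third value prints as zero in the second build but not in the first, the flush-to-zero path is in effect. Differences in the first two values, if any, show up only in the last digits.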
