How to determine if my GPU does 16/32/64 bit arithmetic operations?

Problem description

I am trying to find the throughput of native arithmetic operations on my Nvidia card. On this page, Nvidia has documented the throughput values for various arithmetic operations. The problem is how do I determine if my card does 16 or 32 or 64 bit operations, since the values are different for each? Further, I also want to calculate the latency values of these instructions for my card. Is there some way to do it? As far as my research goes, they are not documented like throughput. Is there some benchmark suite for this purpose?

Thanks!

Answer

how do I determine if my card does 16 or 32 or 64 bit operations, since the values are different for each?

On the page you linked, compute capabilities are listed across the top of the table (one for each column). Your GPU has a compute capability. You can use the deviceQuery CUDA sample app to figure out what it is, or look it up here.

For example, suppose I had a GTX 1060 GPU. If you run deviceQuery on it, it will report a compute capability major version of 6 and a minor version of 1, so it is a compute capability 6.1 GPU. You can also see that here.
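
If you would rather query this programmatically than run deviceQuery, a minimal sketch using the CUDA runtime API (cudaGetDeviceProperties) is shown below; it assumes device 0 is the GPU you care about and keeps error handling to a minimum:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int dev = 0;                              // assume the first GPU is the one of interest
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties failed\n");
            return 1;
        }
        // prop.major and prop.minor together form the compute capability, e.g. 6.1
        printf("%s: compute capability %d.%d, %d SMs, clock %.0f MHz\n",
               prop.name, prop.major, prop.minor,
               prop.multiProcessorCount, prop.clockRate / 1000.0);
        return 0;
    }

The SM count and clock rate printed here are the same numbers deviceQuery reports, and they are reused in the throughput calculation further down.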

Now, going back to the table you linked, that means the column labelled 6.1 is the one of interest. It looks like this:

                                            Compute Capability
                                                    6.1 
16-bit floating-point add, multiply, multiply-add   2     ops/SM/clock
32-bit floating-point add, multiply, multiply-add   128   ops/SM/clock
64-bit floating-point add, multiply, multiply-add   4     ops/SM/clock
...

This means a GTX 1060 is capable of all 3 types of operations (floating point multiply, or multiply-add, or add) at 3 different precisions (16-bit, 32-bit, 64-bit) at differing rates or throughputs for each precision. With respect to the table, these numbers are per clock and per SM.

In order to determine the aggregate peak theoretical throughput for the entire GPU, we must multiply the above numbers by the clock rate of the GPU and by the number of SMs (streaming multiprocessors) in the GPU. The CUDA deviceQuery app can also tell you this information, or you can look it up online.
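
As a rough sketch of that arithmetic: the 128 ops/SM/clock figure below comes from the 6.1 column of the table above, while the SM count and clock rate are queried at runtime, so substitute the row for whichever precision and compute capability you care about:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;

        // 32-bit floating-point add/multiply/multiply-add for compute capability 6.1 (from the table)
        const double opsPerSmPerClock = 128.0;

        double clockHz = prop.clockRate * 1000.0;   // cudaDeviceProp::clockRate is reported in kHz
        double opsPerSec = opsPerSmPerClock * prop.multiProcessorCount * clockHz;

        // Peak FLOPS figures usually count a fused multiply-add as 2 FLOPs
        printf("Peak FP32: %.2f Tera-ops/s, i.e. %.2f TFLOPS counting FMA as 2 FLOPs\n",
               opsPerSec / 1e12, 2.0 * opsPerSec / 1e12);
        return 0;
    }

For a GTX 1060 (10 SMs at a boost clock of roughly 1.7 GHz) this works out to about 4.4 TFLOPS of FP32, which matches the commonly quoted peak figure for that card.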

Further, I also want to calculate the latency values of these instructions for my card. Is there some way to do it? As far as my research goes, they are not documented like throughput.

As I already mentioned on your previous question, these latency values are not published or specified, and in fact they may (and do) change from GPU to GPU, from one instruction type to another (e.g. floating point multiply and floating point add may have different latencies), and may even change from CUDA version to CUDA version, for certain operation types which are emulated via a sequence of multiple SASS instructions.

In order to discover this latency data, then, it's necessary to do some form of micro-benchmarking. An early and oft-cited paper demonstrating how this may be done for CUDA GPUs is here. There is not one single canonical reference for latency micro-benchmark data for GPUs, nor is there a single canonical reference for the benchmark programs to do it. It is a fairly difficult undertaking.
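
To give a flavour of what such a micro-benchmark looks like, the sketch below times a long chain of serially dependent FP32 adds from a single thread using clock64(). Treat the result as a rough estimate only: loop overhead, compiler unrolling and clock boosting all perturb it, and careful studies typically inspect the generated SASS before trusting the numbers.

    #include <cstdio>
    #include <cuda_runtime.h>

    // One thread executes a chain of serially dependent adds, so each add must wait
    // for the previous result; cycles / iterations then approximates the add latency.
    __global__ void faddLatency(float seed, int iters, float *out, long long *cycles)
    {
        float x = seed;
        long long start = clock64();
        for (int i = 0; i < iters; ++i)
            x = x + 1.0f;                  // dependent chain, not independent adds
        long long stop = clock64();
        *out = x;                          // keep the result live so the chain is not optimized away
        *cycles = stop - start;
    }

    int main()
    {
        const int iters = 1 << 20;
        float *dOut;
        long long *dCycles;
        cudaMalloc(&dOut, sizeof(float));
        cudaMalloc(&dCycles, sizeof(long long));

        faddLatency<<<1, 1>>>(1.0f, iters, dOut, dCycles);   // one thread on one SM
        cudaDeviceSynchronize();

        long long cycles = 0;
        cudaMemcpy(&cycles, dCycles, sizeof(cycles), cudaMemcpyDeviceToHost);
        printf("~%.2f cycles per dependent FP32 add (loop overhead included)\n",
               (double)cycles / iters);

        cudaFree(dOut);
        cudaFree(dCycles);
        return 0;
    }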

Is there some benchmark suite for this purpose?

This sort of question is explicitly off-topic for SO. Please read here where it states:

"Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow..."
