Different occupancy between calculator and nvprof


Question

I am using nvprof to measure achieved occupancy, and it reports:


Achieved Occupancy 0.344031 0.344031 0.344031

But the occupancy calculator gives me 75%.

The calculator results are:

Active Threads per Multiprocessor   1536
Active Warps per Multiprocessor 48
Active Thread Blocks per Multiprocessor 6
Occupancy of each Multiprocessor    75%

I am using 33 registers, 144 bytes of shared memory, 256 threads per block, and compute capability 3.5.

Edit:

Also, there is something I want to clarify. The profiler user's guide at http://docs.nvidia.com/cuda/profiler-users-guide/#axzz30pb9tBTN gives this definition for


gld_efficiency

Ratio of requested global memory load throughput to required global memory load throughput, expressed as a percentage

So, if this is 0%, does it mean that I have no global memory transfers in the kernel?

Recommended answer

You need to understand that the occupancy calculator provides the maximum theoretical occupancy that a given kernel can achieve, based only on the resource requirements of that kernel. It does not (and cannot) say anything about how much of that theoretical occupancy the code is capable of achieving.
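As a rough illustration of what the calculator does with the numbers from the question, here is a minimal Python sketch of the resource arithmetic. The per-SM limits are my assumptions for compute capability 3.5 (64 warps, 16 blocks, 65536 registers allocated per warp in units of 256, 48 KB of shared memory allocated in units of 256 bytes), and the function name is made up for this example. It also ignores some of the calculator's finer rounding rules, so treat it as a sketch rather than a replacement for the real tool.

```python
import math

# Assumed per-SM limits for compute capability 3.5; check them against
# the occupancy calculator or the programming guide for your device.
WARP_SIZE = 32
MAX_WARPS_PER_SM = 64
MAX_BLOCKS_PER_SM = 16
REGISTERS_PER_SM = 65536
REGISTER_ALLOC_UNIT = 256   # registers, allocated per warp
SHARED_MEM_PER_SM = 49152   # bytes
SHARED_ALLOC_UNIT = 256     # bytes


def round_up(value, unit):
    """Round value up to the next multiple of unit."""
    return math.ceil(value / unit) * unit


def theoretical_occupancy(threads_per_block, regs_per_thread, shared_bytes_per_block):
    """Hypothetical helper: returns (active blocks, active warps, occupancy) per SM."""
    warps_per_block = math.ceil(threads_per_block / WARP_SIZE)

    # Limit 1: warp slots and block slots on the SM.
    blocks_by_warps = min(MAX_BLOCKS_PER_SM, MAX_WARPS_PER_SM // warps_per_block)

    # Limit 2: registers, rounded up to the per-warp allocation unit.
    regs_per_warp = round_up(regs_per_thread * WARP_SIZE, REGISTER_ALLOC_UNIT)
    blocks_by_regs = (REGISTERS_PER_SM // regs_per_warp) // warps_per_block

    # Limit 3: shared memory, rounded up to its allocation unit.
    if shared_bytes_per_block > 0:
        smem_per_block = round_up(shared_bytes_per_block, SHARED_ALLOC_UNIT)
        blocks_by_smem = SHARED_MEM_PER_SM // smem_per_block
    else:
        blocks_by_smem = MAX_BLOCKS_PER_SM

    # The tightest of the three limits decides how many blocks fit.
    active_blocks = min(blocks_by_warps, blocks_by_regs, blocks_by_smem)
    active_warps = active_blocks * warps_per_block
    return active_blocks, active_warps, active_warps / MAX_WARPS_PER_SM


# Numbers from the question: 256 threads/block, 33 registers, 144 bytes shared.
blocks, warps, occupancy = theoretical_occupancy(256, 33, 144)
print(blocks, warps, warps * WARP_SIZE, occupancy)
```

Under these assumptions registers are the limiting resource: 33 registers per thread round up to 1280 per warp, which leaves room for 6 blocks of 8 warps, i.e. 48 of 64 warps. Nothing in this arithmetic looks at what the kernel does at run time.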

The profiling tools, on the other hand, deduce actual occupancy from measured profile counters. According to the documentation, the achieved occupancy number you are asking about is calculated as

(active_warps / active_cycles) / MAX_WARPS_PER_SM

i.e. it samples the number of active warps on one or more SMs during a kernel run and calculates the actual occupancy from that.
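To make the formula concrete, here is a small Python sketch. The counter values are invented purely for illustration (they are not from the question's profile), the function name is hypothetical, and `MAX_WARPS_PER_SM = 64` is my assumption for compute capability 3.5.

```python
MAX_WARPS_PER_SM = 64  # assumed limit for compute capability 3.5


def achieved_occupancy(active_warps, active_cycles, max_warps_per_sm=MAX_WARPS_PER_SM):
    """(active_warps / active_cycles) / MAX_WARPS_PER_SM, as in the formula above."""
    if active_cycles == 0:
        return 0.0  # the SM was never active, so there is nothing to average
    return (active_warps / active_cycles) / max_warps_per_sm


# Made-up counters: 22,000,000 warp-cycles accumulated over 1,000,000 active cycles.
example = achieved_occupancy(active_warps=22_000_000, active_cycles=1_000_000)
print(example)

# Reading the reported figure the other way round: an achieved occupancy of
# 0.344031 corresponds to roughly 22 resident warps per active cycle on average.
average_warps = 0.344031 * MAX_WARPS_PER_SM
print(average_warps)
```

So the reported 0.344031 would mean that, averaged over the cycles in which an SM was busy, about 22 warps were resident, compared with the 48 that the resource limits would allow.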

There can be a lot of reasons why a kernel doesn't achieve its theoretical occupancy, and (before you ask) no, I can't tell you why your kernel doesn't reach its theoretical occupancy. But the Visual Profiler can. If it is important to you, I suggest you look at the automated performance analysis features available in the CUDA 5/6 Visual Profiler as a way of better understanding the performance of your code.

It is also worth pointing out that occupancy should be treated as only a rough metric of potential code performance, and high theoretical occupancy doesn't always translate into high performance. Instruction-level parallelism and latency minimisation strategies can also be very effective at reaching high levels of performance, even at low occupancy. There is a large body of work on this, most of it stemming from Vasily Volkov's seminal GTC 2010 paper.
