Why do nvprof and nvidia-smi report different results on power?


Question

I used nvprof and nvidia-smi to monitor the GPU power dissipation respectively, but observed different results, summarized in the table below.

----------------------------------------------------------------
gpu     |             busy           |             idle         
model   |  nvprof[Watt]  smi[Watt]   |  nvprof[Watt]  smi[Watt] 
----------------------------------------------------------------
M2090   |   ~151           ~151      |     ~100          ~75
K20     |   ~105           ~102      |     ~63           ~43
----------------------------------------------------------------

note 0: "busy" means my code is running on the monitored GPU

note 1: nvprof reports the power for all the devices. So my way to get the "idle" power using nvprof for a specific GPU is simply to have the code running on another GPU.

note 2: nvidia-smi reports a couple of different quantities about power, but I was focusing on "power draw".

note 3: CUDA version: 5.5
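
For what it's worth, the per-GPU "power draw" and p-state that nvidia-smi reports (notes 1 and 2) can also be read programmatically through NVML, the library nvidia-smi itself is built on. Below is a minimal sketch of such a query; the pynvml Python bindings and GPU index 0 are assumptions here:

    # Read the same "power draw" and performance state that nvidia-smi reports,
    # but for one specific GPU, via NVML. pynvml and the GPU index are assumptions.
    import pynvml

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)              # GPU to monitor
        power_mw = pynvml.nvmlDeviceGetPowerUsage(handle)          # milliwatts
        pstate = pynvml.nvmlDeviceGetPerformanceState(handle)      # 0 = P0, ..., 15 = P15
        print("power draw: %.1f W, p-state: P%d" % (power_mw / 1000.0, pstate))
    finally:
        pynvml.nvmlShutdown()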

So my question is: why is the power reported by nvidia-smi generally smaller than that reported by nvprof, and why does this discrepancy become larger when the idle power is monitored? And ultimately, which utility should I trust more?

Also, just to make sure: the power that the two utilities measure refers to the input electrical power (P = I * U) rather than the output thermal power, right?

Thanks a lot for any advice!

Update

@njuffa's and @talonmies's speculation makes very good sense. So I explored smi a little bit more for power analysis. The results, however, do not make sense to me.

Additional notes:


1. The discontinuity of the red data is because I directly used the timestamp reported by smi, which has low resolution (seconds). Besides, for illustration purposes p0 is assigned a numerical value of 20 and p1 a value of 10. So for most of the time the GPU is put into its full performance state (this is odd), except for the "busy" case, where the GPU somehow drops to p1 during 15~18 s (odd).

2. It is not until ~21.3 s that cudaSetDevice() is invoked for the very first time. So the power rise and p-state change that occur at ~18 s are rather odd.

3. "busy power" is measured with my GPU code running in the background and smi put into an infinite loop that queries the power and p-state repeatedly until the background process terminates. "idle power" is measured simply by launching smi 50 times. Apparently in the latter case smi exhibits larger overhead, which is, again, odd.


Answer

Ignore the p-states. They are confusing you.

nvprof (alone) uses substantially more of the GPU than does nvidia-smi (alone). So the "idle" power consumed when running nvprof is higher than it is when just doing nvidia-smi. nvprof fires up a number of engines on the GPU, whereas nvidia-smi simply fires up some registers and maybe some I2C circuitry.
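
One way to illustrate this point (a rough sketch, not a reproduction of what nvprof actually does internally) is to read the NVML power before and after forcing CUDA context creation with cudaFree(0); merely holding a CUDA context open typically changes the "idle" reading. pynvml, the unversioned libcudart.so name, and matching NVML/CUDA device ordering are assumptions here:

    # Compare NVML power readings with and without a live CUDA context.
    # Illustrative only; library name and device ordering are assumptions.
    import ctypes
    import time
    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)

    def watts():
        return pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0

    print("no CUDA context:   %.1f W" % watts())

    # cudaFree(0) is a common idiom to force CUDA context creation without
    # doing any real work on the device.
    cudart = ctypes.CDLL("libcudart.so")
    cudart.cudaFree(ctypes.c_void_p(0))

    time.sleep(1.0)  # let the reading settle
    print("with CUDA context: %.1f W" % watts())

    pynvml.nvmlShutdown()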

The GPU has a number of p-states, and a true idle p-state is P8 or below (i.e. a numerically larger P-number).

Just running nvidia-smi can (and frequently will) briefly raise the p-state of the GPU from a "true idle" p-state to a higher one, like P0. This does not tell you:

- how long the p-state elevation lasts (the sampling period of nvidia-smi is too coarse), or
- how much power is actually being consumed.

Yes, the p-state is an indicator, but it does not tell you anything in a calibrated way. A GPU can be more or less "idle" while at P0 (for instance, put your GPUs in persistence mode).
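
One way to check how long such a p-state elevation lasts is to poll NVML directly at a finer interval than the second-resolution timestamps mentioned in the update, keeping a single NVML session instead of relaunching nvidia-smi per sample. This is only a sketch; pynvml, GPU index 0, and the 0.1 s sampling interval are assumptions:

    # Poll power draw [W] and p-state for one GPU at a fixed interval, using a
    # single NVML session rather than repeated nvidia-smi invocations.
    import time
    import pynvml

    def sample_power(gpu_index=0, interval_s=0.1, duration_s=30.0):
        pynvml.nvmlInit()
        samples = []
        try:
            handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
            t0 = time.time()
            while time.time() - t0 < duration_s:
                t = time.time() - t0
                watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0
                pstate = pynvml.nvmlDeviceGetPerformanceState(handle)
                samples.append((t, watts, pstate))
                time.sleep(interval_s)
        finally:
            pynvml.nvmlShutdown()
        return samples

    if __name__ == "__main__":
        for t, w, p in sample_power():
            print("%7.2f s  %6.1f W  P%d" % (t, w, p))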

The discrepancy between the two measurements has already been explained. The graph and the additional update are not serving any useful purpose; they are just confusing you.

If you want to measure power, use either approach. It's clear that the two are quite correlated in the GPU "busy" case, and the fact that they appear to differ in the "idle" case simply means you're making assumptions about "idle" in the two cases which simply aren't true.

