nvidia-smi Volatile GPU-Utilization explanation?

Question

I know that nvidia-smi -l 1 reports GPU usage once per second (output similar to the following). However, I would appreciate an explanation of what Volatile GPU-Util really means. Is it the number of used SMs over the total number of SMs, or the occupancy, or something else?

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20c          Off  | 0000:03:00.0     Off |                    0 |
| 30%   41C    P0    53W / 225W |      0MiB /  4742MiB |     96%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20c          Off  | 0000:43:00.0     Off |                    0 |
| 36%   49C    P0    95W / 225W |   4516MiB /  4742MiB |     63%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    1      5193    C   python                                        4514MiB |
+-----------------------------------------------------------------------------+

Recommended answer

It is a sampled measurement over a time period. For a given time period, it reports what percentage of time one or more GPU kernel(s) was active (i.e. running).

It doesn't tell you anything about how many SMs were used, or how "busy" the code was, or what it was doing exactly, or in what way it may have been using memory.
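
To make that definition concrete, here is a small host-only C++ sketch (it also compiles as CUDA host code) of a toy model of the metric. Everything in it is hypothetical: KernelRun, sampled_utilization, and the sms_used field are illustration names, not part of NVML or nvidia-smi, and the driver's real sampling is not documented at this level of detail. It only shows that the time during which at least one kernel is running enters the result, while the number of SMs a kernel occupies does not.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical record of one kernel execution inside a sample window.
struct KernelRun {
    double start_us;  // time the kernel began running
    double end_us;    // time the kernel finished
    int sms_used;     // SMs occupied; deliberately ignored by the metric
};

// Toy model: percentage of [0, period_us) during which at least one
// kernel was running. Overlapping runs are merged so that no time is
// counted twice.
double sampled_utilization(std::vector<KernelRun> runs, double period_us) {
    assert(period_us > 0.0);
    std::sort(runs.begin(), runs.end(),
              [](const KernelRun& a, const KernelRun& b) {
                  return a.start_us < b.start_us;
              });
    double busy_us = 0.0;
    double covered_until = 0.0;  // end of the time already counted as busy
    for (const KernelRun& r : runs) {
        double s = std::max(r.start_us, covered_until);
        double e = std::min(r.end_us, period_us);
        if (e > s) {
            busy_us += e - s;
            covered_until = e;
        }
    }
    return 100.0 * busy_us / period_us;
}
```

In this model, a kernel that runs for half of the sample period yields 50 whether sms_used is 1 or 13, which is the behaviour the answer describes.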

The above claim(s) can be verified without too much difficulty using a microbenchmarking-type exercise (see below).

Based on the Nvidia docs, the sample period may be between 1 second and 1/6 second depending on the product. However, the period shouldn't make much difference to how you interpret the result.

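Also, the word "Volatile" does not pertain to this data item in nvidia-smi. You are misreading the output format: in the header, "Volatile" belongs to "Volatile Uncorr. ECC" on the line above, and the utilization field is simply "GPU-Util".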

Here's a trivial piece of code that supports my claim:

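#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

const long long tdelay = 1000000LL;  // kernel busy-wait length, in GPU clock cycles
const int loops = 10000;             // number of kernel launches
const int hdelay = 1;                // default host delay between launches, in microseconds

// Spins a single thread (so at most one SM is in use) for tdelay cycles.
__global__ void dkern() {
  long long start = clock64();
  while (clock64() < start + tdelay);
}

int main(int argc, char *argv[]) {
  // Optional argument: host delay between launches, in microseconds.
  int my_delay = hdelay;
  if (argc > 1) my_delay = atoi(argv[1]);
  for (int i = 0; i < loops; i++) {
    dkern<<<1,1>>>();   // asynchronous launch of one block with one thread
    usleep(my_delay);   // idle the host before the next launch
  }
  cudaDeviceSynchronize();  // wait for any queued kernels before exiting
  return 0;
}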

On my system, when I run the above code with a command line parameter of 100, nvidia-smi will report 99% utilization. When I run with a command line parameter of 1000, nvidia-smi will report ~83% utilization. When I run it with a command line parameter of 10000, nvidia-smi will report ~9% utilization.
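
The trend in those three readings is what a simple duty-cycle estimate predicts. The sketch below is host-only C++ (it also compiles as CUDA host code) and rests on assumptions that are not in the answer: a kernel run time of about 1000 microseconds (10^6 cycles at an assumed clock of roughly 1 GHz; the clock of the GPU used is not stated) and a hypothetical fixed host overhead per loop iteration. estimated_utilization is an illustration name, not an NVML call.

```cpp
#include <algorithm>
#include <cassert>

// Idealized estimate of GPU-Util for a loop that launches one kernel and
// then sleeps on the host. Kernel launches are asynchronous, so a new
// kernel is issued every (host_delay_us + host_overhead_us). The GPU is
// busy for kernel_us out of each such interval, capped at 100% once the
// kernels take longer than the interval and start to queue up.
double estimated_utilization(double kernel_us, double host_delay_us,
                             double host_overhead_us) {
    assert(kernel_us >= 0.0 && host_delay_us + host_overhead_us > 0.0);
    double interval_us = host_delay_us + host_overhead_us;
    return 100.0 * std::min(1.0, kernel_us / interval_us);
}
```

With the assumed 1000 microsecond kernel and 60 microseconds of overhead, this gives 100% for a delay of 100, about 94% for 1000, and about 10% for 10000. The measured values differ because the kernel time and overhead here are guesses, but the shape is the same: utilization tracks the fraction of time a kernel is running, even though each kernel uses only one thread.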
