nvidia-smi Volatile GPU-Utilization explanation?
Question
I know that nvidia-smi -l 1 will give the GPU usage every second (similar to the following). However, I would appreciate an explanation of what Volatile GPU-Util really means. Is that the number of used SMs over total SMs, or the occupancy, or something else?
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20c          Off  | 0000:03:00.0     Off |                    0 |
| 30%   41C    P0    53W / 225W |      0MiB /  4742MiB |     96%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20c          Off  | 0000:43:00.0     Off |                    0 |
| 36%   49C    P0    95W / 225W |   4516MiB /  4742MiB |     63%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    1      5193    C   python                                       4514MiB  |
+-----------------------------------------------------------------------------+
Recommended answer
It is a sampled measurement over a time period. For a given time period, it reports what percentage of time one or more GPU kernel(s) was active (i.e. running).
It doesn't tell you anything about how many SMs were used, or how "busy" the code was, or what it was doing exactly, or in what way it may have been using memory.
I don't know how to define the time period exactly, but since it is also overall just a sampled measurement (i.e. nvidia-smi reports one sampled measurement as often as you poll it), I don't think it should be that important for general usage or understanding of the tool. The time period is obviously short, and is not necessarily related to the polling interval, if one is specified, for nvidia-smi. It might also be possible to uncover the sampling time period using microbenchmarking techniques.
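Also, the word "Volatile" does not pertain to this data item in nvidia-smi. You are misreading the output format: in the header, "Volatile" belongs to the "Volatile Uncorr. ECC" label on the first header line, while the utilization column on the second header line is simply "GPU-Util".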
Here's a trivial piece of code that supports my claim:
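#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

const long long tdelay = 1000000LL; // GPU clock cycles each kernel spins for
const int loops = 10000;            // number of kernel launches
const int hdelay = 1;               // default host delay between launches, in microseconds

// Busy-wait on the device for tdelay clock cycles.
__global__ void dkern(){
  long long start = clock64();
  while(clock64() < start+tdelay);
}

int main(int argc, char *argv[]){
  int my_delay = hdelay;
  // Optional argument: host delay between launches, in microseconds.
  if (argc > 1) my_delay = atoi(argv[1]);
  for (int i = 0; i<loops; i++){
    dkern<<<1,1>>>();   // asynchronous launch of a single thread
    usleep(my_delay);   // host sleeps while the kernel runs
  }
  cudaDeviceSynchronize(); // wait for any queued kernels before exiting
  return 0;
}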
On my system, when I run the above code with a command line parameter of 100, nvidia-smi will report 99% utilization. When I run with a command line parameter of 1000, nvidia-smi will report ~83% utilization. When I run it with a command line parameter of 10000, nvidia-smi will report ~9% utilization.