Understanding TensorFlow profiling results
Problem description
This example shows how to profile TensorFlow programs. I used this tool to profile my program, a simple LSTM, and the results include the following two rows:
/gpu:0/stream:all Compute(pid 5)
/job:localhost/replica:0/task:0/gpu:0 Compute(pid 3)
My questions:
a) What is the meaning of each row?
b) In particular, what is the difference between /gpu:0/stream:all Compute(pid 5) and /job:localhost/replica:0/task:0/gpu:0 Compute(pid 3)?
c) Why are their execution times different, namely 0.072 ms and 0.094 ms?
Here's an update from one of the engineers:
The '/gpu:0/stream:*' timelines are hardware traces of CUDA kernel execution times.
The '/gpu:0' lines show the TF software device enqueueing the ops on the CUDA stream (which usually takes almost zero time).
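To compare the two kinds of rows numerically rather than by eye, one option is to sum event durations per row directly from the exported trace. The sketch below assumes the timeline was saved in Chrome trace-event JSON format, where "M" events named process_name label each pid and "X" events carry a duration in microseconds. The helper name row_totals_ms and the sample events are hypothetical and hand-written for illustration; they are not the results from the question.

```python
from collections import defaultdict


def row_totals_ms(trace):
    """Sum event durations per timeline row of a Chrome trace-event dict."""
    events = trace["traceEvents"]
    # Metadata events (ph == "M", name == "process_name") map a pid to its row label.
    names = {
        e["pid"]: e["args"]["name"]
        for e in events
        if e.get("ph") == "M" and e.get("name") == "process_name"
    }
    totals_us = defaultdict(int)
    for e in events:
        # Complete events (ph == "X") carry a duration "dur" in microseconds.
        if e.get("ph") == "X":
            row = names.get(e["pid"], str(e["pid"]))
            totals_us[row] += e["dur"]
    # Convert microseconds to milliseconds.
    return {row: us / 1000.0 for row, us in totals_us.items()}


# Hypothetical trace: pid 5 is the hardware stream row, pid 3 the software device row.
sample_trace = {
    "traceEvents": [
        {"ph": "M", "name": "process_name", "pid": 5,
         "args": {"name": "/gpu:0/stream:all Compute"}},
        {"ph": "M", "name": "process_name", "pid": 3,
         "args": {"name": "/job:localhost/replica:0/task:0/gpu:0 Compute"}},
        {"ph": "X", "pid": 5, "tid": 0, "name": "MatMul", "ts": 100, "dur": 50},
        {"ph": "X", "pid": 5, "tid": 0, "name": "Tanh", "ts": 160, "dur": 30},
        {"ph": "X", "pid": 3, "tid": 0, "name": "MatMul", "ts": 90, "dur": 5},
        {"ph": "X", "pid": 3, "tid": 0, "name": "Tanh", "ts": 96, "dur": 7},
    ]
}

totals = row_totals_ms(sample_trace)
for row, ms in sorted(totals.items()):
    print(f"{row}: {ms:.3f} ms")
```

For a real trace, load the saved file with json.load and pass the resulting dict to the helper. The two totals measure different things (kernel execution on the stream versus enqueue time on the software device), so they are not expected to match.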