Is there a way of determining how much GPU memory is in use by TensorFlow?

Question

TensorFlow tends to preallocate the entire available memory on its GPUs. For debugging, is there a way of telling how much of that memory is actually in use?

Recommended answer

(1) Timeline offers some limited support for logging memory allocations. Here is an example of its usage:

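    import tensorflow as tf
    from tensorflow.python.client import timeline

    # sess, merged, train_step, feed_dict, train_writer and i come from the
    # surrounding training loop of the MNIST example.
    # Request a full trace for this step so that memory events are recorded.
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    summary, _ = sess.run([merged, train_step],
                          feed_dict=feed_dict(True),
                          options=run_options,
                          run_metadata=run_metadata)
    train_writer.add_run_metadata(run_metadata, 'step%03d' % i)
    train_writer.add_summary(summary, i)
    print('Adding run metadata for', i)

    # Build the Chrome trace once, print it, and write it to a file named 'timeline'.
    tl = timeline.Timeline(run_metadata.step_stats)
    trace = tl.generate_chrome_trace_format(show_memory=True)
    print(trace)
    with tf.gfile.Open('timeline', 'w') as trace_file:
        trace_file.write(trace)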

You can try this code with the MNIST example (mnist with summaries).

This generates a trace file named timeline, which you can open with chrome://tracing. Note that this only gives approximate GPU memory usage statistics. It basically simulates a GPU execution, but doesn't have access to the full graph metadata. It also can't know how many variables have been assigned to the GPU.
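If you would rather inspect the trace programmatically than in chrome://tracing, the following is a minimal sketch. It assumes the trace is JSON with a top-level "traceEvents" list and that counter samples use phase "C", as in the Chrome Trace Event format. The helper name counter_events and the sample event are hypothetical, and the actual counter names TensorFlow emits are not assumed here.

```python
import json


def counter_events(trace_json):
    """Return (name, timestamp, args) for every counter event in a Chrome trace."""
    trace = json.loads(trace_json)
    return [(event.get("name"), event.get("ts"), event.get("args", {}))
            for event in trace.get("traceEvents", [])
            if event.get("ph") == "C"]  # "C" marks counter samples


# Hand-written stand-in for a real trace (hypothetical names and values).
sample = json.dumps({"traceEvents": [
    {"ph": "X", "name": "MatMul", "ts": 10, "dur": 5},
    {"ph": "C", "name": "example_allocator", "ts": 12, "args": {"bytes": 4096}},
]})

for name, ts, args in counter_events(sample):
    print(name, ts, args)
```

To apply this to a real run, read the contents of the timeline file and pass them to counter_events in place of sample.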

(2) For a very coarse measure of GPU memory usage, nvidia-smi shows the total device memory usage at the time you run the command.
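For logging this from a script, here is a minimal sketch that calls nvidia-smi with its CSV query options and parses the result. The function names parse_memory_csv and query_gpu_memory are hypothetical, and the parser assumes every field is a plain integer in MiB. Because TensorFlow preallocates, the numbers reflect what the process has reserved on the device, not what the graph actually uses.

```python
import subprocess


def parse_memory_csv(text):
    """Parse 'used, total' pairs (MiB), one line per GPU."""
    usage = []
    for line in text.strip().splitlines():
        # Assumes numeric fields; int() tolerates the space after the comma.
        used, total = (int(field) for field in line.split(","))
        usage.append((used, total))
    return usage


def query_gpu_memory():
    """Run nvidia-smi and return a list of (used_mib, total_mib) per GPU."""
    output = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        universal_newlines=True)
    return parse_memory_csv(output)


# Parsing demonstrated on made-up output, so no GPU is needed to run this.
sample_output = "1523, 11178\n0, 11178\n"
print(parse_memory_csv(sample_output))
```

On a machine with the NVIDIA driver installed, query_gpu_memory() returns one pair per GPU, which you can sample between training steps.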

nvprof can show the on-chip shared memory usage and register usage at the CUDA kernel level, but doesn't show the global/device memory usage.

Here is an example command: nvprof --print-gpu-trace matrixMul

More details here: http://docs.nvidia.com/cuda/profiler-users-guide/#abstract