How to interpret TensorFlow output?


Question

How do I interpret the TensorFlow output for building and executing computational graphs on GPGPUs?

Given the following command, which executes an arbitrary TensorFlow script using the Python API:

python3 tensorflow_test.py > out

The first part, from stream_executor, seems to be TensorFlow loading its CUDA dependencies:

I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
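
(Side note: if one of these libraries fails to open, a quick sanity check -- a sketch of my own, not from the original post -- is to ask the dynamic loader for it directly:

import ctypes

# Raises OSError if the loader can't locate the library
# (e.g., it isn't on LD_LIBRARY_PATH or in the ldconfig cache).
ctypes.CDLL("libcudnn.so")
print("libcudnn.so found")
)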

What is a NUMA node?

I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:900] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

I assume this is where it finds the available GPU:

I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:01:00.0
Total memory: 11.25GiB
Free memory: 11.15GiB

Some GPU initialization? What is DMA?

I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:01:00.0)

Why is an error (the E log prefix) raised here?

E tensorflow/stream_executor/cuda/cuda_driver.cc:932] failed to allocate 11.15G (11976531968 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY

A good answer on what the pool_allocator does: https://stackoverflow.com/a/35166985/4233809

I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 3160 get requests, put_count=2958 evicted_count=1000 eviction_rate=0.338066 and unsatisfied allocation rate=0.412025
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 100 to 110
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1743 get requests, put_count=1970 evicted_count=1000 eviction_rate=0.507614 and unsatisfied allocation rate=0.456684
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 256 to 281
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1986 get requests, put_count=2519 evicted_count=1000 eviction_rate=0.396983 and unsatisfied allocation rate=0.264854
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 655 to 720
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 28728 get requests, put_count=28680 evicted_count=1000 eviction_rate=0.0348675 and unsatisfied allocation rate=0.0418407
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 1694 to 1863

Answer

Regarding NUMA – https://software.intel.com/zh-cn/articles/optimizing-applications-for-numa

Roughly speaking, if you have a dual-socket CPU, each socket has its own memory and has to access the other processor's memory through a slower QPI link. So each CPU-plus-memory pair is a NUMA node.

Potentially you could treat two different NUMA nodes as two different devices and structure your network to optimize for the different within-node/between-node bandwidths.

However, I don't think there's enough wiring in TF right now to do this. The detection doesn't work either -- I just tried it on a machine with 2 NUMA nodes, and it still printed the same message and initialized to 1 NUMA node.
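
You can read the same SysFS value TensorFlow probes to see why it warned. A minimal sketch (my addition, Linux-only; the PCI bus ID 0000:01:00.0 is taken from the log above, so substitute your GPU's):

from pathlib import Path

# -1 means the kernel recorded no NUMA affinity for this PCI device,
# which is exactly the case TensorFlow warns about before falling back to node zero.
node = Path("/sys/bus/pci/devices/0000:01:00.0/numa_node").read_text().strip()
print("sysfs NUMA node:", node)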

DMA = Direct Memory Access. You could potentially copy data from one GPU to another GPU without going through the CPU (i.e., over NVLink). NVLink integration isn't there yet. The small matrix under DMA: in the log shows which device pairs can copy to each other directly; with a single GPU it only says that device 0 can access itself (0: Y).

As far as the error goes: TensorFlow tries to allocate close to the GPU's maximum memory, so it sounds like some of your GPU memory has already been allocated to something else and the allocation failed.

You can do something like the following to avoid allocating so much memory:

config = tf.ConfigProto(log_device_placement=True)
config.gpu_options.per_process_gpu_memory_fraction=0.3 # don't hog all vRAM
config.operation_timeout_in_ms=15000   # terminate on long hangs
sess = tf.InteractiveSession("", config=config)
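
Alternatively (my addition, not part of the original answer), TF1's allow_growth option tells TensorFlow to start with a small allocation and grow it on demand rather than grabbing a fixed fraction up front:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand instead of up front
sess = tf.InteractiveSession("", config=config)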
