What do I need K.clear_session() and del model for (Keras with Tensorflow-gpu)?


Problem description

What I am doing
I am training and using a convolutional neural network (CNN) for image classification using Keras with Tensorflow-gpu as backend.

What I am using
- PyCharm Community 2018.1.2
- Python 2.7 and 3.5 (but not both at the same time)
- Ubuntu 16.04
- Keras 2.2.0
- Tensorflow-GPU 1.8.0 as backend

What I want to know
In a lot of code I see people using

from keras import backend as K 

# Do some code, e.g. train and save model

K.clear_session()

or deleting the model after using it:

del model

The Keras documentation says this about clear_session: "Destroys the current TF graph and creates a new one. Useful to avoid clutter from old models / layers." - https://keras.io/backend/

What is the point of doing that and should I do it as well? When loading or creating a new model my model gets overwritten anyway, so why bother?

Answer

K.clear_session() is useful when you're creating multiple models in succession, such as during hyperparameter search or cross-validation. Each model you train adds nodes (potentially numbering in the thousands) to the graph. TensorFlow executes the entire graph whenever you (or Keras) call tf.Session.run() or tf.Tensor.eval(), so your models will become slower and slower to train, and you may also run out of memory. Clearing the session removes all the nodes left over from previous models, freeing memory and preventing slowdown.
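
As a concrete sketch of that pattern (assuming Keras 2.x on the TF 1.x backend; the layer sizes and the x_train/y_train arrays are hypothetical placeholders):

from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

for units in [32, 64, 128]:  # hypothetical hyperparameter search space
    model = Sequential([
        Dense(units, activation='relu', input_shape=(20,)),
        Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy')
    model.fit(x_train, y_train, epochs=5)  # x_train/y_train assumed to exist
    model.save('model_{}.h5'.format(units))
    del model          # drop the Python reference to the model
    K.clear_session()  # destroy the leftover graph before the next run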

Edit 21/06/19: TensorFlow is lazily evaluated by default. TensorFlow operations aren't evaluated immediately: creating a tensor or performing operations on it creates nodes in a dataflow graph. The results are calculated by evaluating the relevant parts of the graph in one go when you call tf.Session.run() or tf.Tensor.eval(). This is so TensorFlow can build an execution plan that assigns operations that can run in parallel to different devices. It can also fold adjacent nodes together or remove redundant ones (e.g. if you concatenated two tensors and later split them apart again unchanged). For more details, see https://www.tensorflow.org/guide/graphs
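
A tiny sketch of this deferred execution in TF 1.x - nothing is computed until run() is called:

import tensorflow as tf

a = tf.constant([1.0, 2.0])
b = tf.constant([3.0, 4.0])
c = a + b  # only adds an Add node to the graph; no arithmetic happens yet

with tf.Session() as sess:
    print(sess.run(c))  # the graph is executed here, printing [4. 6.]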

All of your TensorFlow models are stored in the graph as a series of tensors and tensor operations. The basic operation of machine learning is the tensor dot product - the output of a neural network is the dot product of the input matrix and the network weights. If you have a single-layer perceptron and 1,000 training samples, then each epoch creates at least 1,000 tensor operations. If you have 1,000 epochs, then your graph contains at least 1,000,000 nodes by the end, before taking into account preprocessing, postprocessing, and more complex models such as recurrent nets, encoder-decoders, attention models, etc.
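
You can watch the graph grow, and see clear_session() reset it, by counting the operations in the default graph (a sketch for Keras 2.x on TF 1.x):

import tensorflow as tf
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

def n_ops():
    return len(tf.get_default_graph().get_operations())

for i in range(3):
    model = Sequential([Dense(10, input_shape=(5,))])
    print(n_ops())  # the count grows with every model that is built

K.clear_session()
print(n_ops())  # back to zero on a fresh graph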

The problem is that eventually the graph would be too large to fit into video memory (6 GB in my case), so TF would shuttle parts of the graph from video to main memory and back. Eventually it would even get too large for main memory (12 GB) and start moving between main memory and the hard disk. Needless to say, this made things incredibly, and increasingly, slow as training went on. Before developing this save-model/clear-session/reload-model flow, I calculated that, at the per-epoch rate of slowdown I experienced, my model would have taken longer than the age of the universe to finish training. Disclaimer: I haven't used TensorFlow in almost a year, so this might have changed. I remember there being quite a few GitHub issues around this so hopefully it has since been fixed.
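
For reference, a rough sketch of that save-model/clear-session/reload-model flow (Keras 2.x on the TF 1.x backend; build_model() and the training arrays are hypothetical placeholders):

from keras import backend as K
from keras.models import load_model

model = build_model()  # hypothetical helper returning a compiled model
model.save('checkpoint.h5')
del model
K.clear_session()

for chunk in range(100):  # train in chunks instead of one long run
    model = load_model('checkpoint.h5')  # rebuild on a fresh, small graph
    model.fit(x_train, y_train, epochs=10)  # x_train/y_train assumed to exist
    model.save('checkpoint.h5')
    del model
    K.clear_session()  # wipe the graph accumulated during this chunk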
