What do I need K.clear_session() and del model for (Keras with Tensorflow-gpu)?


Question

What I am doing
I am training and using a convolutional neural network (CNN) for image classification, using Keras with Tensorflow-gpu as the backend.

What I am using
- PyCharm Community 2018.1.2
- Python 2.7 and 3.5 (but not both at the same time)
- Ubuntu 16.04
- Keras 2.2.0
- Tensorflow-GPU 1.8.0 as backend

What I want to know
In many code samples I see people using

from keras import backend as K 

# Do some code, e.g. train and save model

K.clear_session()

or deleting the model after using it:

del model

Regarding clear_session, the Keras documentation says: "Destroys the current TF graph and creates a new one. Useful to avoid clutter from old models / layers." - https://keras.io/backend/

What is the point of doing that, and should I do it as well? When I load or create a new model, my old model gets overwritten anyway, so why bother?

Recommended answer

K.clear_session() is useful when you're creating multiple models in succession, such as during hyperparameter search or cross-validation. Each model you train adds nodes (potentially numbering in the thousands) to the graph. TensorFlow executes the entire graph whenever you (or Keras) call tf.Session.run() or tf.Tensor.eval(), so your models will become slower and slower to train, and you may also run out of memory. Clearing the session removes all the nodes left over from previous models, freeing memory and preventing slowdown.
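The pattern described above can be sketched as a small loop. This is only an illustration, not code from the answer: run_trials, build_and_score and clear_session are hypothetical names, and the two callables are passed in by the caller so the loop itself does not depend on any particular backend.

```python
import gc


def run_trials(param_grid, build_and_score, clear_session):
    """Train one model per parameter set, clearing backend state in between.

    build_and_score and clear_session are hypothetical callables supplied by
    the caller; with Keras, clear_session would be keras.backend.clear_session.
    """
    results = []
    for params in param_grid:
        score = build_and_score(params)  # builds, trains and evaluates one model
        results.append((params, score))
        clear_session()  # drop the graph nodes left behind by this model
        gc.collect()     # let Python reclaim the now-unreferenced objects
    return results
```

With the setup from the question, one would presumably pass K.clear_session (from `from keras import backend as K`) as the third argument, and a function that builds, fits and evaluates a fresh model as the second.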

Edit 21/06/19:

TensorFlow is lazy-evaluated by default. TensorFlow operations aren't evaluated immediately: creating a tensor or applying operations to it creates nodes in a dataflow graph. The results are calculated by evaluating the relevant parts of the graph in one go when you call tf.Session.run() or tf.Tensor.eval(). This lets TensorFlow build an execution plan that assigns operations that can run in parallel to different devices. It can also fold adjacent nodes together or remove redundant ones (e.g. if you concatenated two tensors and later split them apart again unchanged). For more details, see https://www.tensorflow.org/guide/graphs
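To make the idea of deferred evaluation concrete, here is a toy dataflow graph in plain Python. It is a deliberately simplified stand-in, not TensorFlow's implementation: the Graph and Node classes and their methods are invented for this sketch. Building a node only records it; values are computed when run() is called, and only for the nodes the requested result depends on.

```python
class Node:
    """One operation in a toy dataflow graph (not TensorFlow's real classes)."""

    def __init__(self, graph, op, inputs=(), value=None):
        self.op = op
        self.inputs = inputs
        self.value = value
        graph.nodes.append(self)  # building a node only registers it


class Graph:
    def __init__(self):
        self.nodes = []
        self.evaluated = 0  # counts how many nodes run() actually computed

    def constant(self, value):
        return Node(self, "const", value=value)

    def add(self, a, b):
        return Node(self, "add", (a, b))

    def mul(self, a, b):
        return Node(self, "mul", (a, b))

    def run(self, node):
        """Evaluate only `node` and the nodes it depends on."""
        self.evaluated += 1
        if node.op == "const":
            return node.value
        left, right = (self.run(n) for n in node.inputs)
        return left + right if node.op == "add" else left * right


graph = Graph()
a = graph.constant(2)
b = graph.constant(3)
total = graph.add(a, b)        # nothing is computed yet
unused = graph.mul(total, b)   # registered in the graph, never requested

result = graph.run(total)      # evaluates a, b and total, but not `unused`
```

Note that `unused` still occupies a slot in graph.nodes even though it is never evaluated, which is the kind of leftover that clearing a session is meant to discard.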

All of your TensorFlow models are stored in the graph as a series of tensors and tensor operations. The basic operation of machine learning is the tensor dot product: the output of a neural network is the dot product of the input matrix and the network weights. If you have a single-layer perceptron and 1,000 training samples, then each epoch creates at least 1,000 tensor operations. If you have 1,000 epochs, then your graph contains at least 1,000,000 nodes at the end, before taking into account preprocessing, postprocessing, and more complex models such as recurrent nets, encoder-decoders, attention models, etc.

The problem is that eventually the graph would be too large to fit into video memory (6 GB in my case), so TF would shuttle parts of the graph from video memory to main memory and back. Eventually it would even get too large for main memory (12 GB) and start moving between main memory and the hard disk. Needless to say, this made things incredibly, and increasingly, slow as training went on. Before developing this save-model/clear-session/reload-model flow, I calculated that, at the per-epoch rate of slowdown I experienced, my model would have taken longer than the age of the universe to finish training.
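The answer does not show its save-model/clear-session/reload-model flow, so the following is only one possible reading of it. The function names, the chunk sizes and model_path are hypothetical, and the sketch assumes a compiled model has already been saved to model_path with model.save(). The Keras imports sit inside the function so that the scheduling helper can be used without Keras installed.

```python
def chunk_schedule(total_epochs, epochs_per_chunk):
    """Split training into (initial_epoch, final_epoch) chunks."""
    return [
        (start, min(start + epochs_per_chunk, total_epochs))
        for start in range(0, total_epochs, epochs_per_chunk)
    ]


def train_in_chunks(model_path, x, y, total_epochs, epochs_per_chunk):
    """Reload, train, save, and clear the session once per chunk.

    Assumes a compiled model was already saved to model_path (hypothetical).
    """
    # Imported here so chunk_schedule can be used without Keras installed.
    from keras import backend as K
    from keras.models import load_model

    for initial_epoch, final_epoch in chunk_schedule(total_epochs, epochs_per_chunk):
        model = load_model(model_path)  # restores weights and optimizer state
        # With initial_epoch set, `epochs` is the index of the final epoch.
        model.fit(x, y, initial_epoch=initial_epoch, epochs=final_epoch)
        model.save(model_path)
        del model          # drop the Python reference
        K.clear_session()  # discard the graph before the next chunk
```

For example, chunk_schedule(10, 4) yields the chunks (0, 4), (4, 8) and (8, 10), so the session would be cleared three times over ten epochs.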

Disclaimer: I haven't used TensorFlow in almost a year, so this might have changed. I remember there being quite a few GitHub issues about this, so hopefully it has since been fixed.
