TF 2.0 W Operation was changed ... when disabling eager mode and using a callback


Problem Description

I'm using some LSTM layers from TF 2.0. For training purposes I'm using the LearningRateScheduler callback, and for speed I disable TensorFlow's eager mode (disable_eager_execution). But when I use both of these together, TensorFlow raises a warning:

Operation ... was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session

Here is a custom script to illustrate the problem that I have:

import tensorflow as tf
import numpy as np
import time
import math

EAGER = False  # toggle eager execution
DECAY = True   # toggle the LearningRateScheduler callback

EPOCHS = 5

if not EAGER:
    # Run in graph mode for speed.
    tf.compat.v1.disable_eager_execution()


def decay_func(lr_init):
    # Step decay: divide the learning rate by 10 every 10 epochs.
    def step_decay(epoch):
        lrate = lr_init * math.pow(0.1, math.floor(epoch / 10))
        return lrate

    return step_decay


decay = tf.keras.callbacks.LearningRateScheduler(decay_func(0.1))


class MySequence(tf.keras.utils.Sequence):
    """Dummy data: (batch_size, 20, 30) inputs, (batch_size, 20, 10) targets."""

    def __init__(self, batch_size):
        super(MySequence, self).__init__()
        self.batch_size = batch_size

    def __len__(self):
        return 200

    def __getitem__(self, item):
        x = np.expand_dims(np.arange(20), axis=1) + np.random.rand(self.batch_size, 20, 30)
        y = np.expand_dims(np.arange(20, 40), axis=1) + np.random.rand(self.batch_size, 20, 10)
        return x, y


my_sequence = MySequence(batch_size=4)


def build_model():
    inputs = tf.keras.Input(shape=(20, 30))
    x = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(20))(inputs)
    x = tf.keras.layers.LSTM(20, return_sequences=True)(x)
    outputs = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(10))(x)

    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    return model


model = build_model()

model.compile(optimizer='adam', loss='mae')


start_train = time.time()
callbacks = []
if DECAY:
    callbacks.append(decay)
history = model.fit_generator(generator=my_sequence, epochs=EPOCHS, callbacks=callbacks)
end = time.time()


min_train, sec_train = int((end - start_train) // 60), int((end - start_train) % 60)
print(f'Time to train: {min_train}min{sec_train}sec')

So when EAGER == False and DECAY == True, here is the output:

WARNING:tensorflow:From D:\...\VirtualEnv\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
2019-12-13 17:35:17.211443: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Epoch 1/5
2019-12-13 17:35:17.604649: W tensorflow/c/c_api.cc:326] Operation '{name:'lstm/while' id:229 op device:{} def:{{{node lstm/while}} = While[T=[DT_INT32, DT_INT32, DT_INT32, DT_VARIANT, DT_FLOAT, ..., DT_VARIANT, DT_VARIANT, DT_VARIANT, DT_VARIANT, DT_VARIANT], _lower_using_switch_merge=true, _num_original_outputs=45, body=lstm_while_body_124[], cond=lstm_while_cond_123[], output_shapes=[[], [], [], [], [?,20], ..., [], [], [], [], []], parallel_iterations=32](lstm/while/loop_counter, lstm/while/maximum_iterations, lstm/time, lstm/TensorArrayV2_1, lstm/zeros, lstm/zeros_1, lstm/strided_slice_1, lstm/TensorArrayUnstack/TensorListFromTensor, lstm/kernel, lstm/recurrent_kernel, lstm/bias, lstm/while/EmptyTensorList, lstm/while/EmptyTensorList_1, lstm/while/EmptyTensorList_2, lstm/while/EmptyTensorList_3, lstm/while/EmptyTensorList_4, lstm/while/EmptyTensorList_5, lstm/while/EmptyTensorList_6, lstm/while/EmptyTensorList_7, lstm/while/EmptyTensorList_8, lstm/while/EmptyTensorList_9, lstm/while/EmptyTensorList_10, lstm/while/EmptyTensorList_11, lstm/while/EmptyTensorList_12, lstm/while/EmptyTensorList_13, lstm/while/EmptyTensorList_14, lstm/while/EmptyTensorList_15, lstm/while/EmptyTensorList_16, lstm/while/EmptyTensorList_17, lstm/while/EmptyTensorList_18, lstm/while/EmptyTensorList_19, lstm/while/EmptyTensorList_20, lstm/while/EmptyTensorList_21, lstm/while/EmptyTensorList_22, lstm/while/EmptyTensorList_23, lstm/while/EmptyTensorList_24, lstm/while/EmptyTensorList_25, lstm/while/EmptyTensorList_26, lstm/while/EmptyTensorList_27, lstm/while/EmptyTensorList_28, lstm/while/EmptyTensorList_29, lstm/while/EmptyTensorList_30, lstm/while/EmptyTensorList_31, lstm/while/EmptyTensorList_32, lstm/while/EmptyTensorList_33)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
200/200 [==============================] - 2s 10ms/step - loss: 5.8431
Epoch 2/5
200/200 [==============================] - 2s 8ms/step - loss: 4.6052
Epoch 3/5
200/200 [==============================] - 1s 7ms/step - loss: 4.5750
Epoch 4/5
200/200 [==============================] - 2s 8ms/step - loss: 4.5366
Epoch 5/5
200/200 [==============================] - 2s 8ms/step - loss: 4.4898
Time to train: 0min8sec

The model still seems to work, but with a bigger model it takes TensorFlow a long time (around 10 minutes) to raise the warning, which is pretty annoying.

How can I fix this issue?

Recommended Answer

I ran into similar performance issues while upgrading my code from TensorFlow 1.15 to 2.0. I was using fit_generator(), which is unfortunately buggy: if eager mode is enabled, it literally executes everything eagerly instead of compiling a graph. I reported this as #35513, to which someone replied that fit_generator() is deprecated as of TF 2.1 and that people should use fit() instead. I haven't managed to use fit() with a generator yet, but that might be my own bug, and I'm not sure whether it's even supposed to work in TF 2.0. In any case, this is likely why you see slow training with eager mode enabled and why disabling it helps to speed things up. (By the way, this issue also causes insane GPU memory usage.)
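For what it's worth, here is a minimal sketch of what the fit()-based call would look like for the script in the question (my_sequence, EPOCHS, and callbacks are the names from that script); in TF 2.x, fit() is documented to accept a tf.keras.utils.Sequence directly:

# Sketch: the deprecated fit_generator() call replaced with fit(),
# which accepts a tf.keras.utils.Sequence as its first argument.
history = model.fit(my_sequence, epochs=EPOCHS, callbacks=callbacks)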

However, due to another bug that I reported as #35501, TF 2.0 fails to use the cuDNN implementations of the LSTM and GRU layers when eager mode is disabled, which again causes slower training than what I was used to from TF 1.15. If you have an Nvidia device, you definitely want cuDNN to be used, because it is a lot faster than the generic implementations.
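As a side note, the fused cuDNN kernel is only selected when the layer is configured with a specific set of arguments; below is a sketch of a cuDNN-eligible LSTM based on the constraints listed in the tf.keras.layers.LSTM documentation for TF 2.0 (they all happen to be the defaults):

# Sketch: LSTM arguments that allow TF 2.0 to dispatch to the fused
# cuDNN kernel, per the tf.keras.layers.LSTM documentation.
lstm = tf.keras.layers.LSTM(
    20,
    activation='tanh',               # must be tanh
    recurrent_activation='sigmoid',  # must be sigmoid
    recurrent_dropout=0,             # must be 0
    unroll=False,                    # must be False
    use_bias=True,                   # must be True
    return_sequences=True,
)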

If you want maximum training speed, you could use TF 2.0 with fit_generator(), leave eager mode enabled (to get the cuDNN benefits), and pass model.compile(..., experimental_run_tf_function=False) to fall back to the old training function (or set model._experimental_run_tf_function = False when loading a model). Then upgrade to TF 2.1 as soon as it becomes available; a release candidate for 2.1 is already out.
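Applied to the compile call from the question's script, that would look roughly like this (experimental_run_tf_function was a compile() argument in TF 2.0 and has since been removed in later releases):

# Sketch for TF 2.0: keep eager mode enabled (so cuDNN kernels are used)
# but fall back to the old v1-style training function.
model.compile(optimizer='adam', loss='mae', experimental_run_tf_function=False)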

Edit: #35501 was closed as invalid. Apparently you can't get cuDNN at all with eager mode disabled. That makes very little sense to me, but I can live with it. In the long term you'll want to use TF the way it's intended to be used anyway, which is with eager mode enabled.
