ValueError: No gradients provided for any variable - Tensorflow 2.0/Keras

Problem Description

I am trying to implement a simple sequence-to-sequence model using Keras. However, I keep seeing the following ValueError:

ValueError: No gradients provided for any variable: ['simple_model/time_distributed/kernel:0', 'simple_model/time_distributed/bias:0', 'simple_model/embedding/embeddings:0', 'simple_model/conv2d/kernel:0', 'simple_model/conv2d/bias:0', 'simple_model/dense_1/kernel:0', 'simple_model/dense_1/bias:0'].

Other questions like this one, as well as this issue on Github, suggest that this might have something to do with the cross-entropy loss function; but I fail to see what I am doing wrong here.

I do not think that this is the problem, but I want to mention that I am on a nightly build of TensorFlow, tf-nightly==2.2.0.dev20200410 to be precise.

The following code is a standalone example and should reproduce the exception from above:

import random
from functools import partial

import tensorflow as tf
from tensorflow import keras
from tensorflow_datasets.core.features.text import SubwordTextEncoder

EOS = '<eos>'
PAD = '<pad>'

RESERVED_TOKENS = [EOS, PAD]
EOS_ID = RESERVED_TOKENS.index(EOS)
PAD_ID = RESERVED_TOKENS.index(PAD)

dictionary = [
    'verstehen',
    'verstanden',
    'vergessen',
    'verlegen',
    'verlernen',
    'vertun',
    'vertan',
    'verloren',
    'verlieren',
    'verlassen',
    'verhandeln',
]

dictionary = [word.lower() for word in dictionary]


class SimpleModel(keras.models.Model):

    def __init__(self, params, *args, **kwargs):
        super().__init__(*args, **kwargs)

        self.params = params
        self.out_layer = keras.layers.Dense(1, activation='softmax')

        self.model_layers = [
            keras.layers.Embedding(params['vocab_size'], params['vocab_size']),
            keras.layers.Lambda(lambda l: tf.expand_dims(l, -1)),
            keras.layers.Conv2D(1, 4),
            keras.layers.MaxPooling2D(1),
            keras.layers.Dense(1, activation='relu'),
            keras.layers.TimeDistributed(self.out_layer)
        ]

    def call(self, example, training=None, mask=None):
        x = example['inputs']
        for layer in self.model_layers:
            x = layer(x)
        return x


def sample_generator(text_encoder: SubwordTextEncoder, max_sample: int = None):
    count = 0

    while True:
        random.shuffle(dictionary)

        for word in dictionary:

            for i in range(1, len(word)):

                inputs = word[:i]
                targets = word

                example = dict(
                    inputs=text_encoder.encode(inputs) + [EOS_ID],
                    targets=text_encoder.encode(targets) + [EOS_ID],
                )
                count += 1

                yield example

                if max_sample is not None and count >= max_sample:
                    print('Reached max_samples (%d)' % max_sample)
                    return


def make_dataset(generator_fn, params, training):

    dataset = tf.data.Dataset.from_generator(
        generator_fn,
        output_types={
            'inputs': tf.int64,
            'targets': tf.int64,
        }
    ).padded_batch(
        params['batch_size'],
        padded_shapes={
            'inputs': (None,),
            'targets': (None,)
        },
    )

    if training:
        dataset = dataset.map(partial(prepare_example, params=params)).repeat()

    return dataset


def prepare_example(example: dict, params: dict):
    # Make sure targets are one-hot encoded
    example['targets'] = tf.one_hot(example['targets'], depth=params['vocab_size'])
    return example


def main():

    text_encoder = SubwordTextEncoder.build_from_corpus(
        iter(dictionary),
        target_vocab_size=1000,
        max_subword_length=6,
        reserved_tokens=RESERVED_TOKENS
    )

    generator_fn = partial(sample_generator, text_encoder=text_encoder, max_sample=10)

    params = dict(
        batch_size=20,
        vocab_size=text_encoder.vocab_size,
        hidden_size=32,
        max_input_length=30,
        max_target_length=30
    )

    model = SimpleModel(params)

    model.compile(
        optimizer='adam',
        loss='categorical_crossentropy',
    )

    train_dataset = make_dataset(generator_fn, params, training=True)
    dev_dataset = make_dataset(generator_fn, params, training=False)

    # Peek data
    for train_batch, dev_batch in zip(train_dataset, dev_dataset):
        print(train_batch)
        print(dev_batch)
        break

    model.fit(
        train_dataset,
        epochs=1000,
        steps_per_epoch=100,
        validation_data=dev_dataset,
        validation_steps=100,
    )


if __name__ == '__main__':
    main()

Update

  • Gist link
  • Github issue link

Answer

      There are two different sets of problems in your code, which could be categorized as syntactical and architectural. The error raised (i.e. No gradients provided for any variable) relates to the syntactical problems, which I will mostly address below; after that, I will also try to give you some pointers about the architectural problems.

      The main cause of the syntactical problems is the use of named inputs and outputs for the model. Named inputs and outputs in Keras are mostly useful when the model has multiple input and/or output layers. However, your model has only one input and one output layer. Therefore, using named inputs and outputs may not be very useful here, but if that's your decision, I will explain how it can be done properly.

      First of all, you should keep in mind that when using Keras models, the data generated by any input pipeline (whether it's a Python generator or a tf.data.Dataset) should be provided as a tuple, i.e. (input_batch, output_batch) or (input_batch, output_batch, sample_weights). And, as I said, this is the format Keras expects everywhere when dealing with input pipelines, even when we are using named inputs and outputs as dictionaries.

      For example, if I want to use input/output naming and my model has two input layers named "words" and "importance", and also two output layers named "output1" and "output2", they should be formatted like this:

      ({'words': words_data, 'importance': importance_data},
       {'output1': output1_data, 'output2': output2_data})
      

      So as you can see above, it's a tuple where each element is a dictionary; the first element corresponds to the inputs of the model and the second element to its outputs. Now, with this in mind, let's see what modifications should be made to your code:

      • In sample_generator we should return a tuple of dicts, not a dict. So:

      example = tuple([
           {'inputs': text_encoder.encode(inputs) + [EOS_ID]},
           {'targets': text_encoder.encode(targets) + [EOS_ID]},
      ])
      

    • In the make_dataset function, the input arguments of tf.data.Dataset should respect this:

      output_types=(
          {'inputs': tf.int64},
          {'targets': tf.int64}
      )
      
      padded_shapes=(
          {'inputs': (None,)},
          {'targets': (None,)}
      )
      

    • The signature of prepare_example and its body should be modified as well:

      def prepare_example(ex_inputs: dict, ex_outputs: dict, params: dict):
          # Make sure targets are one-hot encoded
          ex_outputs['targets'] = tf.one_hot(ex_outputs['targets'], depth=params['vocab_size'])
          return ex_inputs, ex_outputs
      

    • And finally, the call method of the subclassed model:

      return {'targets': x}
      
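      In context, the modified call method would look like this (a sketch of the same method from the question, with only the return value changed):

      def call(self, example, training=None, mask=None):
          x = example['inputs']
          for layer in self.model_layers:
              x = layer(x)
          # Return a dict keyed by the output name so that it matches
          # the {'targets': ...} dicts produced by the dataset.
          return {'targets': x}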

    • And one more thing: we should also put these names on the corresponding input and output layers using the name argument when constructing the layers (like Dense(..., name='output')); however, since we are using Model sub-classing here to define our model, that's not necessary to do.

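      For reference, if the model were built with the functional API instead of sub-classing, the naming would be attached to the layers themselves; a minimal illustrative sketch (the sizes here are made up for the example, assuming tf and keras are imported as in the question):

      inputs = keras.Input(shape=(None,), dtype=tf.int64, name='inputs')
      x = keras.layers.Embedding(input_dim=100, output_dim=16)(inputs)
      # The layer name 'targets' is what Keras matches against the dict keys.
      outputs = keras.layers.Dense(100, activation='softmax', name='targets')(x)
      model = keras.Model(inputs=inputs, outputs=outputs)
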
      All right, these would resolve the input/output problems and the error related to gradients would be gone; however, if you run the code after applying the above modifications, you would still get an error regarding incompatible shapes. As I said earlier, there are architectural issues in your model which I would briefly address below.

      As you mentioned, this is supposed to be a sequence-to-sequence model. Therefore, the output is a sequence of one-hot encoded vectors, where the length of each vector is equal to the (target-side) vocabulary size. As a result, the softmax classifier should have as many units as the vocabulary size, like this (Note: never, in any model or problem, use a softmax layer with only one unit; that's all wrong! Think about why it's wrong!):

      self.out_layer = keras.layers.Dense(params['vocab_size'], activation='softmax')
      

      The next thing to consider is the fact that we are dealing with 1D sequences (i.e. sequences of tokens/words). Therefore, using 2D-convolution and 2D-pooling layers does not make sense here. You can either use their 1D counterparts or replace them with something else, such as RNN layers. As a result, the Lambda layer should be removed as well. Also, if you want to use convolution and pooling, you should adjust the number of filters in each layer as well as the pool size properly (i.e. a single conv filter, Conv1D(1, ...), is probably not optimal, and a pool size of 1 does not make sense).

      Further, the Dense layer before the last layer, which has only one unit, could severely limit the representational capacity of the model (i.e. it is essentially the bottleneck of your model). Either increase its number of units or remove it.

      The other thing is that there is no reason not to one-hot encode the labels of the dev set; rather, they should be one-hot encoded just like the labels of the training set. Therefore, either the training argument of make_dataset should be removed entirely or, if you have some other use case for it, the dev dataset should be created with training=True passed to make_dataset, as sketched below.
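
      One way to restructure it (a sketch of make_dataset after the earlier modifications; it applies the one-hot mapping to both splits and keeps repeat() train-only):

      def make_dataset(generator_fn, params, training):

          dataset = tf.data.Dataset.from_generator(
              generator_fn,
              output_types=(
                  {'inputs': tf.int64},
                  {'targets': tf.int64}
              )
          ).padded_batch(
              params['batch_size'],
              padded_shapes=(
                  {'inputs': (None,)},
                  {'targets': (None,)}
              ),
          )

          # One-hot encode targets for both the train and dev splits;
          # only the training split is repeated indefinitely.
          dataset = dataset.map(partial(prepare_example, params=params))
          if training:
              dataset = dataset.repeat()

          return dataset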

      Finally, after all these changes your model might work and start fitting on data; but after a few batches have passed, you might get an incompatible-shapes error again. That's because you are generating input data of unknown dimension and also using a relaxed padding approach that pads each batch only as much as needed (i.e. using (None,) for padded_shapes). To resolve this, you should decide on a fixed input/output dimension (e.g. by considering a fixed length for the input/output sequences), and then adjust the architecture or hyper-parameters of the model (e.g. conv kernel size, conv padding, pooling size, adding more layers, etc.) as well as the padded_shapes argument accordingly. Even if you would like your model to support input/output sequences of variable length, you should account for that in the model's architecture and hyper-parameters, and also in the padded_shapes argument. Since the solution depends on the task and the design you have in mind, and there is no one-size-fits-all solution, I will not comment further on that and leave it to you to figure out. But here is a working solution (which may not be, and probably isn't, optimal) just to give you an idea:

      self.out_layer = keras.layers.Dense(params['vocab_size'], activation='softmax')
      
      self.model_layers = [
          keras.layers.Embedding(params['vocab_size'], params['vocab_size']),
          keras.layers.Conv1D(32, 4, padding='same'),
          keras.layers.TimeDistributed(self.out_layer)
      ]
      
      
      # ...
      padded_shapes=(
          {'inputs': (10,)},
          {'targets': (10,)}
      )
      
