Why doesn't custom training loop average loss over batch_size?


Problem description

The code snippet below is the custom training loop from the official TensorFlow tutorial https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch. Another tutorial, https://www.tensorflow.org/tutorials/customization/custom_training_walkthrough, also does not average the loss over batch_size.

Why is loss_value not averaged over batch_size at the line loss_value = loss_fn(y_batch_train, logits)? Is this a bug? According to another question here, "Loss function works with reduce_mean but not reduce_sum", reduce_mean is indeed needed to average the loss over batch_size.

The loss_fn is defined in the tutorial as below. It obviously does not average over batch_size.

loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

From the documentation, keras.losses.SparseCategoricalCrossentropy sums the loss over the batch without averaging. Thus, this is essentially reduce_sum instead of reduce_mean!

Type of tf.keras.losses.Reduction to apply to loss. Default value is AUTO. AUTO indicates that the reduction option will be determined by the usage context. For almost all cases this defaults to SUM_OVER_BATCH_SIZE.
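
One way to check this directly is a minimal sketch like the following (the toy tensors are hypothetical, not from the tutorial): compare the default reduction against an explicit Reduction.SUM and see which of the two behaviours the default actually gives.

import tensorflow as tf

y_true = tf.constant([0, 1, 2, 1])   # hypothetical labels, batch of 4
logits = tf.random.normal((4, 3))    # hypothetical logits, 3 classes

# Loss with the default reduction (AUTO).
default_loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Loss with an explicit SUM reduction: total over the batch, no averaging.
sum_loss = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.keras.losses.Reduction.SUM)

# Compare the two printed values to see whether the default sums or averages.
print(float(default_loss(y_true, logits)))
print(float(sum_loss(y_true, logits)))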

The full training loop code is shown below.

epochs = 2
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

        # Open a GradientTape to record the operations run
        # during the forward pass, which enables auto-differentiation.
        with tf.GradientTape() as tape:

            # Run the forward pass of the layer.
            # The operations that the layer applies
            # to its inputs are going to be recorded
            # on the GradientTape.
            logits = model(x_batch_train, training=True)  # Logits for this minibatch

            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y_batch_train, logits)

        # Use the gradient tape to automatically retrieve
        # the gradients of the trainable variables with respect to the loss.
        grads = tape.gradient(loss_value, model.trainable_weights)

        # Run one step of gradient descent by updating
        # the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

        # Log every 200 batches.
        if step % 200 == 0:
            print(
                "Training loss (for one batch) at step %d: %.4f"
                % (step, float(loss_value))
            )
            print("Seen so far: %s samples" % ((step + 1) * 64))

Recommended answer

I've figured it out: loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True) does in fact average the loss over batch_size by default. The default reduction is AUTO, which in this context resolves to SUM_OVER_BATCH_SIZE, i.e. the per-sample losses are summed and then divided by the batch size, so no extra division is needed in the training loop.
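
As a quick sanity check (a minimal sketch with made-up tensors, not part of the tutorial), the value returned by the class-based loss matches a manual mean of the per-sample losses computed by the functional form:

import tensorflow as tf

y_true = tf.constant([0, 1, 2, 1])    # hypothetical labels, batch of 4
logits = tf.random.normal((4, 3))     # hypothetical logits, 3 classes

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
batch_loss = loss_fn(y_true, logits)  # default reduction

# The functional form returns one loss per sample; averaging it by hand
# reproduces the value above.
per_sample = tf.keras.losses.sparse_categorical_crossentropy(
    y_true, logits, from_logits=True)
print(float(batch_loss), float(tf.reduce_mean(per_sample)))  # same value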
