Why doesn't custom training loop average loss over batch_size?


Problem description

The code snippet below is the custom training loop from the official TensorFlow tutorial https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch. Another tutorial, https://www.tensorflow.org/tutorials/customization/custom_training_walkthrough, also does not average the loss over batch_size.

Why is loss_value not averaged over batch_size at the line loss_value = loss_fn(y_batch_train, logits)? Is this a bug? According to another question here, "Loss function works with reduce_mean but not reduce_sum", reduce_mean is indeed needed to average the loss over batch_size.

The loss_fn is defined in the tutorial as below. It obviously does not average over batch_size.

loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

From the documentation, keras.losses.SparseCategoricalCrossentropy sums the loss over the batch without averaging. Thus, this is essentially reduce_sum instead of reduce_mean!

Type of tf.keras.losses.Reduction to apply to loss. Default value is AUTO. AUTO indicates that the reduction option will be determined by the usage context. For almost all cases this defaults to SUM_OVER_BATCH_SIZE.
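
One way to check this directly is a minimal sketch like the following (the toy tensors are hypothetical, not from the tutorial): compare the default reduction against an explicit Reduction.SUM and see which of the two behaviours the default actually gives.

import tensorflow as tf

y_true = tf.constant([0, 1, 2, 1])   # hypothetical labels, batch of 4
logits = tf.random.normal((4, 3))    # hypothetical logits, 3 classes

# Loss with the default reduction (AUTO).
default_loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Loss with an explicit SUM reduction: total over the batch, no averaging.
sum_loss = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.keras.losses.Reduction.SUM)

# Compare the two printed values to see whether the default sums or averages.
print(float(default_loss(y_true, logits)))
print(float(sum_loss(y_true, logits)))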

The full training loop code is shown below.

epochs = 2
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

        # Open a GradientTape to record the operations run
        # during the forward pass, which enables auto-differentiation.
        with tf.GradientTape() as tape:

            # Run the forward pass of the layer.
            # The operations that the layer applies
            # to its inputs are going to be recorded
            # on the GradientTape.
            logits = model(x_batch_train, training=True)  # Logits for this minibatch

            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y_batch_train, logits)

        # Use the gradient tape to automatically retrieve
        # the gradients of the trainable variables with respect to the loss.
        grads = tape.gradient(loss_value, model.trainable_weights)

        # Run one step of gradient descent by updating
        # the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

        # Log every 200 batches.
        if step % 200 == 0:
            print(
                "Training loss (for one batch) at step %d: %.4f"
                % (step, float(loss_value))
            )
            print("Seen so far: %s samples" % ((step + 1) * 64))

Recommended answer

I've figured it out: loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True) does in fact average the loss over batch_size by default. The default reduction is AUTO, which in this context resolves to SUM_OVER_BATCH_SIZE, i.e. the per-sample losses are summed and then divided by the batch size, so no extra division is needed in the training loop.
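
As a quick sanity check (a minimal sketch with made-up tensors, not part of the tutorial), the value returned by the class-based loss matches a manual mean of the per-sample losses computed by the functional form:

import tensorflow as tf

y_true = tf.constant([0, 1, 2, 1])    # hypothetical labels, batch of 4
logits = tf.random.normal((4, 3))     # hypothetical logits, 3 classes

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
batch_loss = loss_fn(y_true, logits)  # default reduction

# The functional form returns one loss per sample; averaging it by hand
# reproduces the value above.
per_sample = tf.keras.losses.sparse_categorical_crossentropy(
    y_true, logits, from_logits=True)
print(float(batch_loss), float(tf.reduce_mean(per_sample)))  # same value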
