Loss calculation over different batch sizes in Keras


Question

I know that in theory, the loss of a network over a batch is just the sum of all the individual losses. This is reflected in the Keras code for calculating total loss. Relevantly:

for i in range(len(self.outputs)):
    if i in skip_target_indices:
        continue
    y_true = self.targets[i]
    y_pred = self.outputs[i]
    weighted_loss = weighted_losses[i]
    sample_weight = sample_weights[i]
    mask = masks[i]
    loss_weight = loss_weights_list[i]
    with K.name_scope(self.output_names[i] + '_loss'):
        output_loss = weighted_loss(y_true, y_pred,
                                    sample_weight, mask)
    if len(self.outputs) > 1:
        self.metrics_tensors.append(output_loss)
        self.metrics_names.append(self.output_names[i] + '_loss')
    if total_loss is None:
        total_loss = loss_weight * output_loss
    else:
        total_loss += loss_weight * output_loss

However, I noticed that when I train a network with batch_size=32 and with batch_size=64, the loss value for every epoch still comes out more or less the same, with only a ~0.05% difference, while the accuracy for both networks remained exactly the same. So essentially, the batch size didn't have much effect on the network.

My question is: when I double the batch size, assuming the loss is really being summed, shouldn't the loss in fact be double its previous value, or at least greater? The excuse that the network probably learned better with the bigger batch size is negated by the fact that the accuracy stayed exactly the same.

The fact that the loss stays more or less the same regardless of the batch size makes me think it's being averaged.
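For reference, a minimal sketch of this observation (toy random data and a throwaway two-layer model, not the actual network from the question): if the reported loss were summed over each batch, the run with batch_size=64 should report roughly twice the loss of the run with batch_size=32, which is not what happens.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Hypothetical toy binary-classification data (placeholder for the real dataset)
np.random.seed(0)
x = np.random.rand(1024, 20)
y = (x.sum(axis=1) > 10).astype('float32')

def make_model():
    model = Sequential([
        Dense(16, activation='relu', input_shape=(20,)),
        Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])
    return model

hist32 = make_model().fit(x, y, batch_size=32, epochs=5, verbose=0)
hist64 = make_model().fit(x, y, batch_size=64, epochs=5, verbose=0)

# Both runs report epoch losses in the same range; doubling the batch size
# does not double the reported loss.
print(hist32.history['loss'])
print(hist64.history['loss'])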

Answer

The code you have posted concerns multi-output models, where each output may have its own loss and weights. Hence, the loss values of the different output layers are summed together. However, the individual losses are averaged over the batch, as you can see in the losses.py file. For example, this is the code related to the binary cross-entropy loss:

def binary_crossentropy(y_true, y_pred):
    return K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)
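As a quick check (toy tensors, assuming the TensorFlow backend), calling this loss function directly returns one value per sample rather than a scalar, because axis=-1 only averages over the units of the output layer:

import numpy as np
from keras import backend as K
from keras.losses import binary_crossentropy

# Toy batch of 2 samples, each with a 3-unit output layer
y_true = K.constant(np.array([[1., 0., 1.],
                              [0., 1., 1.]]))
y_pred = K.constant(np.array([[0.9, 0.1, 0.8],
                              [0.2, 0.7, 0.6]]))

per_sample = K.eval(binary_crossentropy(y_true, y_pred))
print(per_sample.shape)  # (2,) -- one loss value per sample, not a scalar yet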

Update: Right after adding the second part of this answer (i.e. the loss functions), like the OP, I was baffled by the axis=-1 in the definition of the loss function, and I thought to myself that it must be axis=0 to indicate the average over the batch?! Then I realized that all the K.mean() calls used in the definitions of the loss functions are there for the case of an output layer consisting of multiple units. So where is the loss averaged over the batch? I inspected the code to find the answer: to get the loss value for a specific loss function, a function is called that takes the true and predicted labels as well as the sample weights and mask as its inputs:

weighted_loss = weighted_losses[i]
# ...
output_loss = weighted_loss(y_true, y_pred, sample_weight, mask)

What is this weighted_losses[i] function? As you can see, it is an element of the list of (augmented) loss functions:

weighted_losses = [
    weighted_masked_objective(fn) for fn in loss_functions]

fn is actually one of the loss functions defined in the losses.py file, or it may be a user-defined custom loss function. And now, what is this weighted_masked_objective function? It is defined in the training_utils.py file:

def weighted_masked_objective(fn):
    """Adds support for masking and sample-weighting to an objective function.
    It transforms an objective function `fn(y_true, y_pred)`
    into a sample-weighted, cost-masked objective function
    `fn(y_true, y_pred, weights, mask)`.
    # Arguments
        fn: The objective function to wrap,
            with signature `fn(y_true, y_pred)`.
    # Returns
        A function with signature `fn(y_true, y_pred, weights, mask)`.
    """
    if fn is None:
        return None

    def weighted(y_true, y_pred, weights, mask=None):
        """Wrapper function.
        # Arguments
            y_true: `y_true` argument of `fn`.
            y_pred: `y_pred` argument of `fn`.
            weights: Weights tensor.
            mask: Mask tensor.
        # Returns
            Scalar tensor.
        """
        # score_array has ndim >= 2
        score_array = fn(y_true, y_pred)
        if mask is not None:
            # Cast the mask to floatX to avoid float64 upcasting in Theano
            mask = K.cast(mask, K.floatx())
            # mask should have the same shape as score_array
            score_array *= mask
            #  the loss per batch should be proportional
            #  to the number of unmasked samples.
            score_array /= K.mean(mask)

        # apply sample weighting
        if weights is not None:
            # reduce score_array to same ndim as weight array
            ndim = K.ndim(score_array)
            weight_ndim = K.ndim(weights)
            score_array = K.mean(score_array,
                                 axis=list(range(weight_ndim, ndim)))
            score_array *= weights
            score_array /= K.mean(K.cast(K.not_equal(weights, 0), K.floatx()))
        return K.mean(score_array)
    return weighted

As you can see, first the per-sample loss is computed in the line score_array = fn(y_true, y_pred), and then at the end the average of the losses is returned, i.e. return K.mean(score_array). This confirms that the reported losses are the average of the per-sample losses in each batch.
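For illustration only, here is a simplified numpy sketch that mirrors the reduction order of the weighted wrapper above (per-sample loss, optional mask rescaling, optional sample-weight rescaling, final batch mean); it is not the real Keras function:

import numpy as np

def weighted_sketch(per_sample_loss, weights=None, mask=None):
    score = per_sample_loss.astype('float32')
    if mask is not None:
        score = score * mask
        score = score / mask.mean()            # rescale so masked samples don't shrink the loss
    if weights is not None:
        score = score * weights
        score = score / (weights != 0).mean()  # rescale for zero-weight samples
    return score.mean()                        # scalar: average over the batch

# Hypothetical per-sample losses, e.g. the output of fn(y_true, y_pred)
per_sample = np.array([0.2, 0.4, 0.6, 0.8])
print(weighted_sketch(per_sample))             # 0.5 -- the mean, not the sum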

Note that K.mean(), in case of using TensorFlow as the backend, calls the tf.reduce_mean() function. Now, when K.mean() is called without an axis argument (the default value of the axis argument is None), as it is called in the weighted_masked_objective function, the corresponding call to tf.reduce_mean() computes the mean over all the axes and returns one single value. That's why, no matter the shape of the output layer and the loss function used, only one single loss value is used and reported by Keras (and it should be like this, because optimization algorithms need to minimize a scalar value, not a vector or tensor).
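A quick sketch of that behaviour (toy values, assuming the TensorFlow backend): K.mean with no axis argument collapses everything into one scalar, whereas axis=-1 would leave one value per sample:

import numpy as np
from keras import backend as K

# Toy "score_array" of shape (2, 2): 2 samples, 2 values each
score_array = K.constant(np.array([[1., 2.],
                                   [3., 4.]]))

print(K.eval(K.mean(score_array)))            # 2.5 -- one scalar for the whole batch
print(K.eval(K.mean(score_array, axis=-1)))   # [1.5 3.5] -- one value per sample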
