Is it possible to update the learning rate, each batch, based on batch label (y_true) distribution?

Problem Description

For the solution, please see the end of this question.

TL;DR: I need to find a way to calculate the label distribution per-batch, and update the learning rate. Is there a way to access the optimizer of the current model to update the learning_rate, per batch?

Below is how to calculate the label distribution. It can be done in the loss function, since by default the loss is calculated batch-wise. Where can this code be executed so that it also has access to the model's optimizer?

import tensorflow as tf
from tensorflow.python.ops import math_ops

def loss(y_true, y_pred):
    y = math_ops.argmax(y_true, axis=1)
    freqs = tf.gather(lf, y)  # equal to lf[y] if `lf` and `y` were numpy arrays
    inv_freqs = math_ops.pow(freqs, -1)
    E = 1 / math_ops.reduce_sum(inv_freqs)  # value to use when updating learning rate
    # (an actual loss value would still need to be computed and returned here)


Further Details

In order to implement a learning rate schedule, as described in this paper, I believe I need a way to update the learning rate during training, each batch, by a value calculated from the label distribution of the true labels in the batch (y_true, as it's typically denoted in keras/tensorflow).

where ...

x the model's output

y the corresponding ground-truth label

Β the minibatch of m samples (e.g. 64)

n_y the total number of training samples with ground-truth label y

n_y^-1 the inverse label frequency

The portion of the formula I'm focused on is the part between α and Δθ
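(The equation image from the paper does not survive in this copy. Based on the symbol list above and the value E computed in the snippet below, the term between α and Δθ is presumably the inverse-frequency normalizer, i.e. something of the form, as a hedged reconstruction rather than a quote from the paper:

    \theta \leftarrow \theta - \alpha \left( \sum_{(x,y) \in \mathcal{B}} n_y^{-1} \right)^{-1} \Delta\theta

where the normalizing factor is exactly the E computed below.)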

I can achieve this with ease from within a custom loss function, but I do not know how to update the learning rate (if you even can) from the loss function.

def loss(y_true, y_pred):
    y = math_ops.argmax(y_true, axis=1)
    freqs = tf.gather(lf, y)  # equal to lf[y] if `lf` and `y` were numpy arrays
    inv_freqs = math_ops.pow(freqs, -1)
    E = 1 / math_ops.reduce_sum(inv_freqs)  # value to use when updating learning rate

where ...

lf the sample frequencies for each class. e.g. 2 classes, c0 = 10 examples, c1 = 100 --> lf == [10, 100]
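As an aside, a minimal sketch of how lf could be precomputed from integer training labels (y_train here is a placeholder name, not from the original post):

import numpy as np

y_train = np.array([0] * 10 + [1] * 100)  # 2 classes: c0 = 10 examples, c1 = 100
lf = np.bincount(y_train)                 # per-class sample frequencies -> array([ 10, 100])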

Is there some fancy way I can update the optimizer's learning rate, like what can be done from a Callback?

def on_batch_begin(self, batch, logs=None):
    # note: batch is just an incremented value indicating the batch index
    self.model.optimizer.lr  # learning rate, can be modified from a callback
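For reference, a minimal sketch of setting the learning rate from a callback (the class name is illustrative, not from the original post); note that a callback never sees the batch's y_true, which is what motivates the metric-based solution further down:

from tensorflow import keras
from tensorflow.keras import backend as K

class BatchLRCallback(keras.callbacks.Callback):  # hypothetical name
    def on_batch_begin(self, batch, logs=None):
        # the optimizer's learning rate variable can be overwritten every batch,
        # but the batch's true labels are not available to callbacks
        K.set_value(self.model.optimizer.lr, 1e-3)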

Thanks in advance for any help!

Many thanks to @mrk for pushing me in the right direction to solve this!

In order to compute the per-batch label distributions, then use that value to update the optimizer's learning rate, one must ...

  1. Create a custom Metric which computes the label distribution per-batch and returns the frequency array (by default keras optimizes batch-wise, hence metrics are calculated each batch).
  2. Create a typical learning rate scheduler by subclassing the keras.callbacks.History class.
  3. Override the on_batch_end function of the scheduler; the logs dict will contain all computed metrics for the batch, including our custom label distribution metric!

Create the custom metric

import tensorflow as tf
from tensorflow import VariableAggregation
from tensorflow.python.ops import math_ops as mo


class LabelDistribution(tf.keras.metrics.Metric):
    """
    Computes the per-batch label distribution (y_true) and stores the array as
    a metric which can be accessed via keras CallBack's

    :param n_class: int - number of distinct output class(es)
    """

    def __init__(self, n_class, name='batch_label_distribution', **kwargs):
        super(LabelDistribution, self).__init__(name=name, **kwargs)
        self.n_class = n_class
        self.label_distribution = self.add_weight(name='ld', initializer='zeros',
                                                  aggregation=VariableAggregation.NONE,
                                                  shape=(self.n_class, ))

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = mo.cast(y_true, 'int32')
        y = mo.argmax(y_true, axis=1)
        label_distrib = mo.bincount(mo.cast(y, 'int32'))

        self.label_distribution.assign(mo.cast(label_distrib, 'float32'))

    def result(self):
        return self.label_distribution

    def reset_states(self):
        self.label_distribution.assign([0]*self.n_class)

Create the DRW learning rate scheduler

from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.python.ops import math_ops


class DRWLearningRateSchedule(keras.callbacks.History):
    """
    Used to implement the Deferred Re-weighting (DRW) strategy from
    [Kaidi Cao, et al. "Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss." (2019)]
    (https://arxiv.org/abs/1906.07413)

    To be included as a callback in model.fit
    `model.fit(..., callbacks=[DRWLearningRateSchedule(.01)])`
    """

    def __init__(self, base_lr, ld_metric='batch_label_distribution'):
        super(DRWLearningRateSchedule, self).__init__()

        self.base_lr = base_lr
        self.ld_metric = ld_metric  # name of the LabelDistribution metric

    def on_batch_end(self, batch, logs=None):
        ld = logs.get(self.ld_metric)  # the per-batch label distribution
        current_lr = K.get_value(self.model.optimizer.lr)
        # example below of updating the optimizer's learning rate
        K.set_value(self.model.optimizer.lr, current_lr * (1 / math_ops.reduce_sum(ld)))
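A minimal sketch of how the metric and the scheduler above might be wired together (the model definition and the x_train/y_train_onehot names are placeholders, not from the original post):

n_class = 2
model = tf.keras.Sequential([
    tf.keras.layers.Dense(n_class, activation='softmax', input_shape=(16,))
])

model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=[LabelDistribution(n_class)])           # computed every batch

model.fit(x_train, y_train_onehot,
          batch_size=64,
          epochs=10,
          callbacks=[DRWLearningRateSchedule(base_lr=0.01)])  # reads the metric from logs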

Recommended Answer

Loss-based learning rate adjustment in Keras

After some research I found this: instead of triggering a decay, you could just as well define another function or value for your learning rate.

from __future__ import absolute_import
from __future__ import print_function

import keras
from keras import backend as K
import numpy as np


class LossLearningRateScheduler(keras.callbacks.History):
    """
    A learning rate scheduler that relies on changes in loss function
    value to dictate whether learning rate is decayed or not.
    LossLearningRateScheduler has the following properties:
    base_lr: the starting learning rate
    lookback_epochs: the number of epochs in the past to compare with the loss function at the current epoch to determine if progress is being made.
    decay_threshold / decay_multiple: if loss function has not improved by a factor of decay_threshold * lookback_epochs, then decay_multiple will be applied to the learning rate.
    spike_epochs: list of the epoch numbers where you want to spike the learning rate.
    spike_multiple: the multiple applied to the current learning rate for a spike.
    """

    def __init__(self, base_lr, lookback_epochs, spike_epochs = None, spike_multiple = 10, decay_threshold = 0.002, decay_multiple = 0.5, loss_type = 'val_loss'):

        super(LossLearningRateScheduler, self).__init__()

        self.base_lr = base_lr
        self.lookback_epochs = lookback_epochs
        self.spike_epochs = spike_epochs
        self.spike_multiple = spike_multiple
        self.decay_threshold = decay_threshold
        self.decay_multiple = decay_multiple
        self.loss_type = loss_type


    def on_epoch_begin(self, epoch, logs=None):

        if len(self.epoch) > self.lookback_epochs:

            current_lr = K.get_value(self.model.optimizer.lr)

            target_loss = self.history[self.loss_type] 

            loss_diff =  target_loss[-int(self.lookback_epochs)] - target_loss[-1]

            if loss_diff <= np.abs(target_loss[-1]) * (self.decay_threshold * self.lookback_epochs):

                print(' '.join(('Changing learning rate from', str(current_lr), 'to', str(current_lr * self.decay_multiple))))
                K.set_value(self.model.optimizer.lr, current_lr * self.decay_multiple)
                current_lr = current_lr * self.decay_multiple

            else:

                print(' '.join(('Learning rate:', str(current_lr))))

            if self.spike_epochs is not None and len(self.epoch) in self.spike_epochs:
                print(' '.join(('Spiking learning rate from', str(current_lr), 'to', str(current_lr * self.spike_multiple))))
                K.set_value(self.model.optimizer.lr, current_lr * self.spike_multiple)

        else:

            print(' '.join(('Setting learning rate to', str(self.base_lr))))
            K.set_value(self.model.optimizer.lr, self.base_lr)


        return K.get_value(self.model.optimizer.lr)




def main():
    return

if __name__ == '__main__':
    main()
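For completeness, a sketch of how this scheduler might be attached to training (model, x_train and y_train are placeholders, not defined in the answer); since loss_type defaults to 'val_loss', some validation data is needed for the history lookup to work:

lr_callback = LossLearningRateScheduler(base_lr=0.001, lookback_epochs=3)

model.fit(x_train, y_train,
          epochs=50,
          validation_split=0.2,      # ensures 'val_loss' appears in the history
          callbacks=[lr_callback])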

