How to accumulate gradients in tensorflow 2.0?


Problem description

I'm training a model with tensorflow 2.0. The images in my training set are of different resolutions. The model I've built can handle variable resolutions (conv layers followed by global averaging). My training set is very small and I want to use the full training set in a single batch.

Since my images are of different resolutions, I can't use model.fit(). So I'm planning to pass each sample through the network individually, accumulate the errors/gradients, and then apply one optimizer step. I'm able to compute the loss values, but I don't know how to accumulate the losses/gradients. How can I accumulate the losses/gradients and then apply a single optimizer step?

Code:

for i in range(num_epochs):
    print(f'Epoch: {i + 1}')
    total_loss = 0
    for j in tqdm(range(num_samples)):
        sample = samples[j]
        with tf.GradientTape() as tape:
            prediction = self.model(sample)
            loss_value = self.loss_function(y_true=labels[j], y_pred=prediction)
        gradients = tape.gradient(loss_value, self.model.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.model.trainable_variables))
        total_loss += loss_value

    epoch_loss = total_loss / num_samples
    print(f'Epoch loss: {epoch_loss}')

Recommended answer

If I understand correctly from this statement:

"How can I accumulate the losses/gradients and then apply a single optimizer step?"

@Nagabhushan is trying to accumulate gradients and then apply the optimization on the (mean) accumulated gradient. The answer provided by @TensorflowSupport does not answer it. In order to perform the optimization only once, and accumulate the gradients from several tapes, you can do the following:

for i in range(num_epochs):
    print(f'Epoch: {i + 1}')
    total_loss = 0

    # get trainable variables
    train_vars = self.model.trainable_variables
    # Create empty gradient list (not a tf.Variable list)
    accum_gradient = [tf.zeros_like(this_var) for this_var in train_vars]

    for j in tqdm(range(num_samples)):
        sample = samples[j]
        with tf.GradientTape() as tape:
            prediction = self.model(sample)
            loss_value = self.loss_function(y_true=labels[j], y_pred=prediction)
        total_loss += loss_value

        # get gradients of this tape
        gradients = tape.gradient(loss_value, train_vars)
        # Accumulate the gradients
        accum_gradient = [(accum_grad + grad) for accum_grad, grad in zip(accum_gradient, gradients)]


    # Now, after executing all the tapes you needed, we apply the optimization step
    # (but first we take the average of the gradients)
    accum_gradient = [this_grad/num_samples for this_grad in accum_gradient]
    # apply optimization step
    self.optimizer.apply_gradients(zip(accum_gradient, train_vars))

    epoch_loss = total_loss / num_samples
    print(f'Epoch loss: {epoch_loss}')

Using tf.Variable() should be avoided inside the training loop, since it will produce errors when trying to execute the code as a graph. If you use tf.Variable() inside your training function and then decorate it with "@tf.function" or apply "tf.function(my_train_fcn)" to obtain a graph function (i.e. for improved performance), execution will raise an error. This happens because tracing a function that creates a tf.Variable results in different behaviour than that observed in eager execution (re-utilization or creation, respectively). You can find more info on this in the tensorflow help page.
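
As a rough illustration of that caveat (not part of the original answer), here is one way to keep the accumulation @tf.function-friendly: the accumulator variables are created once, outside the traced function, and updated in place with assign_add. This is only a sketch; make_accumulation_steps, accumulate_step, and apply_step are hypothetical names, and model, loss_function, and optimizer stand in for whatever objects your training code already uses (e.g. self.model in the question). It also assumes the model has already been built, so that model.trainable_variables is populated.

import tensorflow as tf

def make_accumulation_steps(model, loss_function, optimizer):
    # Hypothetical helper: builds graph-friendly accumulate/apply steps.
    # The model must already be built so trainable_variables is non-empty.
    train_vars = model.trainable_variables

    # Accumulators are tf.Variables created ONCE, outside any @tf.function tracing
    accum_grads = [tf.Variable(tf.zeros_like(v), trainable=False) for v in train_vars]

    @tf.function
    def accumulate_step(sample, label):
        with tf.GradientTape() as tape:
            prediction = model(sample)
            loss_value = loss_function(y_true=label, y_pred=prediction)
        gradients = tape.gradient(loss_value, train_vars)
        for accum, grad in zip(accum_grads, gradients):
            accum.assign_add(grad)  # in-place update, safe under tracing
        return loss_value

    @tf.function
    def apply_step(num_samples):
        # Average the accumulated gradients and apply a single optimizer step
        mean_grads = [accum / tf.cast(num_samples, accum.dtype) for accum in accum_grads]
        optimizer.apply_gradients(zip(mean_grads, train_vars))
        # Reset the accumulators for the next epoch
        for accum in accum_grads:
            accum.assign(tf.zeros_like(accum))

    return accumulate_step, apply_step

With this split, the epoch loop reduces to calling accumulate_step(samples[j], labels[j]) for every sample and apply_step(num_samples) once per epoch. Note that with images of different resolutions, accumulate_step will be retraced for each new input shape unless you give @tf.function an input_signature with None dimensions.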
