GradientTape gives different gradients depending on loss function being decorated by tf.function or not

Problem description

I find that the gradients computed depend on the interplay of tf.function decorators in the following way.

First I create some synthetic data for a binary classification

import numpy as np
import tensorflow as tf

tf.random.set_seed(42)
np.random.seed(42)
x=tf.random.normal((2,1))
y=tf.constant(np.random.choice([0,1],2))

Then I define two loss functions that differ only in the tf.function decorator

weights=tf.constant([1.,.1])[tf.newaxis,...]

def customloss1(y_true,y_pred,sample_weight=None):
    y_true_one_hot=tf.one_hot(tf.cast(y_true,tf.uint8),2)
    y_true_scale=tf.multiply(weights,y_true_one_hot)
    return tf.reduce_mean(tf.keras.losses.categorical_crossentropy(y_true_scale,y_pred))

@tf.function
def customloss2(y_true,y_pred,sample_weight=None):
    y_true_one_hot=tf.one_hot(tf.cast(y_true,tf.uint8),2)
    y_true_scale=tf.multiply(weights,y_true_one_hot)
    return tf.reduce_mean(tf.keras.losses.categorical_crossentropy(y_true_scale,y_pred))
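As a quick sanity check (my own addition, not part of the original post), the two functions should return the same scalar when called directly on eager tensors; only their gradients are in question:

# Sanity check (my addition): both losses should agree on the forward value
# when called eagerly on the same inputs.
p=tf.constant([[0.6,0.4],[0.3,0.7]])
tf.print(customloss1(y,p), customloss2(y,p))  # expected to print two identical values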

Then I make a very simple logistic regression model with all the bells and whistles removed to keep it simple

tf.random.set_seed(42)
np.random.seed(42)
model=tf.keras.Sequential([
    tf.keras.layers.Dense(2,use_bias=False,activation='softmax',input_shape=[1,])
])
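For context (my own note, not from the original post), the model maps each scalar input to a two-way softmax, so calling it on x yields one probability distribution per example:

# My addition: model(x) has shape (2, 2); each row sums to 1 because of the softmax.
print(model(x).shape)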

and finally I define two functions to calculate the gradients of the aforementioned loss functions, one decorated by tf.function and the other not

def get_gradients1(x,y):
    with tf.GradientTape() as tape1:
        p1=model(x)
        l1=customloss1(y,p1)
    with tf.GradientTape() as tape2:
        p2=model(x)
        l2=customloss2(y,p2)

    gradients1=tape1.gradient(l1,model.trainable_variables)
    gradients2=tape2.gradient(l2,model.trainable_variables)

    return gradients1, gradients2

@tf.function
def get_gradients2(x,y):
    with tf.GradientTape() as tape1:
        p1=model(x)
        l1=customloss1(y,p1)
    with tf.GradientTape() as tape2:
        p2=model(x)
        l2=customloss2(y,p2)

    gradients1=tape1.gradient(l1,model.trainable_variables)
    gradients2=tape2.gradient(l2,model.trainable_variables)

    return gradients1, gradients2

Now when I run

get_gradients1(x,y)

I get

([<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[ 0.11473544, -0.11473544]], dtype=float32)>],
 [<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[ 0.11473544, -0.11473544]], dtype=float32)>])

and the gradients are equal as expected. However, when I run

get_gradients2(x,y)

I get

([<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[ 0.02213785, -0.5065186 ]], dtype=float32)>],
 [<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[ 0.11473544, -0.11473544]], dtype=float32)>])

where only the second answer is correct. Thus, when my outer function is decorated, I only get the correct answer from the inner loss function that is also decorated. I was under the impression that decorating the outer one (which is the training loop in many applications) is sufficient, but here we see it's not. I want to understand why, and also how deep one has to go in decorating the functions being used.
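To make the discrepancy concrete, here is a small numerical comparison of the two results (my own diagnostic, not part of the original post):

# My addition: compare each loss's gradient when the wrapper runs eagerly
# versus when it is traced by tf.function.
g1_eager,g2_eager=get_gradients1(x,y)
g1_graph,g2_graph=get_gradients2(x,y)
print('decorated loss matches:',np.allclose(g2_eager[0].numpy(),g2_graph[0].numpy()))
print('undecorated loss matches:',np.allclose(g1_eager[0].numpy(),g1_graph[0].numpy()))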

Added some debugging info

I added some debugging info and I show the code only for customloss2 (the other is identical)

@tf.function
def customloss2(y_true,y_pred,sample_weight=None):
    y_true_one_hot=tf.one_hot(tf.cast(y_true,tf.uint8),2)
    y_true_scale=tf.multiply(weights,y_true_one_hot)
    tf.print('customloss2',type(y_true_scale),type(y_pred))
    tf.print('y_true_scale','\n',y_true_scale)
    tf.print('y_pred','\n',y_pred)
    return tf.reduce_mean(tf.keras.losses.categorical_crossentropy(y_true_scale,y_pred))

and on running get_gradients1 I get

customloss1 <type 'EagerTensor'> <type 'EagerTensor'>
y_true_scale 
 [[1 0]
 [0 0.1]]
y_pred 
 [[0.510775387 0.489224613]
 [0.529191136 0.470808864]]
customloss2 <class 'tensorflow.python.framework.ops.Tensor'> <class 'tensorflow.python.framework.ops.Tensor'>
y_true_scale 
 [[1 0]
 [0 0.1]]
y_pred 
 [[0.510775387 0.489224613]
 [0.529191136 0.470808864]]

we see that the tensors for customloss1 are Eager but for customloss2 are Tensor, and yet we get the same value for the gradients.

On the other hand, when I run it via get_gradients2

customloss1 <class 'tensorflow.python.framework.ops.Tensor'> <class 'tensorflow.python.framework.ops.Tensor'>
y_true_scale 
 [[1 0]
 [0 0.1]]
y_pred 
 [[0.510775387 0.489224613]
 [0.529191136 0.470808864]]
customloss2 <class 'tensorflow.python.framework.ops.Tensor'> <class 'tensorflow.python.framework.ops.Tensor'>
y_true_scale 
 [[1 0]
 [0 0.1]]
y_pred 
 [[0.510775387 0.489224613]
 [0.529191136 0.470808864]]

we see that everything is identical, with no tensors being Eager, and yet I get different gradients!

Answer

It turns out this is a bug and I have raised it here.
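Until the bug is resolved, one possible way to sidestep it (my own sketch, not from the answer, and based on the assumption that the discrepancy comes from how tf.keras.losses.categorical_crossentropy behaves under tracing) is to compute the weighted cross-entropy manually, so that the decorated and undecorated versions go through exactly the same ops:

# My own workaround sketch (hypothetical, not from the original answer):
# compute the weighted categorical cross-entropy by hand.
def customloss_manual(y_true,y_pred,sample_weight=None):
    y_true_one_hot=tf.one_hot(tf.cast(y_true,tf.uint8),2)
    y_true_scale=tf.multiply(weights,y_true_one_hot)
    y_pred=tf.clip_by_value(y_pred,1e-7,1.0)  # avoid log(0)
    per_example=-tf.reduce_sum(y_true_scale*tf.math.log(y_pred),axis=-1)
    return tf.reduce_mean(per_example)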
