How to implement gradient ascent in a Keras DQN

Problem Description

I have built a Reinforcement Learning DQN with variable-length sequences as inputs, and positive and negative rewards calculated for actions. There is some problem with my DQN model in Keras: although the model runs, the average reward decreases over time, across single and multiple cycles of epsilon. This does not change even after a significant period of training.

My thinking is that this is due to using MeanSquaredError in Keras as the loss function (minimising error). So I am trying to implement gradient ascent (to maximise reward). How can this be done in Keras? My current model is:

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Masking, LSTM, Dense
from tensorflow.keras.optimizers import Adam

model = Sequential()
inp = (env.NUM_TIMEPERIODS, env.NUM_FEATURES)
model.add(Input(shape=inp))  # a shape tuple (integers), not including batch size
model.add(Masking(mask_value=0., input_shape=inp))

model.add(LSTM(env.NUM_FEATURES, input_shape=inp, return_sequences=True))
model.add(LSTM(env.NUM_FEATURES))
model.add(Dense(env.NUM_FEATURES))
model.add(Dense(4))

model.compile(loss='mse',
              optimizer=Adam(lr=LEARNING_RATE, decay=DECAY),
              metrics=[tf.keras.losses.MeanSquaredError()])

In trying to implement gradient ascent by 'flipping' the gradient (as a negative or inverse loss?), I have tried various loss definitions:

loss=-'mse'    
loss=-tf.keras.losses.MeanSquaredError()    
loss=1/tf.keras.losses.MeanSquaredError()

but these all generate "bad operand type for unary" errors, since neither a string literal nor a Keras loss object supports unary negation or division.

How can the current Keras model be adapted to maximise rewards? Or is gradient ascent not even the problem? Could it be some issue with the action policy?
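
(For reference, a vanilla DQN normally keeps MSE as the loss and encodes the reward signal in the Bellman targets rather than in the sign of the loss. Below is a minimal sketch of such an update step; the name train_step, the gamma default, and the batch arrays are illustrative assumptions, not code from this question.)

import numpy as np

# Minimal sketch of a standard DQN update (hypothetical names; assumes
# batches sampled from a replay buffer as NumPy arrays).
def train_step(model, states, actions, rewards, next_states, dones, gamma=0.99):
    q_values = model.predict(states, verbose=0)      # current Q(s, .) estimates
    q_next = model.predict(next_states, verbose=0)   # Q(s', .) for bootstrapping

    # Bellman targets: reward plus discounted best next-state value;
    # terminal transitions (dones == 1) keep only the raw reward.
    targets = q_values.copy()
    idx = np.arange(len(states))
    targets[idx, actions] = rewards + gamma * np.max(q_next, axis=1) * (1 - dones)

    # Minimising MSE against these targets already moves Q towards
    # higher-reward actions, so no gradient ascent is required here.
    model.fit(states, targets, verbose=0)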

Recommended Answer

Writing a custom loss function

Here is the loss function you want

@tf.function
def positive_mse(y_true, y_pred):
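    # Flip the sign of MSE: minimising this loss is equivalent to
    # gradient ascent on the ordinary mean-squared error.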
    return -1 * tf.keras.losses.MSE(y_true, y_pred)

Then your compile line becomes

model.compile(loss=positive_mse,
              optimizer=Adam(lr=LEARNING_RATE, decay=DECAY),
              metrics=[tf.keras.losses.MeanSquaredError()])

Please note: use loss=positive_mse and not loss=positive_mse(). That's not a typo. This is because you need to pass the function itself, not the result of executing the function.
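
A quick illustration of the difference (a hypothetical snippet reusing the names above):

# Correct: pass the callable; Keras invokes it later as positive_mse(y_true, y_pred)
model.compile(loss=positive_mse,
              optimizer=Adam(lr=LEARNING_RATE, decay=DECAY))

# Wrong: this executes positive_mse() immediately with no arguments and
# raises a TypeError before compile() ever receives a loss
# model.compile(loss=positive_mse(), optimizer=Adam(lr=LEARNING_RATE, decay=DECAY))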
