Tensorflow: How to get gradients per instance in a batch?


Question

I'm looking at the policy gradients sample in this notebook: https://github.com/ageron/handson-ml/blob/master/16_reinforcement_learning.ipynb

The relevant code is here:

X = tf.placeholder(tf.float32, shape=[None, n_inputs])
hidden = tf.layers.dense(X, n_hidden, activation=tf.nn.elu, kernel_initializer=initializer)
logits = tf.layers.dense(hidden, n_outputs)
outputs = tf.nn.sigmoid(logits)  # probability of action 0 (left)
p_left_and_right = tf.concat(axis=1, values=[outputs, 1 - outputs])
action = tf.multinomial(tf.log(p_left_and_right), num_samples=1)

y = 1. - tf.to_float(action)
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits)
optimizer = tf.train.AdamOptimizer(learning_rate)
grads_and_vars = optimizer.compute_gradients(cross_entropy)
gradients = [grad for grad, variable in grads_and_vars]
gradient_placeholders = []
grads_and_vars_feed = []
for grad, variable in grads_and_vars:
    gradient_placeholder = tf.placeholder(tf.float32, shape=grad.get_shape())
    gradient_placeholders.append(gradient_placeholder)
    grads_and_vars_feed.append((gradient_placeholder, variable))
training_op = optimizer.apply_gradients(grads_and_vars_feed)

...
# Run training over a bunch of instances of inputs
for step in range(n_max_steps):
    action_val, gradients_val = sess.run([action, gradients], feed_dict={X: obs.reshape(1, n_inputs)})
...
# Then weight each gradient by the action values, average, and feed them back into training_op to apply_gradients()
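For reference, the feed-back step looks roughly like this (a sketch following the notebook; all_gradients is assumed to collect gradients_val at each step, and rewards to hold the matching discounted rewards from the surrounding training loop):

import numpy as np

feed_dict = {}
for var_index, gradient_placeholder in enumerate(gradient_placeholders):
    # Average the reward-weighted gradients for this variable over all steps
    mean_gradients = np.mean([reward * all_gradients[step][var_index]
                              for step, reward in enumerate(rewards)], axis=0)
    feed_dict[gradient_placeholder] = mean_gradients
sess.run(training_op, feed_dict=feed_dict)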

The above works fine, as each run() returns different gradients.

I'd like to batch all this and feed an array of inputs into run() instead of one input at a time (my environment is different from the one in the sample, so batching makes sense for me and improves performance), i.e.:

action_val, gradients_val = sess.run([action, gradients], feed_dict={X: obs_array})

Where obs_array has shape [n_instances, n_inputs].

The problem is that optimizer.compute_gradients(cross_entropy) seems to return a single gradient, even though cross_entropy is a 1-D tensor of shape [None, 1]. action_val does return a 1-D tensor of actions, as expected: one action per instance in the batch.

Is there any way for me to get an array of gradients, one per instance in the batch?

Answer

The problem is that optimizer.compute_gradients(cross_entropy) seems to return a single gradient, even though cross_entropy is a 1-D tensor of shape [None, 1].

That happens by design: the gradient contributions from each instance are automatically aggregated. Gradient-computation operations such as optimizer.compute_gradients and the low-level primitive tf.gradients sum the per-instance gradients, according to the default AddN aggregation method. This is fine for most cases of stochastic gradient descent.
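You can see the aggregation with a toy example (a minimal sketch; the linear model and names here are illustrative, not from the question):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 3])
w = tf.Variable(tf.ones([3, 1]))
per_instance_loss = tf.matmul(x, w)           # shape [None, 1]: one value per instance
grad = tf.gradients(per_instance_loss, w)[0]  # shape [3, 1]: already summed over the batch

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad, feed_dict={x: [[1., 0., 0.], [0., 2., 0.]]}))
    # [[1.], [2.], [0.]] -- the two per-instance gradients have been added together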

Unfortunately, in the end the gradient computation has to be done over a single batch, unless a custom gradient function is built or the TensorFlow API is extended to provide gradient computation without full aggregation. Changing the implementation of tf.gradients to do this does not seem trivial.
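If a fixed batch size is acceptable, one workaround is to build one gradient op per instance (a sketch only, reusing the names from the question's code; note that the graph grows linearly with the batch size):

batch_size = 32  # must be known statically for this approach

X = tf.placeholder(tf.float32, shape=[batch_size, n_inputs])
# ... build hidden, logits, and cross_entropy exactly as above ...

per_instance_grads = []
for i in range(batch_size):
    # Differentiating a single instance's loss yields that instance's gradients
    grads_i = tf.gradients(cross_entropy[i], tf.trainable_variables())
    per_instance_grads.append(grads_i)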

One trick you might employ for your reinforcement learning model is to perform multiple session runs in parallel. According to the FAQ, the Session API supports multiple concurrent steps and will take advantage of the existing resources for parallel computation. The question Asynchronous computation in TensorFlow shows how to do this.
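A sketch of that idea using Python threads (assuming sess, action, gradients, X, and n_inputs from the code above; obs_list is a hypothetical list of observations):

import threading

results = [None] * len(obs_list)

def run_instance(i, obs):
    # sess.run is thread-safe, so these calls can execute concurrently
    results[i] = sess.run([action, gradients],
                          feed_dict={X: obs.reshape(1, n_inputs)})

threads = [threading.Thread(target=run_instance, args=(i, obs))
           for i, obs in enumerate(obs_list)]
for t in threads:
    t.start()
for t in threads:
    t.join()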

