Tensorflow: How to get gradients per instance in a batch?


Question

I'm looking at the policy gradients sample in this notebook: https://github.com/ageron/handson-ml/blob/master/16_reinforcement_learning.ipynb

The relevant code is here:

X = tf.placeholder(tf.float32, shape=[None, n_inputs])
hidden = tf.layers.dense(X, n_hidden, activation=tf.nn.elu, kernel_initializer=initializer)
logits = tf.layers.dense(hidden, n_outputs)
outputs = tf.nn.sigmoid(logits)  # probability of action 0 (left)
p_left_and_right = tf.concat(axis=1, values=[outputs, 1 - outputs])
action = tf.multinomial(tf.log(p_left_and_right), num_samples=1)

y = 1. - tf.to_float(action)
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits)
optimizer = tf.train.AdamOptimizer(learning_rate)
grads_and_vars = optimizer.compute_gradients(cross_entropy)
gradients = [grad for grad, variable in grads_and_vars]
gradient_placeholders = []
grads_and_vars_feed = []
for grad, variable in grads_and_vars:
    gradient_placeholder = tf.placeholder(tf.float32, shape=grad.get_shape())
    gradient_placeholders.append(gradient_placeholder)
    grads_and_vars_feed.append((gradient_placeholder, variable))
training_op = optimizer.apply_gradients(grads_and_vars_feed)

...
# Run training over a bunch of instances of inputs
for step in range(n_max_steps):
    action_val, gradients_val = sess.run([action, gradients], feed_dict={X: obs.reshape(1, n_inputs)})
...
# Then weight each gradient by the action values, average, and feed them back into training_op to apply_gradients()
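For reference, the feed-back step looks roughly like this (a sketch following the notebook; all_gradients is assumed to collect gradients_val at each step, and rewards to hold the matching discounted rewards from the surrounding training loop):

import numpy as np

feed_dict = {}
for var_index, gradient_placeholder in enumerate(gradient_placeholders):
    # Average the reward-weighted gradients for this variable over all steps
    mean_gradients = np.mean([reward * all_gradients[step][var_index]
                              for step, reward in enumerate(rewards)], axis=0)
    feed_dict[gradient_placeholder] = mean_gradients
sess.run(training_op, feed_dict=feed_dict)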

The above works fine, as each run() returns different gradients.

I'd like to batch all this and feed an array of inputs into run() instead of one input at a time (my environment is different from the one in the sample, so batching makes sense for me and improves performance), i.e.:

action_val, gradients_val = sess.run([action, gradients], feed_dict={X: obs_array})

Where obs_array has shape [n_instances, n_inputs].

The problem is that optimizer.compute_gradients(cross_entropy) seems to return a single gradient, even though cross_entropy is a 1-D tensor of shape [None, 1]. action_val does return a 1-D tensor of actions, as expected: one action per instance in the batch.

Is there any way for me to get an array of gradients, one per instance in the batch?

Answer

The problem is that optimizer.compute_gradients(cross_entropy) seems to return a single gradient, even though cross_entropy is a 1-D tensor of shape [None, 1].

That happens by design: the gradient contributions from each instance are automatically aggregated. Gradient-computation operations such as optimizer.compute_gradients and the low-level primitive tf.gradients sum the per-instance gradients, according to the default AddN aggregation method. This is fine for most cases of stochastic gradient descent.
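You can see the aggregation with a toy example (a minimal sketch; the linear model and names here are illustrative, not from the question):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 3])
w = tf.Variable(tf.ones([3, 1]))
per_instance_loss = tf.matmul(x, w)           # shape [None, 1]: one value per instance
grad = tf.gradients(per_instance_loss, w)[0]  # shape [3, 1]: already summed over the batch

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad, feed_dict={x: [[1., 0., 0.], [0., 2., 0.]]}))
    # [[1.], [2.], [0.]] -- the two per-instance gradients have been added together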

Unfortunately, in the end the gradient computation has to be done over a single batch, unless a custom gradient function is built or the TensorFlow API is extended to provide gradient computation without full aggregation. Changing the implementation of tf.gradients to do this does not seem trivial.
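If a fixed batch size is acceptable, one workaround is to build one gradient op per instance (a sketch only, reusing the names from the question's code; note that the graph grows linearly with the batch size):

batch_size = 32  # must be known statically for this approach

X = tf.placeholder(tf.float32, shape=[batch_size, n_inputs])
# ... build hidden, logits, and cross_entropy exactly as above ...

per_instance_grads = []
for i in range(batch_size):
    # Differentiating a single instance's loss yields that instance's gradients
    grads_i = tf.gradients(cross_entropy[i], tf.trainable_variables())
    per_instance_grads.append(grads_i)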

One trick you might employ for your reinforcement learning model is to perform multiple session runs in parallel. According to the FAQ, the Session API supports multiple concurrent steps and will take advantage of the existing resources for parallel computation. The question Asynchronous computation in TensorFlow shows how to do this.
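A sketch of that idea using Python threads (assuming sess, action, gradients, X, and n_inputs from the code above; obs_list is a hypothetical list of observations):

import threading

results = [None] * len(obs_list)

def run_instance(i, obs):
    # sess.run is thread-safe, so these calls can execute concurrently
    results[i] = sess.run([action, gradients],
                          feed_dict={X: obs.reshape(1, n_inputs)})

threads = [threading.Thread(target=run_instance, args=(i, obs))
           for i, obs in enumerate(obs_list)]
for t in threads:
    t.start()
for t in threads:
    t.join()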

