How can I keep state from train step to train step?
Problem description
I have a tensor in my computation graph that I'd like to add a row to after every train step. How can I accomplish this?
More detail: I'm grabbing gradients from optimizer.compute_gradients, and I'd like to modify those gradients based on the gradient history. Here is the code that I'm trying to use:
def process_gradient(gradient, optimizer, name):
    reshaped_gradient = flatten(gradient)

    if gradient.name in optimizer._slots:
        optimizer._slots[gradient.name] += [reshaped_gradient]
    else:
        optimizer._slots[gradient.name] = [reshaped_gradient]

    # each row is the flattened gradient from one train step
    gradients_over_time = tf.stack(optimizer._slots[gradient.name])
    print('gradients_over_time.get_shape()', gradients_over_time.get_shape())

    return gradient

...

grads_and_vars = optimizer.compute_gradients(cost_function)
train_step = optimizer.apply_gradients(
    [(process_gradient(grad, optimizer, str(i)), var)
     for i, (grad, var) in enumerate(grads_and_vars)])
I've also tried keeping a variable around that I use to track the rows by concatenating new rows onto it, but that didn't work.
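For reference, the usual TF 1.x form of that variable-and-concat trick creates the variable with validate_shape=False so its leading dimension is allowed to grow between assignments; a minimal sketch, not the asker's exact code (history, dim, new_row, and append_row are illustrative names):

import tensorflow as tf

dim = 4  # flattened gradient length (illustrative)

# validate_shape=False leaves the variable's static shape unpinned so it can grow
history = tf.Variable(tf.zeros([0, dim]), validate_shape=False)

new_row = tf.placeholder(tf.float32, [dim])
# re-assign the concatenation each step, relaxing the shape check again
append_row = tf.assign(history,
                       tf.concat([history, tf.reshape(new_row, [1, dim])], axis=0),
                       validate_shape=False)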
Answer
I ended up using tf.py_func to accomplish this. I keep track of state in a global list that is accessed in the Python function. Here is where the gradients are applied:
# process each individual gradient before applying it
train_step = optimizer.apply_gradients(
    [(process_gradient(grad, str(i)), var)
     for i, (grad, var) in enumerate(grads_and_vars)])
Here is where I keep track of state over time, and would use the built-up state:
# maps each gradient's name to the list of flattened gradients seen so far
global_gradients_over_time = {}

def construct_processor(name):
    def python_process_gradient(gradient):
        # inside py_func the gradient arrives as a numpy array
        reshaped_gradient = gradient.flatten()
        if name in global_gradients_over_time:
            global_gradients_over_time[name].append(reshaped_gradient)
        else:
            global_gradients_over_time[name] = [reshaped_gradient]
        # process gradients somehow
        return gradient
    return python_process_gradient

def process_gradient(gradient, name):
    return tf.py_func(construct_processor(name), [gradient], tf.float32)
construct_processor is just there to let you process gradients one at a time, giving each set of gradients a name so I can find them in the global dictionary. I think this approach also keeps the memory off the GPU.
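One caveat worth adding (my note, not from the original answer): tf.py_func returns a tensor whose static shape is unknown, so you may need to restore it with set_shape before apply_gradients. A minimal end-to-end sketch under TF 1.x, using the functions above; the tiny least-squares model (x, y, w, cost_function) is made up for illustration:

import numpy as np
import tensorflow as tf

# assumes global_gradients_over_time, construct_processor, and
# process_gradient are defined as in the answer above

x = tf.placeholder(tf.float32, [None, 3])
y = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.random_normal([3, 1]))
cost_function = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

optimizer = tf.train.GradientDescentOptimizer(0.1)
grads_and_vars = optimizer.compute_gradients(cost_function)

processed = []
for i, (grad, var) in enumerate(grads_and_vars):
    g = process_gradient(grad, str(i))
    g.set_shape(grad.get_shape())  # py_func drops the static shape; restore it
    processed.append((g, var))
train_step = optimizer.apply_gradients(processed)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(5):
        sess.run(train_step, feed_dict={x: np.random.rand(4, 3),
                                        y: np.random.rand(4, 1)})
    # one flattened gradient snapshot was recorded per train step
    print(len(global_gradients_over_time['0']))  # -> 5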