How can I keep state from train step to train step?


Question

I have a tensor in my computation graph that I'd like to add a row to after every train step. How can I accomplish this?

More detail: I'm grabbing gradients from optimizer.compute_gradients, and I'd like to modify those gradients based on the gradient history. Here is the code that I'm trying to use:

def process_gradient(gradient, optimizer, name):
  # flatten() is a helper that reshapes the gradient to 1-D
  reshaped_gradient = flatten(gradient)

  # store the gradient history on the optimizer's slot dict
  if gradient.name in optimizer._slots:
    optimizer._slots[gradient.name] += [reshaped_gradient]
  else:
    optimizer._slots[gradient.name] = [reshaped_gradient]

  # stack every gradient recorded so far for this tensor
  gradients_over_time = tf.stack(optimizer._slots[gradient.name])

  print('gradients_over_time.get_shape()', gradients_over_time.get_shape())

  return gradient

...

grads_and_vars = optimizer.compute_gradients(cost_function)
train_step = optimizer.apply_gradients([(process_gradient(grad, optimizer, str(i)), var) for i, (grad, var) in enumerate(grads_and_vars)])

I've also tried keeping a variable around that I track rows with by concatenating new rows onto it, but that didn't work.
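
For what it's worth, here is a minimal sketch of that variable-based idea in TF1-style graph mode (the variable name, the row width of 3, and the placeholder are made up for illustration, not taken from the question). The key is validate_shape=False, which lets the variable's leading dimension grow as rows are appended:

import tensorflow as tf

# hypothetical example: grow a non-trainable variable by one row per step
# validate_shape=False lets the variable's shape change between assignments
history = tf.Variable(tf.zeros([0, 3]), trainable=False, validate_shape=False,
                      name='gradient_history')
new_row = tf.placeholder(tf.float32, shape=[1, 3])
append_row = tf.assign(history, tf.concat([history, new_row], axis=0),
                       validate_shape=False)

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  for _ in range(3):
    sess.run(append_row, feed_dict={new_row: [[1.0, 2.0, 3.0]]})
  print(sess.run(history).shape)  # (3, 3)

The append itself runs in isolation; the awkward part is wiring a shape-changing variable into the gradient computation, which is presumably where this approach breaks down.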

Answer

I ended up using tf.py_func to accomplish this. I keep track of state in a global list that is accessed in the Python function. Here is where the gradients are applied:

# process each individual gradient before applying it
train_step = optimizer.apply_gradients([(process_gradient(grad, str(i)), var) for i, (grad, var) in enumerate(grads_and_vars)])

Here is where I keep track of state over time, and would use the built-up state:

import tensorflow as tf

def construct_processor(name):
  # state persists in this closure across train steps
  global_gradients_over_time = {}

  def python_process_gradient(gradient):
    # inside tf.py_func the gradient arrives as a NumPy array
    reshaped_gradient = gradient.flatten()

    if name in global_gradients_over_time:
      global_gradients_over_time[name].append(reshaped_gradient)
    else:
      global_gradients_over_time[name] = [reshaped_gradient]

    # process gradients somehow

    return gradient

  return python_process_gradient

def process_gradient(gradient, name):
  # wrap the Python function so it can run inside the graph
  return tf.py_func(construct_processor(name), [gradient], tf.float32)

construct_processor is just there so you can process gradients one at a time, giving each set of gradients a name so I can find them in the global dictionary. This approach also keeps the memory off the GPU, I think.
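
One caveat worth adding (not from the original answer): tf.py_func returns a tensor with an unknown static shape, and apply_gradients may complain about that. If you run into shape errors, a small variant of process_gradient that copies the static shape back from the input should help; it assumes the same names as the snippet above:

def process_gradient(gradient, name):
  processed = tf.py_func(construct_processor(name), [gradient], tf.float32)
  # py_func cannot infer output shapes, so restore the input's static shape
  processed.set_shape(gradient.get_shape())
  return processed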

