Mixture of experts - Train best model only at each iteration


Question


I am trying to implement a crude method based on the Mixture-of-Experts paper in TensorFlow - https://arxiv.org/abs/1701.06538

There would be n models defined:

    model_1:
        var_11
        var_12
        loss_1
        optimizer_1

    model_2:
        var_21
        var_22
        loss_2
        optimizer_2

    model_3:
        var_31
        var_32
        loss_3
        optimizer_3

At every iteration, I want to train only the model with the least loss, while keeping the other variables constant. Is it possible to place a switch so that only one of the optimizers executes?

P.S.: The basis of this problem is similar to one I asked previously: http://stackoverflow.com/questions/42073239/tf-get-collection-to-extract-variables-of-one-scope/42074009?noredirect=1#comment71359330_42074009

Since the suggestion there did not work, I am trying to approach the problem differently.

Thanks in advance!

Solution

This seems to be doable with tf.cond:

import tensorflow as tf

def make_conditional_train_op(
    should_update, optimizers, variable_lists, losses):
  """Conditionally trains variables.

  Each argument is a Python list of Tensors, and each list must have the same
  length. Variables are updated based on their optimizer only if the
  corresponding `should_update` boolean Tensor is True at a given step.

  Returns a single train op which performs the conditional updates.
  """
  assert len(optimizers) == len(variable_lists)
  assert len(variable_lists) == len(losses)
  assert len(should_update) == len(variable_lists)
  conditional_updates = []
  for model_number, (update_boolean, optimizer, variables, loss) in enumerate(
      zip(should_update, optimizers, variable_lists, losses)):
    conditional_updates.append(
        tf.cond(update_boolean,
                lambda: tf.group(
                    optimizer.minimize(loss, var_list=variables),
                    tf.Print(0, ["Model {} updating".format(model_number), loss])),
                lambda: tf.no_op()))
  return tf.group(*conditional_updates)

The basic strategy is to make sure the optimizer's variable updates are defined inside the lambda of one of the cond branches. This gives true conditional op execution: the assignments to the variables (and to the optimizer's accumulators) only happen if that branch of the cond is triggered.
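To see why placing the update inside the lambda matters, here is a minimal, self-contained sketch (the variable and placeholder names are made up for illustration, not part of the answer's code); the assign op defined in the first branch only runs when the predicate is True:

import tensorflow as tf

v = tf.get_variable("v", initializer=0.0)
should_update = tf.placeholder(tf.bool)

# The assignment is created inside the lambda, so it only executes
# when `should_update` is True; otherwise the no-op identity branch runs.
maybe_update = tf.cond(should_update,
                       lambda: tf.assign_add(v, 1.0),
                       lambda: tf.identity(v))

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  sess.run(maybe_update, feed_dict={should_update: False})
  sess.run(maybe_update, feed_dict={should_update: True})
  print(sess.run(v))  # 1.0 -- only the True branch incremented the variable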

As an example, we can construct some models:

def make_model_and_optimizer():
  scalar_variable = tf.get_variable("scalar", shape=[])
  vector_variable = tf.get_variable("vector", shape=[3])
  loss = tf.reduce_sum(scalar_variable * vector_variable)
  optimizer = tf.train.AdamOptimizer(0.1)
  return optimizer, [scalar_variable, vector_variable], loss

# Construct each model
optimizers = []
variable_lists = []
losses = []
for i in range(10):
  with tf.variable_scope("model_{}".format(i)):
    optimizer, variables, loss = make_model_and_optimizer()
  optimizers.append(optimizer)
  variable_lists.append(variables)
  losses.append(loss)

Then determine a conditional update strategy, in this case only training the model with the maximum loss (just because that results in more switching; the output is rather boring if only one model ever updates):

# Determine which model should be updated (in this case, the one with the
# maximum loss)
integer_one_hot = tf.one_hot(
    tf.argmax(tf.stack(losses),
              axis=0),
    depth=len(losses))
is_max = tf.equal(
    integer_one_hot,
    tf.ones_like(integer_one_hot))
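The original question asks for the model with the least loss; assuming the same setup, swapping tf.argmax for tf.argmin gives that behaviour (a sketch along the same lines, not the answer's exact code):

# Alternatively, select the model with the minimum loss, as in the question
integer_one_hot = tf.one_hot(
    tf.argmin(tf.stack(losses), axis=0),
    depth=len(losses))
is_min = tf.equal(
    integer_one_hot,
    tf.ones_like(integer_one_hot))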

Finally, we can call the make_conditional_train_op function to create the train op, then do some training iterations:

train_op = make_conditional_train_op(
    tf.unstack(is_max), optimizers, variable_lists, losses)

# Repeatedly call the conditional train op
with tf.Session():
  tf.global_variables_initializer().run()
  for i in range(20):
    print("Iteration {}".format(i))
    train_op.run()

This prints, at each iteration, the index of the model being updated and its loss, confirming the conditional execution:

Iteration 0
I tensorflow/core/kernels/logging_ops.cc:79] [Model 6 updating][2.7271919]
Iteration 1
I tensorflow/core/kernels/logging_ops.cc:79] [Model 6 updating][2.1755948]
Iteration 2
I tensorflow/core/kernels/logging_ops.cc:79] [Model 2 updating][1.9858969]
Iteration 3
I tensorflow/core/kernels/logging_ops.cc:79] [Model 6 updating][1.6859927]
