Mixture of experts - Train best model only at each iteration
Problem description
I am trying to implement a crude method in TensorFlow based on the Mixture-of-Experts paper (https://arxiv.org/abs/1701.06538).
There would be n models defined:

model_1: var_11, var_12, loss_1, optimizer_1
model_2: var_21, var_22, loss_2, optimizer_2
model_3: var_31, var_32, loss_3, optimizer_3
At every iteration, I want to train only the model with the least loss while keeping the other models' variables constant. Is it possible to place a switch so that only one of the optimizers is executed?
P.S.: The basis of this problem is similar to a question I asked previously: http://stackoverflow.com/questions/42073239/tf-get-collection-to-extract-variables-of-one-scope/42074009?noredirect=1#comment71359330_42074009
Since the suggestion there did not work, I am trying to approach the problem differently.
Thanks in advance!
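Before any TensorFlow plumbing, the switching rule being asked for can be sketched in plain Python. This is a hypothetical toy, not the models from the question: each "model" is one scalar parameter with loss (param - target)^2, `train_least_loss_only` is an illustrative name, and only the model with the smallest current loss takes a gradient-descent step while the others stay frozen.

```python
def quadratic_loss(param, target):
    """Toy per-model loss (hypothetical): (param - target)^2."""
    return (param - target) ** 2


def train_least_loss_only(params, targets, lr=0.1, steps=5):
    """Each step, update only the model whose current loss is smallest.

    Returns the final parameters and the index chosen at every step.
    """
    params = list(params)  # copy so the caller's list is not mutated
    chosen = []
    for _ in range(steps):
        losses = [quadratic_loss(p, t) for p, t in zip(params, targets)]
        best = min(range(len(losses)), key=losses.__getitem__)  # argmin
        grad = 2.0 * (params[best] - targets[best])  # d/dp of (p - t)^2
        params[best] -= lr * grad  # every other entry is left untouched
        chosen.append(best)
    return params, chosen


final, chosen = train_least_loss_only([3.0, 1.5, 4.0], [0.0, 0.0, 0.0])
print(chosen)  # [1, 1, 1, 1, 1]
print(final)
```

With this rule the same model tends to win every step, because its loss only shrinks once it is trained; that is the effect the answer below alludes to when it switches on the maximum loss instead to get more visible switching.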
This seems to be doable with `tf.cond`:
```python
import tensorflow as tf  # TensorFlow 1.x graph-mode API


def make_conditional_train_op(
        should_update, optimizers, variable_lists, losses):
    """Conditionally trains variables.

    Each argument is a Python list with one entry per model, and all lists
    must have the same length: `should_update` holds scalar boolean Tensors,
    `optimizers` holds optimizer instances, `variable_lists` holds lists of
    variables, and `losses` holds scalar loss Tensors. Variables are updated
    by their optimizer only if the corresponding `should_update` boolean
    Tensor is True at a given step.

    Returns a single train op which performs the conditional updates.
    """
    assert len(optimizers) == len(variable_lists)
    assert len(variable_lists) == len(losses)
    assert len(should_update) == len(variable_lists)
    conditional_updates = []
    for model_number, (update_boolean, optimizer, variables, loss) in enumerate(
            zip(should_update, optimizers, variable_lists, losses)):
        # tf.cond calls both lambdas right away while building the graph, so
        # capturing the loop variables here is safe. The minimize() ops are
        # created inside the true branch and only run when it is taken.
        conditional_updates.append(
            tf.cond(update_boolean,
                    lambda: tf.group(
                        optimizer.minimize(loss, var_list=variables),
                        tf.Print(0, ["Model {} updating".format(model_number), loss])),
                    lambda: tf.no_op()))
    return tf.group(*conditional_updates)
```
The basic strategy is to make sure the optimizer's variable updates are defined in the `lambda` of one of the `cond` branches. In that case there is true conditional op execution, meaning that the assignment to variables (and optimizer accumulators) only happens if that branch of the `cond` is triggered.
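Why the update has to be created inside the branch function can be illustrated with a rough plain-Python analogue. This is not TensorFlow code: `cond`, `Model`, and `make_update` are hypothetical names, and a momentum buffer stands in for Adam's accumulators. Branches are passed as callables, so a model's parameter and accumulator change only when its flag is true.

```python
class Model:
    """Hypothetical toy model: one parameter plus one optimizer accumulator."""

    def __init__(self, param):
        self.param = param
        self.velocity = 0.0  # stands in for an optimizer slot (e.g. Adam moments)
        self.steps = 0


def cond(pred, true_fn, false_fn):
    """Call only the selected branch, loosely mimicking how tf.cond gates ops."""
    return true_fn() if pred else false_fn()


def make_update(model, grad, lr=0.1, momentum=0.9):
    """Return a callable that applies one momentum step when invoked."""
    def update():
        model.velocity = momentum * model.velocity + grad
        model.param -= lr * model.velocity
        model.steps += 1
    return update


models = [Model(1.0), Model(2.0), Model(3.0)]
should_update = [False, True, False]
for flag, m in zip(should_update, models):
    # The gradient of the toy loss param**2 is computed for every model (it has
    # no side effects); only the state mutation is gated by the flag.
    cond(flag, make_update(m, 2.0 * m.param), lambda: None)

print([(m.param, m.velocity, m.steps) for m in models])
```

Calling `update()` eagerly and passing only its result to `cond` would advance every model's state regardless of the flag; that is roughly the Python counterpart of building `optimizer.minimize(...)` outside the `lambda`, where `tf.cond` can no longer gate it.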
As an example, we can construct some models:
```python
def make_model_and_optimizer():
    scalar_variable = tf.get_variable("scalar", shape=[])
    vector_variable = tf.get_variable("vector", shape=[3])
    loss = tf.reduce_sum(scalar_variable * vector_variable)
    optimizer = tf.train.AdamOptimizer(0.1)
    return optimizer, [scalar_variable, vector_variable], loss


# Construct each model
optimizers = []
variable_lists = []
losses = []
for i in range(10):
    # A separate variable scope per model keeps the variable names distinct.
    with tf.variable_scope("model_{}".format(i)):
        optimizer, variables, loss = make_model_and_optimizer()
        optimizers.append(optimizer)
        variable_lists.append(variables)
        losses.append(loss)
```
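Each toy model's loss is `tf.reduce_sum(scalar_variable * vector_variable)`, i.e. s * (v_0 + v_1 + v_2), so its gradients have a closed form: dL/ds = v_0 + v_1 + v_2 and dL/dv_i = s. A plain-Python check of those formulas follows; the helper names are hypothetical. Because the loss is linear in each variable it has no minimum, so the optimizers in this demo are not heading toward a fixed optimum; the example only exists to exercise the switching.

```python
def toy_loss_and_grads(scalar, vector):
    """Loss and gradients of sum(scalar * vector), as in make_model_and_optimizer."""
    loss = sum(scalar * v for v in vector)
    grad_scalar = sum(vector)             # dL/ds = v_0 + v_1 + v_2
    grad_vector = [scalar] * len(vector)  # dL/dv_i = s for every i
    return loss, grad_scalar, grad_vector


def finite_difference(f, x, eps=1e-6):
    """Central-difference derivative, used only to sanity-check the formulas."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


s, v = 0.5, [1.0, -2.0, 4.0]
loss, gs, gv = toy_loss_and_grads(s, v)
print(loss, gs, gv)  # 1.5 3.0 [0.5, 0.5, 0.5]
approx_gs = finite_difference(lambda x: toy_loss_and_grads(x, v)[0], s)
print(abs(approx_gs - gs) < 1e-6)
```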
Then determine a conditional update strategy, in this case only training the model with the maximum loss (just because that results in more switching; the output is rather boring if only one model ever updates):
```python
# Determine which model should be updated (in this case, the one with the
# maximum loss)
integer_one_hot = tf.one_hot(
    tf.argmax(tf.stack(losses), axis=0),
    depth=len(losses))
is_max = tf.equal(integer_one_hot, tf.ones_like(integer_one_hot))
```
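The `tf.one_hot`/`tf.equal` pair just turns the index of the largest loss into one boolean per model. The same selection in plain Python might look like the sketch below, with a switch for the least-loss rule the question actually asks for (in TensorFlow that would mean `tf.argmin` in place of `tf.argmax`). `selection_mask` is a hypothetical helper, not part of the answer.

```python
def selection_mask(losses, pick="max"):
    """Return one boolean per model: True only for the model to train this step.

    pick="max" mirrors the answer's demo; pick="min" is the least-loss rule.
    """
    if pick not in ("max", "min"):
        raise ValueError("pick must be 'max' or 'min'")
    chooser = max if pick == "max" else min
    index = chooser(range(len(losses)), key=losses.__getitem__)
    one_hot = [1.0 if i == index else 0.0 for i in range(len(losses))]
    return [value == 1.0 for value in one_hot]  # same role as tf.equal(one_hot, 1)


losses = [2.7, 0.4, 1.9]
print(selection_mask(losses, "max"))  # [True, False, False]
print(selection_mask(losses, "min"))  # [False, True, False]
```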
Finally, we can call the `make_conditional_train_op` function to create the train op, then run some training iterations:
```python
train_op = make_conditional_train_op(
    tf.unstack(is_max), optimizers, variable_lists, losses)

# Repeatedly call the conditional train op
with tf.Session():
    # Created after train_op, so Adam's slot variables are initialized too.
    tf.global_variables_initializer().run()
    for i in range(20):
        print("Iteration {}".format(i))
        train_op.run()
```
This prints the index of the model being updated and its loss at each iteration, confirming the conditional execution:
```text
Iteration 0
I tensorflow/core/kernels/logging_ops.cc:79] [Model 6 updating][2.7271919]
Iteration 1
I tensorflow/core/kernels/logging_ops.cc:79] [Model 6 updating][2.1755948]
Iteration 2
I tensorflow/core/kernels/logging_ops.cc:79] [Model 2 updating][1.9858969]
Iteration 3
I tensorflow/core/kernels/logging_ops.cc:79] [Model 6 updating][1.6859927]
```