How to update the variables of BatchNorm in multiple GPUs in TensorFlow


Question

I have a network that trains Batch Norm (BN) layers. My batch size is 16, so I have to use multiple GPUs. I followed the inceptionv3 example, which can be summarized as:

with tf.Graph().as_default(), tf.device('/cpu:0'):
    images_splits = tf.split(axis=0, num_or_size_splits=FLAGS.num_gpus, value=images)
    labels_splits = tf.split(axis=0, num_or_size_splits=FLAGS.num_gpus, value=labels)
    for i in range(FLAGS.num_gpus):
      with tf.device('/gpu:%d' % i):
        with tf.name_scope('%s_%d' % (inception.TOWER_NAME, i)) as scope:
          ...
          # Reuse variables for the next tower.
          batchnorm_updates = tf.get_collection(slim.ops.UPDATE_OPS_COLLECTION,
                                                scope)
          grads = opt.compute_gradients(loss)
          tower_grads.append(grads)
    grads = _average_gradients(tower_grads)
    apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)
    variable_averages = tf.train.ExponentialMovingAverage(
        inception.MOVING_AVERAGE_DECAY, global_step)
    variables_to_average = (tf.trainable_variables() +
                            tf.moving_average_variables())
    variables_averages_op = variable_averages.apply(variables_to_average)
    batchnorm_updates_op = tf.group(*batchnorm_updates)
    train_op = tf.group(apply_gradient_op, variables_averages_op,
                        batchnorm_updates_op)

Unfortunately, that example uses the slim library for the BN layer, while I use the standard tf.contrib.layers.batch_norm:

def _batch_norm(self, x, name, is_training, activation_fn, trainable=False):
    with tf.variable_scope(name+'/BatchNorm') as scope:
        o = tf.contrib.layers.batch_norm(
            x,
            scale=True,
            activation_fn=activation_fn,
            is_training=is_training,
            trainable=trainable,
            scope=scope)
        return o

To collect moving_mean and moving_variance, I use tf.GraphKeys.UPDATE_OPS:

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) 
with tf.control_dependencies(update_ops):
    self.train_op = tf.group(train_op_conv, train_op_fc)

Finally, the idea of using BN on multiple GPUs can be borrowed from inceptionv3 as follows:

split_image_batch = tf.split(self.image_batch, self.conf.num_gpus, 0)
split_label_batch = tf.split(self.label_batch, self.conf.num_gpus, 0)
global_step = tf.train.get_or_create_global_step()
opt = tf.train.MomentumOptimizer(self.learning_rate, self.conf.momentum)
tower_grads_encoder = []
tower_grads_decoder = []
update_ops=[]
with tf.variable_scope(tf.get_variable_scope()):
    for i in range(self.conf.num_gpus):
        with tf.device('/gpu:%d' % i):
            net = Resnet(split_image_batch[i], self.conf.num_classes) #Build BN layer
            # Loss function
            self.reduced_loss = tf.reduce_mean(loss) + tf.add_n(l2_losses)
            # Reuse variables for the next GPU.
            tf.get_variable_scope().reuse_variables()
            update_ops.extend(tf.get_collection(tf.GraphKeys.UPDATE_OPS))
            # Compute grads
            grads_encoder = opt.compute_gradients(self.reduced_loss, var_list=encoder_trainable)
            grads_decoder = opt.compute_gradients(self.reduced_loss, var_list=decoder_trainable)
            tower_grads_encoder.append(grads_encoder)
            tower_grads_decoder.append(grads_decoder)
grads_encoder = self._average_gradients(tower_grads_encoder)
grads_decoder = self._average_gradients(tower_grads_decoder)
# Update params
train_op_conv = opt.apply_gradients(grads_encoder, global_step=global_step)
train_op_fc   = opt.apply_gradients(grads_decoder, global_step=global_step)
variable_averages = tf.train.ExponentialMovingAverage(self.conf.MOVING_AVERAGE_DECAY, global_step)
variables_averages_op = variable_averages.apply(tf.trainable_variables())

with tf.control_dependencies(update_ops):
    self.train_op = tf.group(train_op_conv, train_op_fc, variables_averages_op)

Although the code runs without error, the performance is very low. It looks like I did not collect the BN parameters correctly. Could you look at my code and give me some direction for training BN on multiple GPUs? Thanks.

Answer

I suspect the performance problems have to do with doing several variable updates per step (one from each batch norm in each tower).

Is there a reason you need batch norm updates from every GPU? We recommend updating batch norm using the statistics from a single tower only; unless there is skew in your partitioning (which would cause other problems), the statistics should work out to be the same.

If you restrict your batch norm updates to those from a single tower, you reduce the number of variable updates by a factor of num_gpus.
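For example, here is a minimal sketch of one way to do that, following the same pattern as the inceptionv3 snippet above: give each tower its own name scope and collect tf.GraphKeys.UPDATE_OPS only for the first tower. It reuses the names from your code (Resnet, split_image_batch, encoder_trainable, etc.), which are assumed to be defined as in your snippet.

update_ops = []
tower_grads_encoder, tower_grads_decoder = [], []
with tf.variable_scope(tf.get_variable_scope()):
    for i in range(self.conf.num_gpus):
        with tf.device('/gpu:%d' % i):
            # Give each tower its own name scope so its ops can be filtered.
            with tf.name_scope('tower_%d' % i) as tower_scope:
                net = Resnet(split_image_batch[i], self.conf.num_classes)
                self.reduced_loss = tf.reduce_mean(loss) + tf.add_n(l2_losses)
                # Reuse variables for the next GPU.
                tf.get_variable_scope().reuse_variables()
                if i == 0:
                    # Keep only the moving_mean/moving_variance update ops
                    # created inside the first tower's name scope.
                    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS,
                                                   tower_scope)
                grads_encoder = opt.compute_gradients(self.reduced_loss,
                                                      var_list=encoder_trainable)
                grads_decoder = opt.compute_gradients(self.reduced_loss,
                                                      var_list=decoder_trainable)
                tower_grads_encoder.append(grads_encoder)
                tower_grads_decoder.append(grads_decoder)

The rest of the graph construction (averaging the tower gradients, apply_gradients, the moving-average op, and the final tf.control_dependencies(update_ops) group) can stay as in your code; update_ops now contains a single set of BN updates instead of num_gpus copies.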
