TensorFlow - Averaging model weights from restored models


Question

I trained several different models on the same data, and all of the neural networks have the same architecture. I would like to know whether it is possible to restore those models, average their weights, and initialise my weights with the average.

This is an example of how the graph might look. Basically, what I need is the average of the weights I am going to load.

import tensorflow as tf
import numpy as np

#init model1 weights
weights = {
    'w1': tf.Variable(),
    'w2': tf.Variable()
}
# init model1 biases
biases = {
    'b1': tf.Variable(),
    'b2': tf.Variable()
}
#init model2 weights
weights2 = {
    'w1': tf.Variable(),
    'w2': tf.Variable()
}
# init model2 biases
biases2 = {
    'b1': tf.Variable(),
    'b2': tf.Variable(),
}

# this is the average I want to create
# (only w1/w2 and b1/b2 exist in the dicts above, so only those are averaged)
w = {
    'w1': tf.Variable(
        tf.add(weights["w1"], weights2["w1"])/2
    ),
    'w2': tf.Variable(
        tf.add(weights["w2"], weights2["w2"])/2
    )
}
# init biases
b = {
    'b1': tf.Variable(
        tf.add(biases["b1"], biases2["b1"])/2
    ),
    'b2': tf.Variable(
        tf.add(biases["b2"], biases2["b2"])/2
    )
}

weights_saver = tf.train.Saver({
    'w1' : weights['w1'],
    'w2' : weights['w2'],
    'b1' : biases['b1'],
    'b2' : biases['b2']
    })
weights_saver2 = tf.train.Saver({
    'w1' : weights2['w1'],
    'w2' : weights2['w2'],
    'b1' : biases2['b1'],
    'b2' : biases2['b2']
    })

This is what I want to get when I run the TF session. c contains the weights I want to use to start training.

# Create a session for running operations in the Graph.
init_op = tf.global_variables_initializer()
init_op2 = tf.local_variables_initializer()

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    # Initialize the variables (like the epoch counter).
    sess.run(init_op)
    sess.run(init_op2)
    weights_saver.restore(
        sess,
        'my_model1/model_weights.ckpt'
    )
    weights_saver2.restore(
        sess,
        'my_model2/model_weights.ckpt'
    )
    a = sess.run(weights)
    b = sess.run(weights2)
    c = sess.run(w)

Recommended answer

First, I assume the model structure is exactly the same (same number of layers, same number of nodes per layer). If not, you will have problems mapping variables (there will be variables in one model but not in the other).
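One way to check this assumption before averaging is to compare variable names and shapes across the two models. The sketch below is a hypothetical helper (find_mismatches is not a TensorFlow function). It assumes you have already collected each model's variables into a plain dict mapping variable name to shape, for example from the list returned by tf.trainable_variables().

```python
def find_mismatches(shapes_a, shapes_b):
    """Return a list of human-readable problems; empty if the models line up.

    shapes_a / shapes_b: dict mapping variable name -> shape (tuple or list).
    """
    problems = []
    for name in sorted(set(shapes_a) | set(shapes_b)):
        if name not in shapes_a:
            problems.append("{}: missing from first model".format(name))
        elif name not in shapes_b:
            problems.append("{}: missing from second model".format(name))
        elif tuple(shapes_a[name]) != tuple(shapes_b[name]):
            problems.append("{}: shape {} vs {}".format(
                name, tuple(shapes_a[name]), tuple(shapes_b[name])))
    return problems
```

An empty result means every variable in one model has a same-shaped counterpart in the other, which is the condition the averaging below relies on.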

What you want is three sessions. You load the first two from checkpoints, and the last one will hold the average. You need this because each session holds its own version of the variables' values.

After you load a model, use tf.trainable_variables() to get a list of all the trainable variables in the model. You can pass that list to sess.run to get the variable values as NumPy arrays. After you compute the averages, use tf.assign to create operations that change the variables. You can also use the list to change the initializers, but that means passing the values in to the model (not always an option).
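The averaging step itself does not need TensorFlow. As a minimal sketch, assume each argument is the list of NumPy arrays that sess.run returns for the same variable list in one model. The helper name average_values is made up for illustration.

```python
import numpy as np

def average_values(*value_lists):
    """Element-wise mean across models, returning one array per variable."""
    if len({len(values) for values in value_lists}) != 1:
        raise ValueError("all models must have the same number of variables")
    averaged = []
    for arrays in zip(*value_lists):
        # np.stack raises ValueError if the shapes differ between models.
        averaged.append(np.mean(np.stack(arrays), axis=0))
    return averaged
```

Because it takes any number of lists, the same function covers more than two models. The returned arrays are in the same order as the variable list, so they can be zipped with it when building the assign operations.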

Roughly:

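import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.assign)

# All three sessions use the default graph, so the model is built only once.
session1 = tf.Session()
session2 = tf.Session()
session3 = tf.Session()

# Omitted code: Restore session1 and session2.
# Optionally initialize session3.

all_vars = tf.trainable_variables()
values1 = session1.run(all_vars)  # list of numpy arrays from model 1
values2 = session2.run(all_vars)  # list of numpy arrays from model 2

# Build one assign op per variable, setting it to the element-wise mean.
all_assign = []
for var, val1, val2 in zip(all_vars, values1, values2):
  all_assign.append(tf.assign(var, tf.reduce_mean([val1, val2], axis=0)))

session3.run(all_assign)

# Do whatever you want with session 3.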
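
If you would rather not juggle three sessions, a possible alternative (not what the answer above does) is to read the values straight from the checkpoint files and average them in NumPy. The sketch below assumes each reader exposes get_variable_to_shape_map() and get_tensor(name), as the object returned by tf.train.load_checkpoint does. average_checkpoints is a hypothetical helper name.

```python
import numpy as np

def average_checkpoints(readers):
    """Return {variable name: mean value} across checkpoint readers."""
    names = set(readers[0].get_variable_to_shape_map())
    for reader in readers[1:]:
        if set(reader.get_variable_to_shape_map()) != names:
            raise ValueError("checkpoints do not contain the same variables")
    return {
        name: np.mean([r.get_tensor(name) for r in readers], axis=0)
        for name in sorted(names)
    }

# Usage sketch (paths taken from the question; requires TensorFlow):
#   import tensorflow as tf
#   readers = [tf.train.load_checkpoint(p) for p in
#              ('my_model1/model_weights.ckpt', 'my_model2/model_weights.ckpt')]
#   averaged = average_checkpoints(readers)
```

The keys are the names under which the variables were saved (w1, w2, b1, b2 in the question's savers), so the result can be mapped back onto the model's variables by name. Note that a checkpoint saved without an explicit variable dict may also contain non-weight entries such as a step counter, which you would probably want to skip rather than average.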
