如何在Tensorflow中获得可再现的结果 [英] how to get reproducible result in Tensorflow
问题描述
我使用张量流构建了5层神经网络.
I built 5-layer neural network by using tensorflow.
我很难获得可重复的结果(或稳定的结果).
I have a problem to get reproducible results (or stable results).
I found similar questions regarding reproducibility of tensorflow and the corresponding answers, such as How to get stable results with TensorFlow, setting random seed
但是问题尚未解决.
我也像下面一样设置随机种子
I also set random seed like the following
tf.set_random_seed(1)
此外,我向每个随机函数(例如,
Furthermore, I added seed options to every random function such as
b1 = tf.Variable(tf.random_normal([nHidden1], seed=1234))
我确认第一个时期显示出相同的结果,但与第二个时期渐渐不同.
I confirmed that the first epoch shows the identical results, but not identical from the second epoch little by little.
如何获得可重复的结果?
How can I get the reproducible results?
我想念什么吗?
这是我使用的代码块.
Here is a code block I use.
def xavier_init(n_inputs, n_outputs, uniform=True):
if uniform:
init_range = tf.sqrt(6.0 / (n_inputs + n_outputs))
return tf.random_uniform_initializer(-init_range, init_range, seed=1234)
else:
stddev = tf.sqrt(3.0 / (n_inputs + n_outputs))
return tf.truncated_normal_initializer(stddev=stddev, seed=1234)
import numpy as np
import tensorflow as tf
import dataSetup
from scipy.stats.stats import pearsonr
tf.set_random_seed(1)
x_train, y_train, x_test, y_test = dataSetup.input_data()
# Parameters
learningRate = 0.01
trainingEpochs = 1000000
batchSize = 64
displayStep = 100
thresholdReduce = 1e-6
thresholdNow = 0.6
#dropoutRate = tf.constant(0.7)
# Network Parameter
nHidden1 = 128 # number of 1st layer nodes
nHidden2 = 64 # number of 2nd layer nodes
nInput = 24 #
nOutput = 1 # Predicted score: 1 output for regression
# save parameter
modelPath = 'model/model_layer5_%d_%d_mini%d_lr%.3f_noDrop_rollBack.ckpt' %(nHidden1, nHidden2, batchSize, learningRate)
# tf Graph input
X = tf.placeholder("float", [None, nInput])
Y = tf.placeholder("float", [None, nOutput])
# Weight
W1 = tf.get_variable("W1", shape=[nInput, nHidden1], initializer=xavier_init(nInput, nHidden1))
W2 = tf.get_variable("W2", shape=[nHidden1, nHidden2], initializer=xavier_init(nHidden1, nHidden2))
W3 = tf.get_variable("W3", shape=[nHidden2, nHidden2], initializer=xavier_init(nHidden2, nHidden2))
W4 = tf.get_variable("W4", shape=[nHidden2, nHidden2], initializer=xavier_init(nHidden2, nHidden2))
WFinal = tf.get_variable("WFinal", shape=[nHidden2, nOutput], initializer=xavier_init(nHidden2, nOutput))
# biases
b1 = tf.Variable(tf.random_normal([nHidden1], seed=1234))
b2 = tf.Variable(tf.random_normal([nHidden2], seed=1234))
b3 = tf.Variable(tf.random_normal([nHidden2], seed=1234))
b4 = tf.Variable(tf.random_normal([nHidden2], seed=1234))
bFinal = tf.Variable(tf.random_normal([nOutput], seed=1234))
# Layers for dropout
L1 = tf.nn.relu(tf.add(tf.matmul(X, W1), b1))
L2 = tf.nn.relu(tf.add(tf.matmul(L1, W2), b2))
L3 = tf.nn.relu(tf.add(tf.matmul(L2, W3), b3))
L4 = tf.nn.relu(tf.add(tf.matmul(L3, W4), b4))
hypothesis = tf.add(tf.matmul(L4, WFinal), bFinal)
print "Layer setting DONE..."
# define loss and optimizer
cost = tf.reduce_mean(tf.square(hypothesis - Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learningRate).minimize(cost)
# Initialize the variable
init = tf.initialize_all_variables()
# save op to save and restore all the variables
saver = tf.train.Saver()
with tf.Session() as sess:
# initialize
sess.run(init)
print "Initialize DONE..."
# Training
costPrevious = 100000000000000.0
best = float("INF")
totalBatch = int(len(x_train)/batchSize)
print "Total Batch: %d" %totalBatch
for epoch in range(trainingEpochs):
#print "EPOCH: %04d" %epoch
avgCost = 0.
for i in range(totalBatch):
np.random.seed(i+epoch)
randidx = np.random.randint(len(x_train), size=batchSize)
batch_xs = x_train[randidx,:]
batch_ys = y_train[randidx,:]
# Fit traiing using batch data
sess.run(optimizer, feed_dict={X:batch_xs, Y:batch_ys})
# compute average loss
avgCost += sess.run(cost, feed_dict={X:batch_xs, Y:batch_ys})/totalBatch
# compare the current cost and the previous
# if current cost > the previous
# just continue and make the learning rate half
#print "Cost: %1.8f --> %1.8f at epoch %05d" %(costPrevious, avgCost, epoch+1)
if avgCost > costPrevious + .5:
#sess.run(init)
load_path = saver.restore(sess, modelPath)
print "Cost increases at the epoch %05d" %(epoch+1)
print "Cost: %1.8f --> %1.8f" %(costPrevious, avgCost)
continue
costNow = avgCost
reduceCost = abs(costPrevious - costNow)
costPrevious = costNow
#Display logs per epoch step
if costNow < best:
best = costNow
bestMatch = sess.run(hypothesis, feed_dict={X:x_test})
# model save
save_path = saver.save(sess, modelPath)
if epoch % displayStep == 0:
print "step {}".format(epoch)
pearson = np.corrcoef(bestMatch.flatten(), y_test.flatten())
print 'train loss = {}, current loss = {}, test corrcoef={}'.format(best, costNow, pearson[0][1])
if reduceCost < thresholdReduce or costNow < thresholdNow:
print "Epoch: %04d, Cost: %.9f, Prev: %.9f, Reduce: %.9f" %(epoch+1, costNow, costPrevious, reduceCost)
break
print "Optimization Finished"
推荐答案
似乎您的结果可能无法重现,因为您每次都使用Saver从检查点写入/恢复? (即第二次运行代码时,不会使用随机种子初始化变量值,而是从先前的检查点还原变量值)
It seems that your results are perhaps not reproducible because you are using Saver to write/restore from checkpoint each time? (i.e. the second time that you run the code, the variable values aren't initialized using your random seed -- they are restored from your previous checkpoint)
请整理您的代码示例,使其仅包含再现不可复制性所需的代码.
Please trim down your code example to just the code necessary to reproduce irreproducibility.
这篇关于如何在Tensorflow中获得可再现的结果的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!