Variables not updated after training in TensorFlow, even when initialised with uniform random values, for a simple logistic regression

Question

I am learning TensorFlow by implementing a simple logistic regression classifier that outputs whether a digit is 7 or not when fed an MNIST image. I am using stochastic gradient descent. The core of the TensorFlow code is:

# Maximum number of epochs
MaxEpochs = 1
# Learning rate
eta = 1e-2

ops.reset_default_graph()                                       
n_x = 784
n_y = 1

x_tf = tf.placeholder(tf.float32, shape = [n_x, 1], name = 'x_tf')
y_tf = tf.placeholder(tf.float32, shape = [n_y, 1], name = 'y_tf')    

w_tf = tf.get_variable(name = "w_tf", shape = [n_x, 1], initializer = tf.initializers.random_uniform());
b_tf = tf.get_variable(name = "b_tf", shape = [n_y, 1], initializer = tf.initializers.random_uniform());

z_tf = tf.add(tf.matmul(w_tf, x_tf, transpose_a = True), b_tf, name = 'z_tf')
yPred_tf = tf.sigmoid(z_tf, name = 'yPred_tf')

Loss_tf = tf.nn.sigmoid_cross_entropy_with_logits(logits = yPred_tf, labels = y_tf, name = 'Loss_tf')
with tf.name_scope('Training'):
    optimizer_tf = tf.train.GradientDescentOptimizer(learning_rate = eta)
    train_step = optimizer_tf.minimize(Loss_tf)

init = tf.global_variables_initializer()                                                 

with tf.Session() as sess:
    sess.run(init)
    for Epoch in range(MaxEpochs):
        for Sample in range(len(XTrain)):
            x = XTrain[Sample]
            y = YTrain[Sample].reshape([-1,1])
            Train_sample = {x_tf: x, y_tf: y}
            sess.run(train_step, feed_dict = Train_sample)

toc = time.time()
print('\nElapsed time is: ', toc-tic,'s');    

It builds the following graph (TensorBoard-related code has been removed for convenience):

The problem is that even though the weights and biases are initialised randomly (non-zero), the neuron isn't being trained. The weight histogram is as follows.

I didn't want to post something so trivial, but I am at my wit's end. Sorry for the long post, and thank you very much in advance for any guidance. A small side note: the training takes 93.35 s to run, whereas the same stochastic implementation in NumPy took only about 10 s. Why would that be?

The bias plot over the course of the training is as follows.

Here is the entire code, in case the issue lies somewhere other than where I thought.

import tensorflow as tf
import numpy as np
import h5py
from tensorflow.python.framework import ops
import time

mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()

def Flatten(Im):
    FlatImArray = Im.reshape([Im.shape[0],-1,1])
    return FlatImArray

DigitTested = 7

# Separating the images of 7s from the rest
TrainIdxs = [];
for i in range(len(y_train)):
    if(y_train[i] == DigitTested):
        TrainIdxs.append(i)

TestIdxs = [];
for i in range(len(y_test)):
    if(y_test[i] == DigitTested):
        TestIdxs.append(i)

# Preparing the Datasets for training and testing
XTrain = Flatten(x_train);
YTrain = np.zeros([len(x_train),1]);
YTrain[TrainIdxs] = 1;

XTest = Flatten(x_test);
YTest = np.zeros([len(x_test),1]);
YTest[TestIdxs] = 1;

tic = time.time()
# Maximum number of epochs
MaxEpochs = 1
# Learning rate
eta = 1e-2
# Number of Epochs after which the neuron is validated 
ValidationInterval = 1

ops.reset_default_graph()                                       # to be able to rerun the model without overwriting tf variables
n_x = 784
n_y = 1

x_tf = tf.placeholder(tf.float32, shape = [n_x, 1], name = 'x_tf')
y_tf = tf.placeholder(tf.float32, shape = [n_y, 1], name = 'y_tf')    

w_tf = tf.get_variable(name = "w_tf", shape = [n_x, 1], initializer = tf.initializers.random_uniform());
b_tf = tf.get_variable(name = "b_tf", shape = [n_y, 1], initializer = tf.initializers.random_uniform());

z_tf = tf.add(tf.matmul(w_tf, x_tf, transpose_a = True), b_tf, name = 'z_tf')
yPred_tf = tf.sigmoid(z_tf, name = 'yPred_tf')

Loss_tf = tf.nn.sigmoid_cross_entropy_with_logits(logits = yPred_tf, labels = y_tf, name = 'Loss_tf')
with tf.name_scope('Training'):
    optimizer_tf = tf.train.GradientDescentOptimizer(learning_rate = eta)
    train_step = optimizer_tf.minimize(Loss_tf)


writer = tf.summary.FileWriter(r"C:\Users\braja\Documents\TBSummaries\MNIST1NTF\2")             
tf.summary.histogram('Weights', w_tf)
tf.summary.scalar('Loss', tf.reshape(Loss_tf, []))
tf.summary.scalar('Bias', tf.reshape(b_tf, []))
merged_summary = tf.summary.merge_all()

init = tf.global_variables_initializer()                                                       

with tf.Session() as sess:
    sess.run(init)
    for Epoch in range(MaxEpochs):
        for Sample in range(len(XTrain)):
            x = XTrain[Sample]
            y = YTrain[Sample].reshape([-1,1])
            Train_sample = {x_tf: x, y_tf: y}
            MergedSumm, _ = sess.run([merged_summary, train_step], feed_dict = Train_sample)
            writer.add_summary(summary = MergedSumm, global_step = Sample)
        if((Epoch+1) %ValidationInterval == 0):
            ValidationError = 0
            for Sample in range(len(XTest)):
                x = XTest[Sample]
                y = YTest[Sample].reshape([-1,1])
                Test_sample = {x_tf: x, y_tf: y}
                yPred = sess.run(yPred_tf, feed_dict = Test_sample)
                ValidationError += abs(yPred - YTest[Sample])
            print('Validation Error at', Epoch+1,'Epoch:', ValidationError);

writer.add_graph(tf.Session().graph)
writer.close()
toc = time.time()
print('\nElapsed time is: ', toc-tic,'s');    

Recommended answer

Judging by the bias values, it looks like you are seeing saturation of the sigmoid function.

This happens when the sigmoid input (z_tf) is pushed to the extreme ends of the sigmoid function. When that happens, the gradient flowing back is so small that training stagnates. The probable cause is that you have doubled up on sigmoid functions: sigmoid_cross_entropy_with_logits applies a sigmoid to its input (it expects logits, i.e. pre-sigmoid values), but you have already applied one yourself. Try removing one of these, for example by passing z_tf instead of yPred_tf as the logits argument.
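
A minimal pure-Python sketch of why the doubled sigmoid stalls training (a stand-in for the TensorFlow graph, not TensorFlow itself). It assumes the closed-form gradient of sigmoid cross-entropy with respect to its logit, sigmoid(logit) - label. When sigmoid(z) is fed in as the logit, the chain rule adds a factor sigmoid(z) * (1 - sigmoid(z)), which collapses to zero once z is large. A side effect is that the effective prediction sigmoid(sigmoid(z)) can only lie between 0.5 and about 0.731, so the loss cannot approach zero even without saturation.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def grad_wrt_logit(logit: float, label: float) -> float:
    """Derivative of sigmoid cross-entropy with respect to its logit input."""
    return sigmoid(logit) - label


def grad_single(z: float, label: float) -> float:
    # Correct wiring: the raw pre-activation z is passed as the logit.
    return grad_wrt_logit(z, label)


def grad_double(z: float, label: float) -> float:
    # Doubled wiring: sigmoid(z) is passed as the logit, so the chain rule
    # multiplies by sigmoid'(z) = s * (1 - s), which vanishes for large |z|.
    s = sigmoid(z)
    return grad_wrt_logit(s, label) * s * (1.0 - s)


for z in (0.0, 5.0, 50.0):
    print(f"z={z:5.1f}  single={grad_single(z, 0.0):.3e}  double={grad_double(z, 0.0):.3e}")
```

With label 0 and z = 50, the single-sigmoid gradient is still 1.0, while the doubled version is exactly 0.0 in floating point, which would be consistent with the flat weight histogram and bias plot described in the question.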

In addition, by default tf.initializers.random_uniform() produces random values in the range [0, 1). You probably want to initialise your weights and biases symmetrically about 0 and at very small values to start with. This can be done by passing the minval and maxval arguments to tf.initializers.random_uniform().
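
To see how large z_tf gets with the default initializer, here is a rough pure-Python sketch. The question's code feeds the raw mnist.load_data() pixels (integers from 0 to 255) without rescaling, so a hypothetical input with 150 pixels at 255 is assumed; the 150-pixel count and the ±1e-3 range are illustrative choices, not values from the answer, and random.uniform stands in for tf.initializers.random_uniform. In the question's graph the gradient passes through an explicit sigmoid'(z), so that factor is what the sketch measures.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_grad(z: float) -> float:
    # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)); writing 1 - sigmoid(z) as
    # sigmoid(-z) avoids cancellation, so only genuine underflow gives 0.0.
    return sigmoid(z) * sigmoid(-z)


def preactivation(x, lo, hi, seed=0):
    """z = w^T x + b with every weight and the bias drawn from U[lo, hi]."""
    rng = random.Random(seed)
    w = [rng.uniform(lo, hi) for _ in x]
    b = rng.uniform(lo, hi)
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical 784-pixel input: 150 "ink" pixels at the raw maximum 255, the rest 0.
x = [255.0] * 150 + [0.0] * (784 - 150)

z_default = preactivation(x, 0.0, 1.0)   # mimics the default [0, 1) range
z_small = preactivation(x, -1e-3, 1e-3)  # small and symmetric about 0

print(f"default init: z={z_default:.1f}  sigmoid'={sigmoid_grad(z_default):.3e}")
print(f"small init:   z={z_small:.3f}  sigmoid'={sigmoid_grad(z_small):.3e}")
```

With the default range, z lands roughly around 19,000 (about 150 × 255 × 0.5) and sigmoid'(z) underflows to 0.0, so no gradient reaches w_tf or b_tf. With the small symmetric range, |z| is bounded by 150 × 255 × 1e-3 + 1e-3 ≈ 38.3 and is typically only a few units.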

The weights and biases should then grow during training as needed, and starting small again helps prevent sigmoid saturation.
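
Putting both suggestions together, here is a minimal per-sample SGD sketch in pure Python: the sigmoid is applied only once (through the gradient sigmoid(z) - y on the raw logit), and the weights and bias start small and symmetric about 0. The toy dataset, learning rate and epoch count are hypothetical choices for illustration; this is not the question's TensorFlow code. mean_loss uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)) that TensorFlow documents for sigmoid_cross_entropy_with_logits.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(w, b, x):
    """Raw pre-activation z = w^T x + b (no sigmoid applied here)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def bce_with_logits(z, y):
    """Stable sigmoid cross-entropy computed from the raw logit z."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def mean_loss(X, Y, w, b):
    return sum(bce_with_logits(logit(w, b, x), y) for x, y in zip(X, Y)) / len(X)


def init_params(n, scale, rng):
    # Small values drawn symmetrically about 0.
    w = [rng.uniform(-scale, scale) for _ in range(n)]
    b = rng.uniform(-scale, scale)
    return w, b


def sgd(X, Y, w, b, eta, epochs):
    """Per-sample gradient descent; dLoss/dz = sigmoid(z) - y for a single sigmoid."""
    for _ in range(epochs):
        for x, y in zip(X, Y):
            g = sigmoid(logit(w, b, x)) - y
            w = [wi - eta * g * xi for wi, xi in zip(w, x)]
            b -= eta * g
    return w, b


# Hypothetical toy data: label 1 when the first feature exceeds the second.
X = [[1.0, 0.0], [0.9, 0.2], [0.8, 0.1], [0.0, 1.0], [0.2, 0.9], [0.1, 0.7]]
Y = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]

rng = random.Random(0)
w, b = init_params(len(X[0]), 1e-3, rng)
loss_before = mean_loss(X, Y, w, b)
w, b = sgd(X, Y, w, b, eta=0.5, epochs=200)
loss_after = mean_loss(X, Y, w, b)
preds = [1.0 if sigmoid(logit(w, b, x)) >= 0.5 else 0.0 for x in X]
print(f"loss before={loss_before:.4f}  after={loss_after:.4f}  preds={preds}")
```

Because the initial logits are close to 0, the starting loss is close to log 2 and the gradient is far from saturated, so the parameters can move from the first update onward.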
