What is going wrong with the training and predictions using TensorFlow?


Problem Description



Please see the code written below.

import tensorflow as tf

x = tf.placeholder("float", [None, 80])
W = tf.Variable(tf.zeros([80,2]))
b = tf.Variable(tf.zeros([2]))

y = tf.nn.softmax(tf.matmul(x,W) + b)

y_ = tf.placeholder("float", [None,2])

So here we see that there are 80 features in the data with only 2 possible outputs. I set the cross_entropy and the train_step like so.

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(tf.matmul(x, W) + b, y_)
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

Initialize all variables.

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)

Then I use this code to "train" my Neural Network.

g = 0
for i in range(len(x_train)):

    _, w_out, b_out = sess.run([train_step, W, b], feed_dict={x: [x_train[g]], y_: [y_train[g]]})

    g += 1

print "...Trained..."

After training the network, it always produces the same accuracy rate regardless of how many times I train it. That accuracy rate is 0.856067 and I get to that accuracy with this code-

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print sess.run(accuracy, feed_dict={x: x_test, y_: y_test})
0.856067
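
To make the accuracy calculation concrete, here is a toy illustration with made-up values (not from my data): tf.argmax picks the predicted class per row, tf.equal compares it with the label's argmax, and the mean of the resulting 0/1 values is the fraction of correct predictions.

probs = [[0.9, 0.1], [0.2, 0.8]]    # two made-up predictions
labels = [[1.0, 0.0], [0.0, 1.0]]   # the matching one-hot labels
toy_accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(probs, 1), tf.argmax(labels, 1)), "float"))
print sess.run(toy_accuracy)        # prints 1.0 -- both rows are classified correctly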

So this is where the question comes in. Is it because my dimensions are too small? Maybe I should break the features into a 10x8 matrix? Maybe a 4x20 matrix? etc.
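
(To make that idea concrete, the kind of reshape I have in mind would look something like this hypothetical snippet, assuming x_train is a NumPy array of shape (N, 80).)

# Hypothetical illustration only: fold each flat row of 80 features into a 10x8 grid.
x_train_2d = np.reshape(x_train, (-1, 10, 8))
print x_train_2d.shape   # (N, 10, 8)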

Then I try to get the probabilities of the actual test data producing a 0 or a 1 like so-

from numpy import genfromtxt
import numpy as np

test_data_actual = genfromtxt('clean-test-actual.csv',delimiter=',')  # Actual Test data

x_test_actual = []
for i in test_data_actual:
    x_test_actual.append(i)
x_test_actual = np.array(x_test_actual)

ans = sess.run(y, feed_dict={x: x_test_actual})

And print out the probabilities:

print ans[0:10]
[[ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]]

(Note: it does produce [ 0. 1.] sometimes.)

I then tried to see if applying the expert methodology would produce better results. Please see the following code.

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 1, 1, 1],
                        strides=[1, 1, 1, 1], padding='SAME')

(Please note how I changed the strides in order to avoid errors.)

W_conv1 = weight_variable([1, 80, 1, 1])
b_conv1 = bias_variable([1])

Here is where the question comes in again. I define the Tensor (vector/matrix if you will) as 80x1 (so 1 row with 80 features in it); I continue to do that throughout the rest of the code (please see below).

x_ = tf.reshape(x, [-1,1,80,1])
h_conv1 = tf.nn.relu(conv2d(x_, W_conv1) + b_conv1)
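
For what it's worth, a quick shape check suggests these settings keep everything flat through the first layer (illustrative only):

# Illustrative check: a [1, 80, 1, 1] filter with 'SAME' padding and the
# all-ones pooling above leave the [batch, 1, 80, 1] shape unchanged.
print h_conv1.get_shape()                 # (?, 1, 80, 1)
print max_pool_2x2(h_conv1).get_shape()   # (?, 1, 80, 1)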

Second Convolutional Layer

h_pool1 = max_pool_2x2(h_conv1)
W_conv2 = weight_variable([1, 80, 1, 1])
b_conv2 = bias_variable([1])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

Densely Connected Layer

W_fc1 = weight_variable([80, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 80])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

Dropout

keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

Readout

W_fc2 = weight_variable([1024, 2])
b_fc2 = bias_variable([2])

y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

In the above you'll see that I defined the output as 2 possible answers (also to avoid errors).

Then cross_entropy and the train_step.

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(tf.matmul(h_fc1_drop, W_fc2) + b_fc2, y_)

train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

Start the session.

sess.run(tf.initialize_all_variables())

"Train" the neural network.

g = 0

for i in range(len(x_train)):
    if i%100 == 0:
        train_accuracy = accuracy.eval(session=sess, feed_dict={x: [x_train[g]], y_: [y_train[g]], keep_prob: 1.0})

    train_step.run(session=sess, feed_dict={x: [x_train[g]], y_: [y_train[g]], keep_prob: 0.5})

    g += 1

print "test accuracy %g"%accuracy.eval(session=sess, feed_dict={
    x: x_test, y_: y_test, keep_prob: 1.0})
test accuracy 0.929267

And, once again, it always produces 0.929267 as the output.

The probabilities on the actual data producing a 0 or a 1 are as follows:

[[ 0.92820859  0.07179145]
 [ 0.92820859  0.07179145]
 [ 0.92820859  0.07179145]
 [ 0.92820859  0.07179145]
 [ 0.92820859  0.07179145]
 [ 0.92820859  0.07179145]
 [ 0.96712834  0.03287172]
 [ 0.92820859  0.07179145]
 [ 0.92820859  0.07179145]
 [ 0.92820859  0.07179145]]

As you see, there is some variance in these probabilities, but typically just the same result.

I know that this isn't a Deep Learning problem. This is obviously a training problem. I know that there should always be some variance in the training accuracy every time you reinitialize the variables and retrain the network, but I just don't know why or where it's going wrong.

Solution

The answer is two-fold.

One problem is with the dimensions/parameters. The other problem is that the features are being placed in the wrong spot.

W_conv1 = weight_variable([1, 2, 1, 80])
b_conv1 = bias_variable([80])

Notice the first two numbers in the weight_variable correspond to the dimensions of the input. The second two numbers correspond to the dimensions of the feature tensor. The bias_variable always takes the final number in the weight_variable.
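
To make the shape bookkeeping concrete, here is a rough sketch of the flow this implies, assuming the conv2d and max_pool_2x2 helpers and the x_ = tf.reshape(x, [-1, 1, 80, 1]) reshape from the question are reused:

# Sketch, not a drop-in replacement:
#   x_      : [batch, 1, 80, 1]    1 row of 80 values, 1 input channel
#   W_conv1 : [1, 2, 1, 80]        a 1x2 patch, 1 channel in, 80 features out
#   h_conv1 : [batch, 1, 80, 80]   'SAME' padding keeps the 1x80 layout
h_conv1 = tf.nn.relu(conv2d(x_, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)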

Second Convolutional Layer

W_conv2 = weight_variable([1, 2, 80, 160])
b_conv2 = bias_variable([160])

Here the first two numbers still correspond to the dimensions of the input. The second two numbers correspond to the number of features and the weighted network that results from the 80 previous features. In this case, we double the weighted network: 80x2=160. The bias_variable then takes the final number in the weight_variable. If you were to finish the code at this point, the last number in the weight_variable would be a 1 in order to prevent dimensional errors due to the shapes of the input and output tensors. But, instead, for better predictions, let's add a third convolutional layer.
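
Continuing the same sketch for this layer (same helper assumptions as above):

#   h_pool1 : [batch, 1, 80, 80]
#   W_conv2 : [1, 2, 80, 160]      80 features in, 160 out (the doubling described above)
#   h_conv2 : [batch, 1, 80, 160]
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)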

Third Convolutional Layer

W_conv3 = weight_variable([1, 2, 160, 1])
b_conv3 = bias_variable([1])

Once again, the first two numbers in the weight_variable take the shape of the input. The third number corresponds to the number of weighted variables we established in the Second Convolutional Layer. The last number in the weight_variable now becomes 1 so we don't run into any dimension errors on the output that we are predicting. In this case, the output has the dimensions of 1, 2.
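
And a sketch of the third layer plus the flattening step that the rest of the answer relies on (again assuming the question's helpers):

#   h_pool2 : [batch, 1, 80, 160]
#   W_conv3 : [1, 2, 160, 1]       back down to a single output channel
#   h_pool3 : [batch, 1, 80, 1]    so it flattens cleanly to 80 values per example
h_conv3 = tf.nn.relu(conv2d(h_pool2, W_conv3) + b_conv3)
h_pool3 = max_pool_2x2(h_conv3)
h_pool3_flat = tf.reshape(h_pool3, [-1, 80])   # this 80 must match W_fc2's first dimension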

W_fc2 = weight_variable([80, 1024])
b_fc2 = bias_variable([1024])

Here, the number of neurons is 1024, which is completely arbitrary, but the first number in the weight_variable needs to be something that the dimension of our feature matrix is divisible by. In this case it can be any number (such as 2, 4, 10, 20, 40, 80). Once again, the bias_variable takes the last number in the weight_variable.

At this point, make sure that the last number in h_pool3_flat = tf.reshape(h_pool3, [-1, 80]) corresponds to the first number in the W_fc2 weight_variable.
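
Put together, that pairing looks roughly like this (a sketch using the variable names implied above and in the prediction snippet further down):

h_fc2 = tf.nn.relu(tf.matmul(h_pool3_flat, W_fc2) + b_fc2)   # [batch, 80] x [80, 1024] -> [batch, 1024]
h_fc2_drop = tf.nn.dropout(h_fc2, keep_prob)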

Now when you run your training program you will notice that the outcome varies and won't always guess all 1's or all 0's.

When you want to predict the probabilities, you have to feed x to the softmax variable, y_conv = tf.nn.softmax(tf.matmul(h_fc2_drop, W_fc3) + b_fc3), like so-

ans = sess.run(y_conv, feed_dict={x: x_test_actual, keep_prob: 1.0})

You can alter the keep_prob variable, but keeping it at 1.0 always produces the best results. Now, if you print out ans you'll have something that looks like this-

[[ 0.90855026  0.09144982]
 [ 0.93020624  0.06979381]
 [ 0.98385173  0.0161483 ]
 [ 0.93948185  0.06051811]
 [ 0.90705943  0.09294061]
 [ 0.95702559  0.04297439]
 [ 0.95543593  0.04456403]
 [ 0.95944828  0.0405517 ]
 [ 0.99154049  0.00845954]
 [ 0.84375167  0.1562483 ]
 [ 0.98449463  0.01550537]
 [ 0.97772813  0.02227189]
 [ 0.98341942  0.01658053]
 [ 0.93026513  0.06973486]
 [ 0.93376994  0.06623009]
 [ 0.98026556  0.01973441]
 [ 0.93210858  0.06789146]

Notice how the probabilities vary. Your training is now working properly.

