What is going wrong with the training and predictions using TensorFlow?
Problem description
Please see the code written below.
    import tensorflow as tf

    x = tf.placeholder("float", [None, 80])
    W = tf.Variable(tf.zeros([80,2]))
    b = tf.Variable(tf.zeros([2]))
    y = tf.nn.softmax(tf.matmul(x,W) + b)
    y_ = tf.placeholder("float", [None,2])
So here we see that there are 80 features in the data with only 2 possible outputs. I set the cross_entropy and the train_step like so.
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(tf.matmul(x, W) + b, y_)
    train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
Initialize all variables.
    init = tf.initialize_all_variables()
    sess = tf.Session()
    sess.run(init)
Then I use this code to "train" my Neural Network.
    g = 0
    for i in range(len(x_train)):
        _, w_out, b_out = sess.run([train_step, W, b], feed_dict={x: [x_train[g]], y_: [y_train[g]]})
        g += 1
    print "...Trained..."
After training the network, it always produces the same accuracy rate regardless of how many times I train it. That accuracy rate is 0.856067, and I get to that accuracy with this code:
    correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print sess.run(accuracy, feed_dict={x: x_test, y_: y_test})
    0.856067
So this is where the question comes in. Is it because my dimensions are too small? Maybe I should break the features into a 10x8 matrix? Maybe a 4x20 matrix? etc.
Then I try to get the probabilities of the actual test data producing a 0 or a 1 like so-
    import numpy as np
    from numpy import genfromtxt

    test_data_actual = genfromtxt('clean-test-actual.csv',delimiter=',')  # Actual Test data
    x_test_actual = []
    for i in test_data_actual:
        x_test_actual.append(i)
    x_test_actual = np.array(x_test_actual)
    ans = sess.run(y, feed_dict={x: x_test_actual})
And print out the probabilities:
    print ans[0:10]
    [[ 1. 0.]
     [ 1. 0.]
     [ 1. 0.]
     [ 1. 0.]
     [ 1. 0.]
     [ 1. 0.]
     [ 1. 0.]
     [ 1. 0.]
     [ 1. 0.]
     [ 1. 0.]]
(Note: it does produce [ 0. 1.] sometimes.)
I then tried to see if applying the expert methodology would produce better results. Please see the following code.
    def weight_variable(shape):
        initial = tf.truncated_normal(shape, stddev=0.1)
        return tf.Variable(initial)

    def bias_variable(shape):
        initial = tf.constant(0.1, shape=shape)
        return tf.Variable(initial)

    def conv2d(x, W):
        return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

    def max_pool_2x2(x):
        return tf.nn.max_pool(x, ksize=[1, 1, 1, 1],
                              strides=[1, 1, 1, 1], padding='SAME')
(Please note how I changed the strides in order to avoid errors.)
    W_conv1 = weight_variable([1, 80, 1, 1])
    b_conv1 = bias_variable([1])
Here is where the question comes in again. I define the Tensor (vector/matrix if you will) as 80x1 (so 1 row with 80 features in it); I continue to do that throughout the rest of the code (please see below).
    x_ = tf.reshape(x, [-1,1,80,1])
    h_conv1 = tf.nn.relu(conv2d(x_, W_conv1) + b_conv1)
Second Convolutional Layer
    h_pool1 = max_pool_2x2(h_conv1)
    W_conv2 = weight_variable([1, 80, 1, 1])
    b_conv2 = bias_variable([1])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    h_pool2 = max_pool_2x2(h_conv2)
Densely Connected Layer
    W_fc1 = weight_variable([80, 1024])
    b_fc1 = bias_variable([1024])
    h_pool2_flat = tf.reshape(h_pool2, [-1, 80])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
Dropout
    keep_prob = tf.placeholder("float")
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
Readout
    W_fc2 = weight_variable([1024, 2])
    b_fc2 = bias_variable([2])
    y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
In the above you'll see that I defined the output as 2 possible answers (also to avoid errors).
Then the cross_entropy and the train_step.
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(tf.matmul(h_fc1_drop, W_fc2) + b_fc2, y_)
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
    correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
Start the session.
sess.run(tf.initialize_all_variables())
"Train" the neural network.
    g = 0
    for i in range(len(x_train)):
        if i%100 == 0:
            train_accuracy = accuracy.eval(session=sess, feed_dict={x: [x_train[g]], y_: [y_train[g]], keep_prob: 1.0})
        train_step.run(session=sess, feed_dict={x: [x_train[g]], y_: [y_train[g]], keep_prob: 0.5})
        g += 1
    print "test accuracy %g"%accuracy.eval(session=sess, feed_dict={
        x: x_test, y_: y_test, keep_prob: 1.0})
    test accuracy 0.929267
And, once again, it always produces 0.929267 as the output.
The probabilities on the actual data producing a 0 or a 1 are as follows:
    [[ 0.92820859 0.07179145]
     [ 0.92820859 0.07179145]
     [ 0.92820859 0.07179145]
     [ 0.92820859 0.07179145]
     [ 0.92820859 0.07179145]
     [ 0.92820859 0.07179145]
     [ 0.96712834 0.03287172]
     [ 0.92820859 0.07179145]
     [ 0.92820859 0.07179145]
     [ 0.92820859 0.07179145]]
As you see, there is some variance in these probabilities, but typically just the same result.
I know that this isn't a Deep Learning problem. This is obviously a training problem. I know that there should always be some variance in the training accuracy every time you reinitialize the variables and retrain the network, but I just don't know why or where it's going wrong.
Solution
The answer is twofold.
One problem is with the dimensions/parameters. The other problem is that the features are being placed in the wrong spot.
    W_conv1 = weight_variable([1, 2, 1, 80])
    b_conv1 = bias_variable([80])
Notice that the first two numbers in the weight_variable set the size of the patch taken from the input (the filter height and width, here 1x2). The second two numbers are the channel dimensions of the feature tensor: the number of input channels (1) and the number of output features (80). The bias_variable always takes the final number in the weight_variable.
Second Convolutional Layer
    W_conv2 = weight_variable([1, 2, 80, 160])
    b_conv2 = bias_variable([160])
Here the first two numbers are again the 1x2 patch size. The second two numbers are the number of input channels (the 80 features from the previous layer) and the number of output features. In this case, we double the number of features: 80x2=160. The bias_variable then takes the final number in the weight_variable. If you were to finish the code at this point, the last number in the weight_variable would be a 1 in order to prevent dimensional errors due to the shape of the input tensor and the output tensor. But, instead, for better predictions, let's add a third convolutional layer.
Third Convolutional Layer
    W_conv3 = weight_variable([1, 2, 160, 1])
    b_conv3 = bias_variable([1])
Once again, the first two numbers in the weight_variable are the 1x2 patch size. The third number corresponds to the number of features we established in the Second Convolutional Layer (160). The last number in the weight_variable now becomes 1 so we don't run into any dimension errors on the output that we are predicting. In this case, the output has the dimensions of 1, 2.
    W_fc2 = weight_variable([80, 1024])
    b_fc2 = bias_variable([1024])
Here, the number of neurons is 1024, which is completely arbitrary, but the first number in the weight_variable needs to be something that the dimensions of our feature matrix are divisible by. In this case it can be any such number (such as 2, 4, 10, 20, 40, 80). Once again, the bias_variable takes the last number in the weight_variable.
At this point, make sure that the last number in
h_pool3_flat = tf.reshape(h_pool3, [-1, 80])
corresponds to the first number in the W_fc2 weight_variable.
Now when you run your training program you will notice that the outcome varies and won't always guess all 1's or all 0's.
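The shape bookkeeping described above can be checked without running TensorFlow. The following is a minimal sketch, not part of the original answer: the helper names conv2d_same_shape and flattened_size are made up for illustration, and it assumes stride-1 convolutions with SAME padding (as in the conv2d helper from the question) and pooling that leaves the shape unchanged (as in the question's max_pool_2x2, which uses a 1x1 window with stride 1).

```python
def conv2d_same_shape(input_shape, filter_shape):
    """Output shape of a stride-1, SAME-padded 2-D convolution.

    input_shape:  [batch, height, width, in_channels]
    filter_shape: [filter_height, filter_width, in_channels, out_channels]
    """
    batch, height, width, in_channels = input_shape
    _, _, filter_in, filter_out = filter_shape
    if in_channels != filter_in:
        raise ValueError(
            "filter expects %d input channels, got %d" % (filter_in, in_channels))
    # With SAME padding and stride 1 the spatial size is unchanged;
    # only the channel count changes.
    return [batch, height, width, filter_out]


def flattened_size(shape):
    """Number of values per example after reshaping to [-1, n]."""
    _, height, width, channels = shape
    return height * width * channels


# Filter shapes taken from the answer; None stands for the batch dimension.
shape = [None, 1, 80, 1]
for filter_shape in ([1, 2, 1, 80], [1, 2, 80, 160], [1, 2, 160, 1]):
    shape = conv2d_same_shape(shape, filter_shape)
    print(shape)

# This must equal the first number of the fully connected weight shape.
fc_input_size = flattened_size(shape)
print(fc_input_size)
```

Under these assumptions the final tensor flattens to 80 values per example, which is why the reshape uses [-1, 80] and the fully connected weights start with 80. A mismatched filter, such as [1, 2, 1, 160] applied to an 80-channel input, raises an error in this sketch.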
When you want to predict the probabilities, you have to feed x to the softmax variable:
    y_conv = tf.nn.softmax(tf.matmul(h_fc2_drop, W_fc3) + b_fc3)
like so:
    ans = sess.run(y_conv, feed_dict={x: x_test_actual, keep_prob: 1.0})
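To make the softmax step concrete, here is a small plain-Python sketch of what it does to one row of the final layer's output. The function and the example input values are made up for illustration and are not taken from the question's data. It also shows why a very large gap between the two inputs yields rows that print as [ 1. 0.], like those in the question.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    largest = max(logits)
    # Subtracting the maximum avoids overflow and does not change the result.
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical inputs: a moderate gap and a very large gap between classes.
moderate = softmax([2.5, 0.0])
saturated = softmax([40.0, 0.0])
print(moderate)
print(saturated)
```

With the moderate gap the first probability is a little above 0.92; with the large gap the second probability is so small that it rounds to 0 when printed at the usual precision.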
You can alter the keep_prob variable, but keeping it at 1.0 always produces the best results. Now, if you print out ans you'll have something that looks like this:
    [[ 0.90855026 0.09144982]
     [ 0.93020624 0.06979381]
     [ 0.98385173 0.0161483 ]
     [ 0.93948185 0.06051811]
     [ 0.90705943 0.09294061]
     [ 0.95702559 0.04297439]
     [ 0.95543593 0.04456403]
     [ 0.95944828 0.0405517 ]
     [ 0.99154049 0.00845954]
     [ 0.84375167 0.1562483 ]
     [ 0.98449463 0.01550537]
     [ 0.97772813 0.02227189]
     [ 0.98341942 0.01658053]
     [ 0.93026513 0.06973486]
     [ 0.93376994 0.06623009]
     [ 0.98026556 0.01973441]
     [ 0.93210858 0.06789146]
Notice how the probabilities vary. Your training is now working properly.
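One way to check whether predictions have collapsed to a single class is to compare the reported accuracy with the accuracy of always guessing the most common label. A test accuracy that never changes between runs would be consistent with such a collapse, although that is an inference rather than something the question's data confirms. The sketch below uses made-up probabilities and labels, and the helper names are hypothetical.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    matches = sum(1 for p, t in zip(predictions, labels) if p == t)
    return matches / float(len(labels))


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return max(counts.values()) / float(len(labels))


# Hypothetical model output and true class indices, for illustration only.
probabilities = [[0.93, 0.07], [0.93, 0.07], [0.97, 0.03], [0.93, 0.07]]
labels = [0, 0, 1, 0]

predicted = [argmax(row) for row in probabilities]
collapsed = len(set(predicted)) == 1  # True if every prediction is the same class

print(predicted)
print(collapsed)
print(accuracy(predicted, labels))
print(majority_baseline(labels))
```

If collapsed is true and the accuracy equals the majority baseline, the model has learned nothing beyond the class balance, however high the number looks.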