AdamOptimizer and GradientDescentOptimizer from tensorflow not able to fit simple data

Question

Similar question: here

I am trying out TensorFlow. I generated simple data which is linearly separable and tried to fit a linear equation to it. Here is the code.

import numpy as np
import tensorflow as tf

np.random.seed(2010)
n = 300
x_data = np.random.random([n, 2]).tolist()
y_data = [[1., 0.] if v[0]> 0.5 else [0., 1.] for v in x_data]

x = tf.placeholder(tf.float32, [None, 2]) 
W = tf.Variable(tf.zeros([2, 2]))
b = tf.Variable(tf.zeros([2]))
y = tf.sigmoid(tf.matmul(x , W) + b)

y_ = tf.placeholder(tf.float32, [None, 2]) 
cross_entropy = -tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y, 1e-9, 1)))
train_step = tf.train.AdamOptimizer(0.01).minimize(cross_entropy)

correct_predict = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1)) 
accuracy = tf.reduce_mean(tf.cast(correct_predict, tf.float32))

s = tf.Session()
s.run(tf.initialize_all_variables())

for i in range(10):
        s.run(train_step, feed_dict = {x: x_data, y_: y_data})
        print(s.run(accuracy, feed_dict = {x: x_data, y_: y_data}))

print(s.run(accuracy, feed_dict = {x: x_data, y_: y_data}), end=",")

I get the following output:

0.536667, 0.46, 0.46, 0.46, 0.46, 0.46, 0.46, 0.46, 0.46, 0.46, 0.46

Right after the first iteration it gets stuck at 0.46.

Here is the plot:

Then I changed the code to use gradient descent:

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

Now I get the following: 0.54, 0.54, 0.63, 0.70, 0.75, 0.8, 0.84, 0.89, 0.92, 0.94, 0.94

Here is the plot:

My questions:

1) Why is the AdamOptimizer failing?

2) If the issue is with learning rate, or other parameters which I need to tune, how do I generally debug them?

3) I ran gradient descent for 50 iterations (I ran for 10 above) and printed the accuracy every 5 iterations and this is the output:

0.54, 0.8, 0.95, 0.96, 0.92, 0.89, 0.87, 0.84, 0.81, 0.79, 0.77.

Clearly it started to diverge; it looks like the issue is the fixed learning rate (it is overshooting after a point). Am I right? (A sketch of the decay schedule I have in mind follows question 4 below.)

4) In this toy example, what can be done to get a better fit? Ideally it should reach 1.0 accuracy, since the data is linearly separable.
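
Regarding question 3, here is a rough sketch of the kind of learning-rate decay I have in mind, assuming tf.train.exponential_decay is the right tool (the decay values are arbitrary, just for illustration):

global_step = tf.Variable(0, trainable=False)
# start at 0.01 and multiply the rate by 0.9 every 10 steps (arbitrary values)
learning_rate = tf.train.exponential_decay(0.01, global_step,
                                           decay_steps=10, decay_rate=0.9)
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(
        cross_entropy, global_step=global_step)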

As requested by @Yaroslav, here is the code used for plots

import matplotlib.pyplot as plt

xx = [v[0] for v in x_data]
yy = [v[1] for v in x_data]
x_min, x_max = min(xx) - 0.5, max(xx) + 0.5 
y_min, y_max = min(yy) - 0.5, max(yy) + 0.5 
xxx, yyy = np.meshgrid(np.arange(x_min, x_max, 0.02), np.arange(y_min, y_max, 0.02))
pts = np.c_[xxx.ravel(), yyy.ravel()].tolist()
# ---> Important
z = s.run(tf.argmax(y, 1), feed_dict = {x: pts})
z = np.array(z).reshape(xxx.shape)
plt.pcolormesh(xxx, yyy, z)
plt.scatter(xx, yy, c=['r' if v[0] == 1 else 'b' for v in y_data], edgecolor='k', s=50)
plt.show()

Answer

TL;DR: your loss is wrong. The loss goes to zero without the accuracy improving.

The problem is that your probabilities are not normalized. If you look at your loss, it is going down, but the probabilities for both y[:, 0] and y[:, 1] are going to 1, so argmax is meaningless.
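
To see why, here is a tiny standalone illustration (my own, not part of the original answer) of the loss used in the question: once both unnormalized outputs are pushed towards 1, the y_ * log(y) loss is near zero for every example, while argmax just breaks a tie:

import numpy as np

y_pred = np.array([0.999, 0.999])   # both sigmoid outputs driven towards 1
y_true = np.array([1.0, 0.0])       # one-hot label
loss = -np.sum(y_true * np.log(np.clip(y_pred, 1e-9, 1)))
print(loss)   # ~0.001: near-zero loss, yet argmax(y_pred) carries no class information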

The traditional solution is to use only 1 degree of freedom instead of 2: the probability of the first class is sigmoid(y0), the probability of the second class is 1-sigmoid(y0), and the cross entropy is something like -y[0]*log(sigmoid(y0)) - y[1]*log(1-sigmoid(y0)).
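
A minimal sketch of that single-logit formulation, assuming the same TF 1.x-style API as the question (the labels placeholder here is a single column: 1.0 for the first class, 0.0 for the second):

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 2])
labels = tf.placeholder(tf.float32, [None, 1])   # 1.0 for class 0, 0.0 for class 1

W = tf.Variable(tf.zeros([2, 1]))
b = tf.Variable(tf.zeros([1]))
logit = tf.matmul(x, W) + b                      # one degree of freedom
p = tf.clip_by_value(tf.sigmoid(logit), 1e-9, 1 - 1e-9)

# binary cross entropy: -y*log(p) - (1-y)*log(1-p)
cross_entropy = -tf.reduce_sum(labels * tf.log(p) + (1. - labels) * tf.log(1. - p))
train_step = tf.train.AdamOptimizer(0.01).minimize(cross_entropy)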

Alternatively, you could change your code to use tf.nn.softmax instead of tf.sigmoid. This divides by the sum of the probabilities, so the optimizer can't decrease the loss by driving both probabilities to 1 simultaneously.

The following gets to 0.99666673 accuracy.

import numpy as np
import tensorflow as tf

tf.reset_default_graph()
np.random.seed(2010)
n = 300
x_data = np.random.random([n, 2]).tolist()
y_data = [[1., 0.] if v[0]> 0.5 else [0., 1.] for v in x_data]

x = tf.placeholder(tf.float32, [None, 2]) 
W = tf.Variable(tf.zeros([2, 2]))
b = tf.Variable(tf.zeros([2]))
y = tf.nn.softmax(tf.matmul(x , W) + b)

y_ = tf.placeholder(tf.float32, [None, 2]) 
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
regularizer = tf.reduce_sum(tf.square(y))
train_step = tf.train.AdamOptimizer(1.0).minimize(cross_entropy+regularizer)

correct_predict = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1)) 
accuracy = tf.reduce_mean(tf.cast(correct_predict, tf.float32))

s = tf.Session()
s.run(tf.initialize_all_variables())

for i in range(30):
        s.run(train_step, feed_dict = {x: x_data, y_: y_data})
        loss_val, acc_val = s.run([cross_entropy, accuracy], feed_dict = {x: x_data, y_: y_data})
        print(loss_val, acc_val)
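
As a side note (not needed for the fix above), a numerically safer way to get the same normalized behaviour is to keep the raw logits and let tf.nn.softmax_cross_entropy_with_logits apply the softmax internally, so no clipping or extra regularizer is required. A sketch, assuming a TF version that provides this op with labels/logits keyword arguments:

logits = tf.matmul(x, W) + b
y = tf.nn.softmax(logits)   # only needed for the argmax/accuracy computation
cross_entropy = tf.reduce_sum(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
train_step = tf.train.AdamOptimizer(0.01).minimize(cross_entropy)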

PS: can you share the code you used for making the plots above?
