Why is this TensorFlow implementation vastly less successful than Matlab's NN?


Problem Description

As a toy example, I'm trying to fit the function f(x) = 1/x from 101 noise-free data points. The Matlab default implementation is phenomenally successful, with a mean square error of ~10^-10, and interpolates perfectly.

I implemented a neural network with one hidden layer of 10 sigmoid neurons. I'm a beginner at neural networks, so be on your guard against dumb code.

import tensorflow as tf
import numpy as np

def weight_variable(shape):
  # Small random initial weights to break symmetry.
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  # Small constant positive bias.
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

# Can't make TensorFlow consume ordinary lists unless they're parsed to an
# ndarray; returns a float32 array of shape [1, N] (one column per sample).
def toNd(lst):
    return np.array(lst, dtype='float32').reshape(1, -1)

xBasic = np.linspace(0.2, 0.8, 101)
xTrain = toNd(xBasic)
yTrain = toNd([1/x for x in xBasic])  # targets: f(x) = 1/x

x = tf.placeholder("float", [1,None])  # row vector of inputs, one column per sample
hiddenDim = 10

# Hidden layer: 10 units, so W x + b has shape [10, N].
b = bias_variable([hiddenDim,1])
W = weight_variable([hiddenDim, 1])

# Output layer: a linear combination of the hidden activations.
b2 = bias_variable([1])
W2 = weight_variable([1, hiddenDim])

hidden = tf.nn.sigmoid(tf.matmul(W, x) + b)
y = tf.matmul(W2, hidden) + b2

# Minimize the squared errors.
loss = tf.reduce_mean(tf.square(y - yTrain))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

# For initializing the variables.
init = tf.initialize_all_variables()

# Launch the graph
sess = tf.Session()
sess.run(init)

# 4000 gradient-descent steps, reporting the loss every 500 steps.
for step in xrange(0, 4001):
    train.run({x: xTrain}, sess)
    if step % 500 == 0:
        print loss.eval({x: xTrain}, sess)

The mean square error ends up at ~2*10^-3, so about 7 orders of magnitude worse than Matlab. Visualising with

xTest = np.linspace(0.2, 0.8, 1001)
yTest = y.eval({x:toNd(xTest)}, sess)
import matplotlib.pyplot as plt
plt.plot(xTest, yTest.transpose().tolist())  # network prediction
plt.plot(xTest, [1/x for x in xTest])        # ground truth 1/x
plt.show()

we can see that the fit is systematically imperfect, while the Matlab one looks perfect to the naked eye, with differences uniformly < 10^-5. I have tried to replicate the diagram of the Matlab network with TensorFlow.

Incidentally, the diagram seems to imply a tanh rather than a sigmoid activation function; I cannot find this stated anywhere in the documentation. However, when I try to use tanh neurons in TensorFlow, the fitting quickly fails, with nan for the variables. I do not know why.
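One plausible explanation (my assumption; nothing in the question verifies it) is that the 0.5 learning rate, workable for the saturating sigmoid, is too aggressive once tanh's steeper gradients come into play, so the weights diverge to nan. A minimal sketch of the change, reusing the script above with a smaller step:

# Assumption: the nans come from too large a step once tanh is used.
# Same 1-10-1 network as above, with tanh and a 10x smaller step size.
hidden = tf.nn.tanh(tf.matmul(W, x) + b)
y = tf.matmul(W2, hidden) + b2

loss = tf.reduce_mean(tf.square(y - yTrain))
train = tf.train.GradientDescentOptimizer(0.05).minimize(loss)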

Matlab uses the Levenberg–Marquardt training algorithm. Bayesian regularization is even more successful, with mean squares at 10^-12 (we are probably in the realm of the vapours of float arithmetic).
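For a feel of what that algorithm buys, here is a rough sketch (my own illustration, not part of the question) of the same 1-10-1 network fitted with SciPy's Levenberg–Marquardt; the tanh activation and the flat 31-parameter packing are assumptions:

# Illustration only: Levenberg-Marquardt via SciPy on the same toy problem.
import numpy as np
from scipy.optimize import least_squares

xs = np.linspace(0.2, 0.8, 101)
ys = 1.0 / xs

def residuals(p):
    # Unpack a flat parameter vector: hidden weights/biases, output weights/bias.
    W, b, W2, b2 = p[0:10], p[10:20], p[20:30], p[30]
    hidden = np.tanh(np.outer(W, xs) + b[:, None])  # shape [10, 101]
    return W2.dot(hidden) + b2 - ys                 # one residual per data point

res = least_squares(residuals, 0.1 * np.random.randn(31), method='lm')
print(np.mean(res.fun ** 2))  # mean squared error after fitting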

Why is the TensorFlow implementation so much worse, and what can I do to make it better?

Recommended Answer

I tried training for 50,000 iterations; it got to an error of 0.00012. That takes about 180 seconds on a Tesla K40.

It seems that for this kind of problem first-order gradient descent is not a good fit (pun intended); you need Levenberg–Marquardt or L-BFGS. I don't think anyone has implemented them in TensorFlow yet.
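As a rough sketch of the gap (my own illustration, not the answerer's code, with the same tanh and parameter-packing assumptions as the LM sketch above), the scalar loss can be handed to SciPy's L-BFGS-B directly; with only 31 parameters, even finite-difference gradients are affordable:

# Illustration only: L-BFGS via SciPy on the same toy problem.
import numpy as np
from scipy.optimize import minimize

xs = np.linspace(0.2, 0.8, 101)
ys = 1.0 / xs

def loss(p):
    W, b, W2, b2 = p[0:10], p[10:20], p[20:30], p[30]
    hidden = np.tanh(np.outer(W, xs) + b[:, None])  # shape [10, 101]
    return np.mean((W2.dot(hidden) + b2 - ys) ** 2)

res = minimize(loss, 0.1 * np.random.randn(31), method='L-BFGS-B',
               options={'maxiter': 5000})
print(res.fun)  # typically far below the ~2e-3 that plain gradient descent reached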

Edit: use tf.train.AdamOptimizer(0.1) for this problem. It gets to 3.13729e-05 after 4000 iterations. Also, a GPU with the default strategy seems like a bad idea for this problem: there are many small operations, and the overhead causes the GPU version to run 3x slower than the CPU on my machine.
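In the question's script, that is a two-line change (the 0.1 step size is the value the answer reports):

# Swap the plain gradient-descent optimizer in the script above for Adam.
optimizer = tf.train.AdamOptimizer(0.1)
train = optimizer.minimize(loss)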

