Raising to a square with TensorFlow with a Dataset class

Problem description

I want to write a neural network which looks for an x^2 distribution without a predefined model. Precisely, it is given some points in [-1,1] with their squares to train on, and then it has to reproduce and predict the function on, e.g., [-10,10]. I had more or less done it - without datasets. But then I tried to modify it to use a Dataset and learn how to use it. Now I have succeeded in making the program run, but the output is worse than before: mainly it is a constant 0.

The previous version was like x^2 on [-1,1] with a linear prolongation, which was better. [Plot of the current output: the blue line (network output) is now flat; the goal is for it to coincide with the red y=x^2 curve.]

Here is the code (comments translated from Polish):

# square2.py - second approach to training a network with TensorFlow
# goal: teach the network to recognize the x**2 distribution
# analysis of the script from:
# https://stackoverflow.com/questions/43140591/neural-network-to-predict-nth-square

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
from tensorflow.python.framework.ops import reset_default_graph

# definition of the training data
# x_train = (np.random.rand(10**3)*4-2).reshape(-1,1)
# y_train = x_train**2
square2_dane = np.load("square2_dane.npz")
x_train = square2_dane['x_tren'].reshape(-1,1)
y_train = square2_dane['y_tren'].reshape(-1,1) 

# optimize the data splitting
# x_train = square2_dane['x_tren'].reshape(-1,1)
# ds_x = tf.data.Dataset.from_tensor_slices(x_train)
# batch_x = ds_x.batch(rozm_paczki)
# iterator = ds_x.make_one_shot_iterator()

# network parameters
wymiary = [50,50,50,1]
epoki = 500
rozm_paczki = 200

reset_default_graph()
X = tf.placeholder(tf.float32, shape=[None,1])
Y = tf.placeholder(tf.float32, shape=[None,1])

weights = []
biases = []
n_inputs = 1

# variable initialization
for i,n_outputs in enumerate(wymiary):
    with tf.variable_scope("layer_{}".format(i)):
        w = tf.get_variable(name="W", shape=[n_inputs,n_outputs],initializer = tf.random_normal_initializer(mean=0.0,stddev=0.02,seed=42))
        b=tf.get_variable(name="b",shape=[n_outputs],initializer=tf.zeros_initializer)
        weights.append(w)
        biases.append(b)
        n_inputs=n_outputs

def forward_pass(X,weights,biases):
    h=X
    for i in range(len(weights)):
        h=tf.add(tf.matmul(h,weights[i]),biases[i])
        h=tf.nn.relu(h)
    return h    

output_layer = forward_pass(X,weights,biases)
f_strat = tf.reduce_mean(tf.squared_difference(output_layer,Y),1)
f_strat = tf.reduce_sum(f_strat)
# alternative loss function
#f_strat2 = tf.reduce_sum(tf.abs(Y-y_train)/y_train)
optimizer = tf.train.AdamOptimizer(learning_rate=0.003).minimize(f_strat)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # training
    dataset = tf.data.Dataset.from_tensor_slices((x_train,y_train))
    dataset = dataset.batch(rozm_paczki)
    dataset = dataset.repeat(epoki)
    iterator = dataset.make_one_shot_iterator()
    ds_x, ds_y = iterator.get_next()
    sess.run(optimizer, {X: sess.run(ds_x), Y: sess.run(ds_y)})
    saver = tf.train.Saver()
    save = saver.save(sess, "./model.ckpt")
    print("Model zapisano jako: %s" % save)

    # run the network on the data
    x_test = np.linspace(-1,1,600)
    network_outputs = sess.run(output_layer,feed_dict = {X :x_test.reshape(-1,1)})

plt.plot(x_test,x_test**2,color='r',label='y=x^2')
plt.plot(x_test,network_outputs,color='b',label='NN')
plt.legend(loc='right')
plt.show()

I think that the problem is with the input of the training data, sess.run(optimizer, {X: sess.run(ds_x), Y: sess.run(ds_y)}), or with the definition of ds_x, ds_y. It's my first such program. For comparison, these were the lines (in place of the 'sess' block above) that produced the previous output:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # training
    for i in range(epoki):
        idx = np.arange(len(x_train))
        np.random.shuffle(idx)
        for j in range(len(x_train)//rozm_paczki):
            cur_idx = idx[rozm_paczki*j:rozm_paczki*(j+1)]
            sess.run(optimizer,feed_dict = {X:x_train[cur_idx],Y:y_train[cur_idx]})
    saver = tf.train.Saver()
    save = saver.save(sess, "./model.ckpt")
    print("Model zapisano jako: %s" % save)

Thank you!

PS: Inspired by Neural network to predict nth square (linked above).

Recommended answer

There are two problems that conspire to give your model poor accuracy, and both involve this line:

sess.run(optimizer, {X: sess.run(ds_x), Y: sess.run(ds_y)})

  1. Only one training step will execute, because this code is not in a loop. Your original code ran len(x_train)//rozm_paczki steps per epoch, which ought to make much more progress.

  2. The two calls to sess.run(ds_x) and sess.run(ds_y) run in separate steps, which means they will contain values from different batches that are unrelated. Each call to sess.run(ds_x) or sess.run(ds_y) moves the Iterator on to the next batch, and discards any parts of the input element that you did not explicitly request in the sess.run() call. Essentially, you will get X from batch i and Y from batch i+1 (or vice versa), and the model will train on invalid data. If you want to get values from the same batch, you need to do it in a single sess.run([ds_x, ds_y]) call.
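
A minimal sketch of the fix for both points, keeping the feed_dict style of the question (names as in the original program), might look like this:

try:
    while True:
        # Fetch both tensors in a single call so X and Y come from the same batch.
        batch_x, batch_y = sess.run([ds_x, ds_y])
        sess.run(optimizer, feed_dict={X: batch_x, Y: batch_y})
except tf.errors.OutOfRangeError:
    # The one-shot iterator is exhausted after `epoki` repetitions of the data.
    pass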

There are two further concerns that might impact efficiency:

  3. The Dataset is not shuffled. Your original code calls np.random.shuffle() at the beginning of each epoch. You should include a dataset = dataset.shuffle(len(x_train)) before dataset = dataset.repeat().

  4. It is inefficient to fetch the values from the Iterator back to Python (e.g. when you do sess.run(ds_x)) and feed them back into the training step. It is more efficient to pass the output of the Iterator.get_next() operation directly into the feed-forward step as inputs, as sketched below.
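
In sketch form, and using the names from the question, these two adjustments amount to the following (the same pattern as in the full rewrite below):

dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
dataset = dataset.shuffle(len(x_train))  # reshuffle the samples each epoch
dataset = dataset.repeat(epoki)
dataset = dataset.batch(rozm_paczki)
iterator = dataset.make_one_shot_iterator()

# Use the iterator's output tensors directly as the network inputs,
# with no feed_dict round-trip through Python.
X, Y = iterator.get_next()
output_layer = forward_pass(X, weights, biases)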

Putting this all together, here's a rewritten version of your program that addresses these four points, and achieves the correct results. (Unfortunately my Polish isn't good enough to preserve the comments, so I've translated to English.)

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np

# Generate training data.
x_train = np.random.rand(10**3, 1).astype(np.float32) * 4 - 2
y_train = x_train ** 2

# Define hyperparameters.
DIMENSIONS = [50,50,50,1]
NUM_EPOCHS = 500
BATCH_SIZE = 200

dataset = tf.data.Dataset.from_tensor_slices((x_train,y_train))
dataset = dataset.shuffle(len(x_train))  # (Point 3.) Shuffle each epoch.
dataset = dataset.repeat(NUM_EPOCHS)
dataset = dataset.batch(BATCH_SIZE)
iterator = dataset.make_one_shot_iterator()

# (Point 2.) Ensure that `X` and `Y` correspond to the same batch of data.
# (Point 4.) Pass the tensors returned from `iterator.get_next()`
# directly as the input of the network.
X, Y = iterator.get_next()

# Initialize variables.
weights = []
biases = []
n_inputs = 1
for i, n_outputs in enumerate(DIMENSIONS):
  with tf.variable_scope("layer_{}".format(i)):
    w = tf.get_variable(name="W", shape=[n_inputs, n_outputs],
                        initializer=tf.random_normal_initializer(
                            mean=0.0, stddev=0.02, seed=42))
    b = tf.get_variable(name="b", shape=[n_outputs],
                        initializer=tf.zeros_initializer)
    weights.append(w)
    biases.append(b)
    n_inputs = n_outputs

def forward_pass(X,weights,biases):
  h = X
  for i in range(len(weights)):
    h=tf.add(tf.matmul(h, weights[i]), biases[i])
    h=tf.nn.relu(h)
  return h

output_layer = forward_pass(X, weights, biases)
loss = tf.reduce_sum(tf.reduce_mean(
    tf.squared_difference(output_layer, Y), 1))
optimizer = tf.train.AdamOptimizer(learning_rate=0.003).minimize(loss)
saver = tf.train.Saver()

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())

  # (Point 1.) Run the `optimizer` in a loop. Use try-while-except to iterate
  # until all elements in `dataset` have been consumed.
  try:
    while True:
      sess.run(optimizer)
  except tf.errors.OutOfRangeError:
    pass

  save = saver.save(sess, "./model.ckpt")
  print("Model saved to path: %s" % save)

  # Evaluate network.
  x_test = np.linspace(-1, 1, 600)
  network_outputs = sess.run(output_layer, feed_dict={X: x_test.reshape(-1, 1)})

plt.plot(x_test,x_test**2,color='r',label='y=x^2')
plt.plot(x_test,network_outputs,color='b',label='NN prediction')
plt.legend(loc='right')
plt.show()
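
As a follow-up to the goal stated in the question (predicting on a wider interval such as [-10,10]), a hypothetical continuation, assuming it runs in the same process so that the graph and saver defined above are still available, could be:

with tf.Session() as sess:
    # Restore the weights saved above instead of retraining.
    saver.restore(sess, "./model.ckpt")
    x_wide = np.linspace(-10, 10, 600)
    # Feeding a value for X overrides the iterator, so no new Dataset is needed.
    y_wide = sess.run(output_layer, feed_dict={X: x_wide.reshape(-1, 1)})

plt.plot(x_wide, x_wide**2, color='r', label='y=x^2')
plt.plot(x_wide, y_wide, color='b', label='NN prediction')
plt.legend(loc='right')
plt.show()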
