Tensorflow Python reading 2 files


Problem description

I have the following (shortened) code I am trying to run:

coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

try:
   while not coord.should_stop():

      # Run some code.... (reading some data from file 1)
      ...

      coord_dev = tf.train.Coordinator()
      threads_dev = tf.train.start_queue_runners(sess=sess, coord=coord_dev)

      try:
        while not coord_dev.should_stop():

           # Run some other code.... (reading data from file 2)
           ...

      except tf.errors.OutOfRangeError:
        print('Reached end of file 2')
      finally:
        coord_dev.request_stop()
        coord_dev.join(threads_dev)

except tf.errors.OutOfRangeError:
   print('Reached end of file 1')
finally:
   coord.request_stop()
   coord.join(threads)

What should happen above is:

  • File 1 is a csv file that contains the training data for my neural network.
  • File 2 contains the dev set data.

While iterating over File 1 during training, I occasionally want to calculate cost and accuracy on the dev set data (from File 2) as well. But when the inner loop finishes reading File 2, it obviously triggers the exception

"tf.errors.OutOfRangeError"

"tf.errors.OutOfRangeError"

which causes my code to leave the outer loop as well. The exception raised by the inner loop is simply treated as an exception of the outer loop too. But after finishing reading File 2, I want my code to continue training over File 1 in the outer loop.
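The intended control flow can be illustrated in plain Python, independent of TensorFlow. In this sketch a generator stands in for a file reader that signals exhaustion (StopIteration, analogous to OutOfRangeError); the file names and record values are made up for illustration:

```python
# Plain-Python sketch of the desired control flow: exhausting the inner
# (dev set) reader must not terminate the outer (training) loop.

def read_records(records):
    # Stand-in for a file reader; the for-loop consuming it stops cleanly
    # when the records run out, instead of killing the enclosing loop.
    for record in records:
        yield record

train_file = ["t1", "t2", "t3"]   # stand-in for File 1 (training data)
dev_file = ["d1", "d2"]           # stand-in for File 2 (dev set data)

log = []
for step, example in enumerate(read_records(train_file), start=1):
    log.append(("train", example))
    if step == 2:                 # occasionally evaluate the dev set
        for dev_example in read_records(dev_file):
            log.append(("dev", dev_example))
        # The dev reader is exhausted here, but training continues.

print(log)
```

After the run, `log` interleaves the two dev records between training steps 2 and 3, and the final training record is still processed.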

(I have removed some details, such as num_epochs for training, to simplify the readability of the code.)

Does anyone have any suggestions on how to solve this problem? I am a bit new at this.

Thanks in advance!

Recommended answer

Solved.

Apparently, using queue runners is not the right way of doing this. The TensorFlow documentation indicates that the Dataset API should be used instead, which took some time to understand. The code below does what I was trying to do previously. Sharing it here in case other people need it as well.

I have put some additional training code under www.github.com/loheden/tf_examples/dataset api. I struggled a bit to find complete examples.

import tensorflow as tf  # the code below targets TF 1.x (tf.data with make_initializable_iterator)

# READING DATA FROM train and validation (dev set) CSV FILES by using INITIALIZABLE ITERATORS

# All csv files have same # columns. First column is assumed to be train example ID, the next 5 columns are feature
# columns, and the last column is the label column

# ASSUMPTIONS: (Otherwise, decode_csv function needs update)
# 1) The first column is NOT a feature. (It is most probably a training example ID or similar)
# 2) The last column is always the label. And there is ONLY 1 column that represents the label.
#    If more than 1 column represents the label, see the next example down below

feature_names = ['f1','f2','f3','f4','f5']
record_defaults = [[""], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0]]


def decode_csv(line):
   parsed_line = tf.decode_csv(line, record_defaults)
   label = parsed_line[-1]       # the label is the last element of the list
   del parsed_line[-1]           # delete the last element from the list
   del parsed_line[0]            # also delete the first element, because it is assumed NOT to be a feature
   features = tf.stack(parsed_line)  # stack features so that you can later vectorize forward prop, etc.
   #label = tf.stack(label)          # NOT needed; only if more than 1 column makes up the label
   batch_to_return = features, label
   return batch_to_return

filenames = tf.placeholder(tf.string, shape=[None])
dataset5 = tf.data.Dataset.from_tensor_slices(filenames)
dataset5 = dataset5.flat_map(lambda filename: tf.data.TextLineDataset(filename).skip(1).map(decode_csv))
dataset5 = dataset5.shuffle(buffer_size=1000)
dataset5 = dataset5.batch(7)
iterator5 = dataset5.make_initializable_iterator()
next_element5 = iterator5.get_next()

# Initialize `iterator` with training data.
training_filenames = ["train_data1.csv", 
                      "train_data2.csv"]

# Initialize `iterator` with validation data.
validation_filenames = ["dev_data1.csv"]

with tf.Session() as sess:
    # Train 2 epochs. Then validate train set. Then validate dev set.
    for _ in range(2):     
        sess.run(iterator5.initializer, feed_dict={filenames: training_filenames})
        while True:
            try:
              features, labels = sess.run(next_element5)
              # Train...
              print("(train) features: ")
              print(features)
              print("(train) labels: ")
              print(labels)  
            except tf.errors.OutOfRangeError:
              print("Out of range error triggered (looped through training set 1 time)")
              break

    # Validate (cost, accuracy) on train set
    print("\nDone with the first iterator\n")

    sess.run(iterator5.initializer, feed_dict={filenames: validation_filenames})
    while True:
        try:
          features, labels = sess.run(next_element5)
          # Validate (cost, accuracy) on dev set
          print("(dev) features: ")
          print(features)
          print("(dev) labels: ")
          print(labels)
        except tf.errors.OutOfRangeError:
          print("Out of range error triggered (looped through dev set 1 time only)")
          break  
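As a sanity check on the list surgery inside decode_csv (drop the example ID in the first column, take the last column as the label, keep the rest as features), the same slicing can be verified with plain Python lists. The row values below are made up, and `split_row` is a hypothetical helper, not part of the code above:

```python
# Plain-list mirror of the column handling in decode_csv:
# first column = example ID (dropped), last column = label, rest = features.

def split_row(parsed_line):
    label = parsed_line[-1]
    features = parsed_line[1:-1]  # equivalent to the two del statements in decode_csv
    return features, label

row = ["id_001", 0.1, 0.2, 0.3, 0.4, 0.5, 1.0]  # made-up row: ID, 5 features, label
features, label = split_row(row)
print(features)  # [0.1, 0.2, 0.3, 0.4, 0.5]
print(label)     # 1.0
```

If more than one trailing column made up the label, the same slicing idea applies with a different split point (e.g. `parsed_line[-2:]` for a two-column label).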
