How to read data into TensorFlow batches from example queue?


Problem description

How do I get TensorFlow example queues into proper batches for training?

I have some images and labels:

IMG_6642.JPG 1
IMG_6643.JPG 2

(feel free to suggest another label format; I think I may need another dense to sparse step...)

I've read through quite a few tutorials but don't quite have it all together yet. Here's what I have, with comments indicating the steps required from TensorFlow's Reading Data page.

  1. The list of filenames (optional steps removed for the sake of simplicity)
  2. Filename queue
  3. A Reader for the file format
  4. A decoder for a record read by the reader
  5. Example queue

And after the example queue I need to get this queue into batches for training; that's where I'm stuck...

1. The list of filenames

files = tf.train.match_filenames_once('*.JPG')

2. Filename queue

filename_queue = tf.train.string_input_producer(files, num_epochs=None, shuffle=True, seed=None, shared_name=None, name=None)

3. A reader

reader = tf.TextLineReader()
key, value = reader.read(filename_queue)

4. A decoder

record_defaults = [[""], [1]] col1, col2 = tf.decode_csv(value, record_defaults=record_defaults) (我认为我不需要下面的步骤,因为我已经在张量中添加了标签,但无论如何我都将其包括在内)

record_defaults = [[""], [1]] col1, col2 = tf.decode_csv(value, record_defaults=record_defaults) (I don't think I need this step below because I already have my label in a tensor but I include it anyways)

features = tf.pack([col2])
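
Note that col1 only holds the image's file path as a string; to actually get pixel data into the pipeline, the file still has to be read and decoded. A minimal, untested sketch of that extra step (the 224x224 target size is just an illustrative assumption, and it presumes the filename queue is fed the labels file rather than the .JPG files themselves):

# Read and decode the JPEG whose path came out of the CSV line.
image_bytes = tf.read_file(col1)
image = tf.image.decode_jpeg(image_bytes, channels=3)
# Fix the spatial size so downstream batching ops see a static shape.
image = tf.image.resize_image_with_crop_or_pad(image, 224, 224)
label = col2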

The documentation page has an example to run one image, not get the images and labels into batches:

for i in range(1200):
  # Retrieve a single instance:
  example, label = sess.run([features, col5])
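
For what it's worth, an input producer only yields data once queue runners have been started, so even this single-instance loop needs some session boilerplate. A minimal sketch, assuming the graph built above (col2 used in place of the documentation's col5):

with tf.Session() as sess:
    # Initialize both variable collections; match_filenames_once keeps
    # its file list in a variable that must be initialized too.
    sess.run(tf.initialize_all_variables())
    sess.run(tf.initialize_local_variables())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        for i in range(1200):
            example, label = sess.run([features, col2])
    finally:
        coord.request_stop()
        coord.join(threads)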

And then below it has a batching section:

def read_my_file_format(filename_queue):
  reader = tf.SomeReader()
  key, record_string = reader.read(filename_queue)
  example, label = tf.some_decoder(record_string)
  processed_example = some_processing(example)
  return processed_example, label

def input_pipeline(filenames, batch_size, num_epochs=None):
  filename_queue = tf.train.string_input_producer(
      filenames, num_epochs=num_epochs, shuffle=True)
  example, label = read_my_file_format(filename_queue)
  # min_after_dequeue defines how big a buffer we will randomly sample
  #   from -- bigger means better shuffling but slower start up and more
  #   memory used.
  # capacity must be larger than min_after_dequeue and the amount larger
  #   determines the maximum we will prefetch.  Recommendation:
  #   min_after_dequeue + (num_threads + a small safety margin) * batch_size
  min_after_dequeue = 10000
  capacity = min_after_dequeue + 3 * batch_size
  example_batch, label_batch = tf.train.shuffle_batch(
      [example, label], batch_size=batch_size, capacity=capacity,
      min_after_dequeue=min_after_dequeue)
  return example_batch, label_batch
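
As a hedged usage sketch, this template would be fed the labels file (not the .JPG files themselves), with read_my_file_format wrapping the TextLineReader / decode_csv / image-decoding steps shown earlier; the file name labels.csv is just a placeholder:

example_batch, label_batch = input_pipeline(['labels.csv'], batch_size=32)

Since the labels shown above are space-separated rather than comma-separated, tf.decode_csv would also need field_delim=' ' inside read_my_file_format.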

My question is: how do I use the batching example code above with the code I already have? I need batches to work with, and most of the tutorials already come with MNIST batches.

with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
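
With a queue-based pipeline there is no next_batch call; the batch tensors themselves are evaluated inside the loop (queue runners started as in the earlier sketch). A minimal, untested sketch, where batches_per_epoch is something you pick yourself since there is no mnist.train.num_examples here:

for epoch in range(training_epochs):
    for i in range(batches_per_epoch):
        # Each sess.run pulls a fresh, already-shuffled batch off the queue.
        batch_xs, batch_ys = sess.run([example_batch, label_batch])
        # ...feed batch_xs / batch_ys to the optimizer as before...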

Answer

If you wish to make this input pipeline work, you will need to add an asynchronous queueing mechanism that generates batches of examples. This is done by creating a tf.RandomShuffleQueue or a tf.FIFOQueue and inserting JPEG images that have been read, decoded and preprocessed.

You can use handy constructs that will generate the queues and the corresponding threads for running them via tf.train.shuffle_batch_join or tf.train.batch_join. Here is a simplified example of what this would look like. Note that this code is untested:

# Let's assume there is a Queue that maintains a list of all filenames
# called 'filename_queue'
_, file_buffer = reader.read(filename_queue)

# Decode the JPEG image. shuffle_batch_join expects a list of tensor
# lists, one per reading pipeline; here there is only one.
image = decode_jpeg(file_buffer)
images = [[image]]

# Generate batches of images of this size.
batch_size = 32

# Depends on the number of files and the training speed.
min_queue_examples = batch_size * 100
images_batch = tf.train.shuffle_batch_join(
    images,
    batch_size=batch_size,
    capacity=min_queue_examples + 3 * batch_size,
    min_after_dequeue=min_queue_examples)

# Run your network on this batch of images.
predictions = my_inference(images_batch)
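
decode_jpeg is not defined in the snippet above; a minimal sketch of what such a helper might look like, assuming fixed-size float RGB output (the 224x224 size is an arbitrary illustrative choice):

def decode_jpeg(file_buffer, height=224, width=224):
  # Decode raw JPEG bytes to a float image with a static spatial size so
  # that the batching ops can infer the shape of the enqueued tensors.
  image = tf.image.decode_jpeg(file_buffer, channels=3)
  image = tf.image.convert_image_dtype(image, tf.float32)
  image = tf.image.resize_image_with_crop_or_pad(image, height, width)
  return image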

Depending on how you need to scale up your job, you might need to run multiple independent threads that read/decode/preprocess images and dump them in your example queue. A complete example of such a pipeline is provided in the Inception/ImageNet model. Take a look at batch_inputs:

https://github.com/tensorflow/models/blob/master/inception/inception/image_processing.py#L407
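
As a rough, untested sketch of that multi-reader pattern (not the actual batch_inputs code), several independent read/decode pipelines can feed a single shuffle_batch_join call, which then owns the example queue and its threads; num_readers and the decode_jpeg helper above are assumptions:

num_readers = 4
images_list = []
for _ in range(num_readers):
  # Each reader pulls file names from the same filename_queue.
  _, file_buffer = tf.WholeFileReader().read(filename_queue)
  images_list.append([decode_jpeg(file_buffer)])

images_batch = tf.train.shuffle_batch_join(
    images_list,
    batch_size=batch_size,
    capacity=min_queue_examples + 3 * batch_size,
    min_after_dequeue=min_queue_examples)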

Finally, if you are working with >O(1000) JPEG images, keep in mind that it is extremely inefficient to individually read thousands of small files. This will slow down your training quite a bit.

A more robust and faster solution is to convert your dataset of images to a sharded TFRecord of Example protos. Here is a fully worked script for converting the ImageNet data set to such a format, and here is a set of instructions for running a generic version of this preprocessing script on an arbitrary directory containing JPEG images.
