Sliding window of a batch in TensorFlow using the Dataset API

Question

Is there a way to modify the composition of the images within a batch? At the moment, when I create, e.g., a batch of size 4, my batches look like this:

Batch1: [Img0 Img1 Img2 Img3]
Batch2: [Img4 Img5 Img6 Img7]

I need to change how my batches are composed so that each new batch shifts forward by only one image. It should then look like this:

Batch1: [Img0 Img1 Img2 Img3]
Batch2: [Img1 Img2 Img3 Img4]
Batch3: [Img2 Img3 Img4 Img5]
Batch4: [Img3 Img4 Img5 Img6]
Batch5: [Img4 Img5 Img6 Img7]
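
The desired layout is a sliding window of size 4 with a shift of 1: batch k starts at image k, so N images give (N - window) // shift + 1 full batches (5 for the 8 images above). A minimal pure-Python sketch of that index arithmetic, with no TensorFlow involved; `sliding_window_batches` is a hypothetical helper name used only for illustration:

```python
def sliding_window_batches(items, window, shift=1):
    """Return every full-length batch of `window` consecutive items,
    starting a new batch every `shift` items (shift == window gives plain batching)."""
    items = list(items)
    return [items[start:start + window]
            for start in range(0, len(items) - window + 1, shift)]


images = [f"Img{i}" for i in range(8)]

for n, batch in enumerate(sliding_window_batches(images, window=4), start=1):
    print(f"Batch{n}: {batch}")
```

With `shift=4` the same helper reproduces the ordinary, non-overlapping batches from the first listing.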

My code uses TensorFlow's Dataset API, as follows:

import os
from functools import partial

import tensorflow as tf  # TF 1.x API (tf.python_io, make_one_shot_iterator)


def tfrecords_train_input(input_dir, examples, epochs, nsensors, past, future,
                          features, batch_size, threads, shuffle, record_type):
    filenames = sorted(
        [os.path.join(input_dir, f) for f in os.listdir(input_dir)])
    num_records = 0
    for fn in filenames:
        for _ in tf.python_io.tf_record_iterator(fn):
            num_records += 1
    print("Number of files to use:", len(filenames), "/ Total records to use:", num_records)
    dataset = tf.data.TFRecordDataset(filenames)
    # Parse records
    read_proto = partial(record_type().read_proto, nsensors=nsensors, past=past,
                         future=future, features=features)
    # Parse records in parallel (num_parallel_calls=threads)
    dataset = dataset.map(map_func=read_proto, num_parallel_calls=threads)
    # Cache data
    dataset = dataset.cache()
    # Repeat the data for `epochs` epochs (no shuffling is applied here)
    dataset = dataset.repeat(epochs)
    # Batch data
    dataset = dataset.batch(batch_size)
    # Efficient pipelining
    dataset = dataset.prefetch(2)
    iterator = dataset.make_one_shot_iterator()
    return iterator
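
One ordering detail matters if the batching step above is replaced by a sliding window: the pipeline calls repeat(epochs) before batching, so a window applied at that same position sees all epochs as one concatenated stream and some windows mix the end of one epoch with the start of the next. Whether that is acceptable depends on the data; for sequences built from `past`/`future` samples it is probably not. A pure-Python sketch of the two orderings, where plain lists stand in for tf.data and the six image names are hypothetical placeholders:

```python
from itertools import chain

# Hypothetical stand-ins for the decoded records of a single pass over the data.
images = [f"Img{i}" for i in range(6)]
WINDOW, EPOCHS = 4, 2


def windows(seq, size):
    """All full-length windows of `size` consecutive items, shifting by one."""
    seq = list(seq)
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]


# Order used in the question: repeat(epochs) first, then form the batches.
repeat_first = windows(chain.from_iterable([images] * EPOCHS), WINDOW)

# Alternative order: window a single epoch, then repeat the windows.
window_first = windows(images, WINDOW) * EPOCHS

# Windows that straddle an epoch boundary, e.g. [Img4, Img5, Img0, Img1].
straddling = [w for w in repeat_first if w not in window_first]
print(straddling)
```

In tf.data terms, applying the windowing transformation before .repeat(epochs) keeps every window inside a single epoch.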

Answer

This can be achieved with the sliding-window batch transformation for tf.data.Dataset (sliding_window_batch from tf.contrib in TensorFlow 1.x):

Example:

import tensorflow as tf  # TensorFlow 1.x; tf.contrib was removed in TF 2.0
from tensorflow.contrib.data.python.ops import sliding

imgs = tf.constant(['img0', 'img1', 'img2', 'img3', 'img4', 'img5', 'img6', 'img7'])
labels = tf.constant([0, 0, 0, 1, 1, 1, 0, 0])

# create TensorFlow Dataset object
data = tf.data.Dataset.from_tensor_slices((imgs, labels))

# sliding window batch: 4 elements per batch, advancing by 1 element
window = 4
stride = 1
data = data.apply(sliding.sliding_window_batch(window, stride))

# create TensorFlow Iterator object
iterator = tf.data.Iterator.from_structure(data.output_types, data.output_shapes)
next_element = iterator.get_next()

# create initialization op
init_op = iterator.make_initializer(data)

with tf.Session() as sess:
    # initialize the iterator on the data
    sess.run(init_op)
    while True:
        try:
            elem = sess.run(next_element)
            print(elem)
        except tf.errors.OutOfRangeError:
            print("End of dataset.")
            break

Output:

 (array([b'img0', b'img1', b'img2', b'img3'], dtype=object), array([0, 0, 0, 1], dtype=int32))
 (array([b'img1', b'img2', b'img3', b'img4'], dtype=object), array([0, 0, 1, 1], dtype=int32))
 (array([b'img2', b'img3', b'img4', b'img5'], dtype=object), array([0, 1, 1, 1], dtype=int32))
 (array([b'img3', b'img4', b'img5', b'img6'], dtype=object), array([1, 1, 1, 0], dtype=int32))
 (array([b'img4', b'img5', b'img6', b'img7'], dtype=object), array([1, 1, 0, 0], dtype=int32))
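
The sliding module above lives in tf.contrib, which exists only in TensorFlow 1.x; its deprecation notice pointed to tf.data.Dataset.window followed by flat_map as the replacement. A minimal sketch of that replacement, assuming TensorFlow 2.x with eager execution and reusing the toy images and labels from the example:

```python
import tensorflow as tf  # assumes TensorFlow 2.x (eager execution)

WINDOW = 4  # images per batch
SHIFT = 1   # advance one image between consecutive batches

imgs = tf.constant(['img0', 'img1', 'img2', 'img3', 'img4', 'img5', 'img6', 'img7'])
labels = tf.constant([0, 0, 0, 1, 1, 1, 0, 0])
data = tf.data.Dataset.from_tensor_slices((imgs, labels))

# window() yields, for each window, a tuple of sub-datasets (one per component);
# drop_remainder=True discards the trailing windows shorter than WINDOW.
windows = data.window(WINDOW, shift=SHIFT, drop_remainder=True)

# Turn each pair of sub-datasets back into one dense (imgs, labels) batch.
batches = windows.flat_map(
    lambda x, y: tf.data.Dataset.zip((x.batch(WINDOW), y.batch(WINDOW))))

result = [(i.numpy().tolist(), l.numpy().tolist()) for i, l in batches]
for imgs_batch, labels_batch in result:
    print(imgs_batch, labels_batch)
```

In the question's pipeline, the window(...).flat_map(...) pair would take the place of dataset.batch(batch_size), with window size batch_size and shift 1.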
