How to load data in parallel in TensorFlow?

Problem description

First, some background on my application:

I have about 500,000 videos saved as .avi files on disk, and I will use them as training samples. The easiest way to use them would be to load them all into memory at once and then feed each batch into the model for training. However, my memory is not big enough to hold the whole dataset, so I need to load the video data in batches. As you know, decoding a batch of videos (say 64) can take a lot of time, and if this is done serially, we waste a lot of time on data loading instead of computing. Therefore, I want to load the batches in parallel, much like the fit_generator API in Keras. I wonder whether there is an existing way to do this in TensorFlow.

Thanks for any suggestions :)

PS: I used to implement this idea with the threading package in Python; for more, see https://github.com/FesianXu/Parallel-DataLoader-in-TensorFlow

Of course, it is just toy code and too ad hoc. I want a more general solution, just like fit_generator in Keras.

Recommended answer

Take a look at tf.data.Dataset.from_generator:

Creates a Dataset whose elements are generated by generator.

The generator argument must be a callable object that returns an object supporting the iter() protocol (e.g. a generator function). The elements generated by generator must be compatible with the given output_types and (optional) output_shapes arguments.

This example shows how to easily parallelize the generator using tf.data.Dataset.map with the num_parallel_calls parameter: https://github.com/tensorflow/tensorflow/issues/14448#issuecomment-349240274
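To make the pattern concrete, here is a minimal sketch of that approach: the generator yields only lightweight file paths, and the expensive per-video decoding is pushed into Dataset.map so it can run in parallel threads. The file names, frame shape, and decode_video function below are placeholders (a real decoder would use something like OpenCV), not part of the original answer:

```python
import numpy as np
import tensorflow as tf

# Placeholder list of video paths; in practice, glob the .avi files on disk.
video_paths = [f"video_{i}.avi" for i in range(8)]

def path_generator():
    # Yield lightweight items (paths); heavy decoding happens later in map().
    for p in video_paths:
        yield p

def decode_video(path):
    # Stand-in for a real .avi decoder; returns fake frames of a fixed shape.
    return np.zeros((16, 64, 64, 3), dtype=np.float32)

def tf_decode(path):
    # Wrap the Python decoder so tf.data can invoke it from parallel threads.
    frames = tf.numpy_function(decode_video, [path], tf.float32)
    frames.set_shape((16, 64, 64, 3))
    return frames

dataset = (
    tf.data.Dataset.from_generator(
        path_generator,
        output_signature=tf.TensorSpec(shape=(), dtype=tf.string))
    .map(tf_decode, num_parallel_calls=tf.data.AUTOTUNE)  # parallel decode
    .batch(4)
    .prefetch(tf.data.AUTOTUNE)  # overlap data loading with training
)

for batch in dataset:
    print(batch.shape)  # (4, 16, 64, 64, 3)
```

The prefetch at the end is what gives the fit_generator-like behavior the question asks for: the next batch is decoded while the current one is being consumed by the model.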

More info: https://www.tensorflow.org/guide/data_performance#parallelizing_data_extraction
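That guide also covers parallelizing the extraction step itself with Dataset.interleave. A minimal sketch of the idea (the per-file dataset below is illustrative, standing in for reading one file from disk):

```python
import tensorflow as tf

def make_file_dataset(file_id):
    # Stand-in for reading one file; yields three dummy samples per "file".
    return tf.data.Dataset.from_tensors(file_id).repeat(3)

file_ids = tf.data.Dataset.range(4)
dataset = file_ids.interleave(
    make_file_dataset,
    cycle_length=2,                       # read 2 files concurrently
    num_parallel_calls=tf.data.AUTOTUNE)  # parallel extraction

# Every file's samples are present; only their interleaving order varies.
print(sorted(int(x) for x in dataset))  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```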
