How can I stream data directly into tensorflow as opposed to reading files on disc?
Question
Every tensorflow tutorial I've been able to find so far works by first loading the training/validation/test images into memory and then processing them. Does anyone have a guide or recommendations for streaming images and labels as input into tensorflow? I have a lot of images stored on a different server and I would like to stream those images into tensorflow as opposed to saving the images directly on my machine.
Thanks!
Answer
Tensorflow does have Queues, which support streaming, so you don't have to load the full dataset into memory. However, by default they only support reading from files on the same server. The real problem you have is that you want to load data into memory from some other server. I can think of the following ways to do this:
- Expose your images using a REST service. Write your own queueing mechanism in Python, read this data (using urllib or similar), and feed it to Tensorflow placeholders.
- Instead of using Python queues (as above), you can use Tensorflow queues as well (see this answer), although it's slightly more complicated. The advantage is that Tensorflow queues can use multiple cores, giving you better performance than normal Python multi-threaded queues.
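The first option above can be sketched with Python's standard `queue` and `threading` modules: a producer thread downloads images and pushes them onto a bounded queue, while the main thread pulls fixed-size batches to feed into a placeholder via `feed_dict`. This is a minimal sketch; `fetch_image` and the URLs are hypothetical stand-ins for your REST service (in practice it would call `urllib.request.urlopen`), and no actual TensorFlow session is shown.

```python
import queue
import threading

# Hypothetical fetch function -- in practice this would use
# urllib.request.urlopen() against your REST image service
# and decode the response into a numpy array.
def fetch_image(url):
    return b"image-bytes-for-" + url.encode()

def producer(urls, q):
    # Download images and push them onto the queue.
    for url in urls:
        q.put(fetch_image(url))
    q.put(None)  # sentinel: no more data

def batches(q, batch_size):
    # Pull items off the queue and yield fixed-size batches;
    # each batch would be fed to a tf.placeholder via feed_dict.
    batch = []
    while True:
        item = q.get()
        if item is None:
            break
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

urls = ["http://example.com/img%d.png" % i for i in range(5)]
q = queue.Queue(maxsize=2)  # bounded queue keeps memory use flat
t = threading.Thread(target=producer, args=(urls, q))
t.start()
for batch in batches(q, batch_size=2):
    print(len(batch))
t.join()
```

The bounded `maxsize` is what makes this streaming rather than loading everything into memory: the producer blocks once the buffer is full and only downloads more as the consumer catches up.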
- Use a network mount to fool your OS into believing the data is on the same machine.
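The network-mount option might look like the following, assuming you have SSH access to the image server and `sshfs` installed; the hostname, user, and paths are placeholders:

```shell
# Mount the remote image directory locally over SSH.
mkdir -p /mnt/remote-images
sshfs user@image-server:/data/images /mnt/remote-images

# TensorFlow's file-based input pipelines can now read from
# /mnt/remote-images/*.png as if the files were local.

# Unmount when done:
fusermount -u /mnt/remote-images
```

An NFS mount works equally well; either way, every read still goes over the network, so the overhead discussed below applies.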
Also, remember that with this sort of distributed setup you will always incur network overhead (the time taken to transfer images from Server 1 to Server 2), which can slow your training down significantly. To counteract this, you'd have to build a multi-threaded queueing mechanism with fetch-execute overlap, which is a lot of effort. An easier option, IMO, is to just copy the data onto your training machine.
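The fetch-execute overlap mentioned above can be sketched as a small prefetcher: a background thread downloads the next batch while the caller runs the current training step, so network time hides behind compute time. This is an illustrative class, not part of any TensorFlow API; `fetch_batches` is a hypothetical stand-in for your download loop.

```python
import queue
import threading

class Prefetcher:
    """Wraps an iterator so the next item is produced in a background
    thread while the caller processes the current one (fetch-execute
    overlap). `depth` controls how many items are buffered ahead."""
    _SENTINEL = object()

    def __init__(self, iterable, depth=2):
        self._q = queue.Queue(maxsize=depth)
        self._thread = threading.Thread(target=self._fill, args=(iterable,))
        self._thread.daemon = True
        self._thread.start()

    def _fill(self, iterable):
        # Runs in the background: keep the buffer topped up.
        for item in iterable:
            self._q.put(item)
        self._q.put(self._SENTINEL)  # signal end of data

    def __iter__(self):
        return self

    def __next__(self):
        item = self._q.get()
        if item is self._SENTINEL:
            raise StopIteration
        return item

# Hypothetical generator standing in for network downloads:
def fetch_batches(n):
    for i in range(n):
        yield "batch-%d" % i  # imagine urllib fetch + decode here

for batch in Prefetcher(fetch_batches(3)):
    print(batch)  # the training step would run here while the
                  # background thread downloads the next batch
```

With `depth=2`, at most two batches sit in memory at once; as long as a training step takes longer than a download, the network overhead is fully hidden.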