How can I stream data directly into TensorFlow as opposed to reading files on disk?

Question

Every TensorFlow tutorial I've been able to find so far works by first loading the training/validation/test images into memory and then processing them. Does anyone have a guide or recommendations for streaming images and labels as input into TensorFlow? I have a lot of images stored on a different server, and I would like to stream them into TensorFlow rather than saving them on my machine first.

Thanks!

Answer

TensorFlow does have queues, which support streaming, so you don't have to load the full dataset into memory. But yes, by default they only support reading from files on the same machine. The real problem is that you want to load data into memory from some other server. I can think of the following ways to do this:

  • Expose your images through a REST service. Write your own queueing mechanism in Python that reads this data (using urllib or something similar) and feeds it to TensorFlow placeholders.
  • Instead of Python queues (as above), you can also use TensorFlow queues (see this answer), although this is slightly more complicated. The advantage is that TensorFlow queues can use multiple cores, giving you better performance than ordinary multi-threaded Python queues.

  • Use a network mount to fool your OS into believing the data is on the same machine.
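
The first option (a REST service plus a Python-side reader) might look roughly like the sketch below, which uses only the standard library. It assumes a hypothetical HTTP endpoint that returns the raw bytes of an image at `BASE_URL + name`; `BASE_URL`, `fetch_image` and `stream_batches` are illustrative names, not part of TensorFlow. Only one batch is in memory at a time, and decoding the image bytes into arrays is left out because it depends on your image format.

```python
import urllib.request
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical base URL of the REST service that exposes the images.
BASE_URL = "http://image-server.example:8000/images/"


def fetch_image(name: str, base_url: str = BASE_URL, timeout: float = 10.0) -> bytes:
    """Download the raw (still encoded) bytes of one image over HTTP."""
    with urllib.request.urlopen(base_url + name, timeout=timeout) as resp:
        return resp.read()


def stream_batches(
    samples: Iterable[Tuple[str, int]],
    batch_size: int,
    fetch: Callable[[str], bytes] = fetch_image,
) -> Iterator[Tuple[List[bytes], List[int]]]:
    """Yield (images, labels) batches, fetching each image only when needed.

    Only the current batch is held in memory, so the full dataset never
    has to be stored on the training machine.
    """
    images: List[bytes] = []
    labels: List[int] = []
    for name, label in samples:
        images.append(fetch(name))
        labels.append(label)
        if len(images) == batch_size:
            yield images, labels
            images, labels = [], []  # start a fresh batch
    if images:  # emit the final, possibly smaller, batch
        yield images, labels
```

In a TF 1.x loop, each decoded batch would go into `sess.run(..., feed_dict={...})` for your placeholders; in TF 2.x, a generator like this could instead be wrapped with `tf.data.Dataset.from_generator`. The `fetch` parameter is injectable so the batching logic can be tested without a server.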

Also, remember that with this sort of distributed setup you will always incur network overhead (the time taken to transfer images from server 1 to server 2), which can slow your training considerably. To counteract this, you'd have to build a multi-threaded queueing mechanism that overlaps fetching with execution, which is a lot of effort. An easier option, IMO, is to just copy the data to your training machine.
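
A minimal sketch of fetch-execute overlap, using Python's standard `threading` and `queue` modules rather than TensorFlow's own queues. `prefetch_fetches` and its parameters are hypothetical names, and `fetch` is assumed to be any function that downloads one item (for example an HTTP fetcher). Background threads keep downloading while the caller trains on what has already arrived; the bounded output queue caps how many fetched items wait in memory.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, Tuple

_SENTINEL = object()  # marks "this worker is finished"


def prefetch_fetches(
    names: Iterable[str],
    fetch: Callable[[str], bytes],
    num_threads: int = 4,
    max_buffered: int = 16,
) -> Iterator[Tuple[str, bytes]]:
    """Fetch items on background threads while the caller consumes results.

    Results arrive in completion order, not input order. At most
    `max_buffered` fetched items wait in the output queue at a time.
    """
    todo = queue.Queue()  # only the (small) names are queued up front
    for name in names:
        todo.put(name)
    for _ in range(num_threads):
        todo.put(_SENTINEL)  # one stop signal per worker

    done = queue.Queue(maxsize=max_buffered)  # bounded: applies backpressure

    def worker() -> None:
        try:
            while True:
                name = todo.get()
                if name is _SENTINEL:
                    return
                done.put((name, fetch(name)))
        except Exception as exc:  # forward fetch errors to the consumer
            done.put(exc)
        finally:
            done.put(_SENTINEL)  # always tell the consumer we stopped

    for _ in range(num_threads):
        threading.Thread(target=worker, daemon=True).start()

    finished = 0
    while finished < num_threads:
        item = done.get()  # blocks until some worker has produced something
        if item is _SENTINEL:
            finished += 1
        elif isinstance(item, Exception):
            raise item
        else:
            yield item
```

Completion order is usually fine for shuffled training data; if order matters, tag each name with its index. In TF 2.x, `tf.data` provides `Dataset.prefetch` and `map(..., num_parallel_calls=...)` for the same purpose, which is likely less work than hand-rolled threads.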