Google Cloud ML and GCS Bucket issues


Problem description

I'm using open source TensorFlow implementations of research papers, for example DCGAN-tensorflow. Most of the libraries I'm using are configured to train the model locally, but I want to use Google Cloud ML to train the model since I don't have a GPU on my laptop. I'm finding it difficult to change the code to support GCS buckets. At the moment, I'm saving my logs and models to /tmp and then running a 'gsutil' command to copy the directory to gs://my-bucket at the end of training (example here). If I try saving the model directly to gs://my-bucket, it never shows up.

As for training data, one of the TensorFlow samples copies data from GCS to /tmp for training (example here), but this only works when the dataset is small. I want to use celebA, and it is too large to copy to /tmp on every run. Are there any documentation or guides on how to update code that trains locally so that it works with Google Cloud ML?

The implementations are running various versions of TensorFlow, mainly 0.11 and 0.12.

Solution

There is currently no definitive guide. The basic idea would be to replace all occurrences of native Python file operations with their equivalents in the file_io module.

These functions will work both locally and on GCS (as well as on any registered file system). Note, however, that there are some slight differences between file_io and the standard file operations (e.g., a different set of 'modes' is supported).
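As a rough sketch of that kind of replacement (not the answer's own code): the helper names `save_text`, `load_text`, `open_file`, `path_exists`, `list_matching` and `make_dirs` are made up for illustration. The import path `tensorflow.python.lib.io.file_io` and the four `file_io` calls are what I believe the module exposes, so check them against your TensorFlow version. The `ImportError` fallback is my own addition so the snippet also runs on a machine without TensorFlow, for local paths only.

```python
import glob
import os

try:
    # TensorFlow's file_io module: the same calls accept local paths and gs:// URLs.
    from tensorflow.python.lib.io import file_io
    open_file = file_io.FileIO                 # stands in for open()
    path_exists = file_io.file_exists          # stands in for os.path.exists()
    list_matching = file_io.get_matching_files # stands in for glob.glob()
    make_dirs = file_io.recursive_create_dir   # stands in for os.makedirs()
except ImportError:
    # Assumption for this sketch: without TensorFlow, fall back to the
    # standard library, which only understands local paths.
    open_file = open
    path_exists = os.path.exists
    list_matching = glob.glob

    def make_dirs(path):
        os.makedirs(path, exist_ok=True)


def save_text(path, text):
    """Write text to a local path or (with TensorFlow installed) a gs:// path."""
    make_dirs(os.path.dirname(path))
    # file_io.FileIO needs an explicit mode, unlike the builtin open().
    with open_file(path, "w") as f:
        f.write(text)


def load_text(path):
    """Read a whole text file from a local or gs:// path."""
    with open_file(path, "r") as f:
        return f.read()
```

With this in place, a call such as `save_text("gs://my-bucket/logs/run.txt", "step 1\n")` would go through the same code path as a local write, provided TensorFlow is installed and the job has access to the bucket.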

Fortunately, checkpoint and summary writing do work out of the box; just be sure to pass a GCS path to tf.train.Saver.save and tf.summary.FileWriter.
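A minimal sketch of what that might look like, assuming the TensorFlow 1.x graph-mode API (under TensorFlow 2.x the same calls live under `tf.compat.v1`). The `output_paths` and `train` helpers, the `checkpoints/model.ckpt` and `logs` layout, and the toy counter variable are all hypothetical; only `tf.train.Saver.save` and `tf.summary.FileWriter` come from the answer.

```python
import posixpath


def output_paths(job_dir):
    """Build the checkpoint prefix and summary directory under job_dir.

    job_dir may be a local directory or a gs:// URL. posixpath is used so the
    separators stay as forward slashes, which is what GCS expects.
    """
    checkpoint_prefix = posixpath.join(job_dir, "checkpoints", "model.ckpt")
    log_dir = posixpath.join(job_dir, "logs")
    return checkpoint_prefix, log_dir


def train(job_dir, num_steps=3):
    """Toy training loop that writes its checkpoint and summaries to job_dir."""
    import tensorflow as tf  # TF 1.x-style API, imported here so output_paths works without it

    checkpoint_prefix, log_dir = output_paths(job_dir)
    step = tf.Variable(0, name="global_step", trainable=False)
    increment = tf.assign_add(step, 1)
    saver = tf.train.Saver()

    with tf.Session() as sess:
        # The writer and saver receive the paths unchanged, gs:// or local.
        writer = tf.summary.FileWriter(log_dir, sess.graph)
        sess.run(tf.global_variables_initializer())
        for _ in range(num_steps):
            sess.run(increment)
        saver.save(sess, checkpoint_prefix, global_step=step)
        writer.close()
```

Calling `train("gs://my-bucket/dcgan")` would then write directly to the bucket, with no copy through /tmp.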

In the sample you sent, that looks potentially painful. Consider monkey patching the Python functions to map to their TensorFlow equivalents when the program starts, so that you only have to do it once (demonstrated here).
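One way the monkey patching could be sketched, assuming Python 3 (on Python 2 the module is `__builtin__` rather than `builtins`). The `patch_file_ops` helper is hypothetical, and it is not the linked demonstration. It only covers `open` and `os.path.exists`, so other calls such as `glob.glob` would need the same treatment.

```python
import builtins
import os


def patch_file_ops(open_fn, exists_fn):
    """Redirect the builtin open() and os.path.exists() to the given functions.

    Returns a function that restores the original implementations.
    """
    originals = (builtins.open, os.path.exists)
    builtins.open = open_fn
    os.path.exists = exists_fn

    def restore():
        builtins.open, os.path.exists = originals

    return restore


# Intended use at the very start of the training script (needs TensorFlow):
#
#   from tensorflow.python.lib.io import file_io
#   restore = patch_file_ops(
#       lambda name, mode="r": file_io.FileIO(name, mode),  # FileIO requires a mode
#       file_io.file_exists,
#   )
```

The patch is global, so any library that calls `open` with extra keyword arguments (for example `encoding=` or `buffering=`) would break under this simple wrapper. Keep the patched surface as small as the code base allows.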

As a side note, all of the samples on this page show reading files from GCS.
