Google Cloud ML and GCS Bucket issues

Problem description

I'm using open source TensorFlow implementations of research papers, for example DCGAN-tensorflow. Most of the libraries I'm using are configured to train the model locally, but I want to use Google Cloud ML to train the model since I don't have a GPU on my laptop. I'm finding it difficult to change the code to support GCS buckets. At the moment, I'm saving my logs and models to /tmp and then running a 'gsutil' command to copy the directory to gs://my-bucket at the end of training (example here). If I try saving the model directly to gs://my-bucket, it never shows up.

As for training data, one of the TensorFlow samples copies data from GCS to /tmp for training (example here), but this only works when the dataset is small. I want to use celebA, and it is too large to copy to /tmp on every run. Is there any documentation or guidance on how to update code that trains locally so that it uses Google Cloud ML?

The implementations are running various versions of TensorFlow, mainly 0.11 and 0.12.

Answer

There is currently no definitive guide. The basic idea would be to replace all occurrences of native Python file operations with equivalents in the file_io module, most notably:

  • open() -> file_io.FileIO()
  • os.path.exists() -> file_io.file_exists()
  • glob.glob() -> file_io.get_matching_files()

These functions will work locally and on GCS (as well as on any registered file system). Note, however, that there are some slight differences between file_io and the standard file operations (e.g., a different set of 'modes' is supported).
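
As a rough illustration, here is a minimal sketch of those substitutions (the gs://my-bucket paths and file names below are placeholders); in these TensorFlow versions, file_io is imported from the internal tensorflow.python.lib.io package:

    from tensorflow.python.lib.io import file_io

    # open() -> file_io.FileIO(); the mode argument is explicit.
    with file_io.FileIO('gs://my-bucket/params.json', mode='r') as f:
        params = f.read()

    # os.path.exists() -> file_io.file_exists()
    have_data = file_io.file_exists('gs://my-bucket/data/train.tfrecords')

    # glob.glob() -> file_io.get_matching_files()
    shards = file_io.get_matching_files('gs://my-bucket/data/train-*')

The same calls work on local paths, so code rewritten this way still runs locally.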

Fortunately, checkpoint and summary writing work out of the box; just be sure to pass a GCS path to tf.train.Saver.save and tf.summary.FileWriter.
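
For example, a minimal sketch (gs://my-bucket is a placeholder; in 0.11, tf.summary.FileWriter is tf.train.SummaryWriter and tf.global_variables_initializer is tf.initialize_all_variables):

    import tensorflow as tf

    # A trivial variable so there is something to checkpoint.
    w = tf.Variable(tf.zeros([10]), name='w')
    saver = tf.train.Saver()

    # Event files are written straight to the bucket.
    writer = tf.summary.FileWriter('gs://my-bucket/logs')

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Checkpoints can target a GCS path directly.
        saver.save(sess, 'gs://my-bucket/checkpoints/model.ckpt', global_step=0)
    writer.close()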

In the sample you sent, that looks potentially painful. Consider monkey-patching the Python functions to map to the TensorFlow equivalents when the program starts, so you only have to do it once (demonstrated here).
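
A sketch of that idea, assuming the Python 2.7 runtime used with these TensorFlow versions (on Python 3, patch the builtins module instead); the small wrapper preserves open()'s default mode, since file_io.FileIO takes its mode explicitly:

    import glob
    import os
    import __builtin__  # 'builtins' on Python 3

    from tensorflow.python.lib.io import file_io

    def _open(name, mode='r', buffering=-1):
        # file_io.FileIO supports a narrower set of modes than the
        # built-in open(); buffering is accepted but ignored here.
        return file_io.FileIO(name, mode)

    # Patch once at startup; unmodified library code then handles
    # gs:// paths (and local paths) transparently.
    __builtin__.open = _open
    os.path.exists = file_io.file_exists
    glob.glob = file_io.get_matching_files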

As a side note, all of the samples on this page show reading files from GCS.
