Google Storage (gs) wrapper for file input/output in Cloud ML?


Problem description

Google recently announced Cloud ML (https://cloud.google.com/ml/), and it's very useful. However, one limitation is that the input/output of a TensorFlow program must support gs://.

If we use TensorFlow APIs for all file reads and writes, it should be OK, since these APIs support gs://.

However, native file I/O APIs such as open do not work, because they don't understand gs://.

For example:

with open(vocab_file, 'wb') as f:
    cPickle.dump(self.words, f)

This code won't work in Google Cloud ML.

However, converting every native file I/O call to TensorFlow APIs or the Google Storage Python APIs is really tedious. Is there a simple way to do this? Are there any wrappers that support Google Storage (gs://) on top of native file I/O?

As suggested in "Pickled scipy sparse matrix as input data?", perhaps we can use file_io.read_file_to_string('gs://...'), but this still requires significant code modification.

Solution

One solution is to copy all of the data to local disk when the program starts up. You can do that using gsutil inside the Python script that gets run, something like:

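import os
import subprocess
import cPickle  # Python 2; use pickle on Python 3

vocab_file = 'vocab.pickled'
# Copy the file from GCS to local disk before reading it.
subprocess.check_call(['gsutil', '-m', 'cp', '-r',
                       os.path.join('gs://path/to/', vocab_file), '/tmp'])

with open(os.path.join('/tmp', vocab_file), 'rb') as f:
  words = cPickle.load(f)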

If you have any outputs, you can write them to local disk and copy them to GCS with gsutil rsync. (But be careful to handle restarts correctly, because you may be put on a different machine.)
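
A minimal sketch of that output step, assuming gsutil is on the PATH of the machine running the job. The function names and the example paths are hypothetical; only the gsutil rsync invocation itself comes from the suggestion above.

```python
import subprocess


def build_rsync_command(local_dir, gcs_dir):
    """Build the gsutil command that mirrors local_dir into gcs_dir."""
    # -m runs transfers in parallel; -r recurses into subdirectories.
    return ['gsutil', '-m', 'rsync', '-r', local_dir, gcs_dir]


def sync_outputs(local_dir, gcs_dir):
    """Upload everything under local_dir; raises CalledProcessError on failure."""
    subprocess.check_call(build_rsync_command(local_dir, gcs_dir))
```

Calling something like sync_outputs('/tmp/output', 'gs://path/to/output') at checkpoints and at exit would push results to GCS. To cope with restarts on a different machine, one option is to run the same rsync in the opposite direction (GCS to local) at startup, so the local directory is restored before the program resumes.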

The other solution is to monkey patch open (Note: untested):

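import __builtin__  # Python 2; the module is named builtins on Python 3

from tensorflow.python.lib.io import file_io

# NB: not all modes are compatible; should handle more carefully.
# Probably should be reported on
# https://github.com/tensorflow/tensorflow/issues/4357
def new_open(name, mode='r', buffering=-1):
  # buffering is accepted for signature compatibility but ignored.
  return file_io.FileIO(name, mode)

__builtin__.open = new_open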

Just be sure to do that before any module actually tries to read from GCS.
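
One way to make the patch a little more careful is to route only gs:// paths to the remote opener and leave every other path on the built-in open. The following is an untested-against-GCS sketch under a few assumptions: it is written for Python 3 (where the module is builtins rather than __builtin__), the names make_gcs_aware_open and install are hypothetical, and the remote opener is passed in as a parameter so the routing logic does not depend on TensorFlow being importable.

```python
import builtins  # Python 3 name for the module called __builtin__ in Python 2


def make_gcs_aware_open(gcs_open, local_open=builtins.open):
    """Return an open()-like function that routes gs:// paths to gcs_open."""
    def gcs_aware_open(name, mode='r', *args, **kwargs):
        if isinstance(name, str) and name.startswith('gs://'):
            # Extra arguments such as buffering are dropped for GCS paths,
            # because the remote opener may not accept them.
            return gcs_open(name, mode)
        # Everything else keeps the normal built-in behaviour.
        return local_open(name, mode, *args, **kwargs)
    return gcs_aware_open


def install(gcs_open):
    """Replace the built-in open; call this before other modules open files."""
    # The original open is captured here, before it is overwritten.
    builtins.open = make_gcs_aware_open(gcs_open, builtins.open)
```

With TensorFlow available, install(file_io.FileIO) after from tensorflow.python.lib.io import file_io would wire this to the same class used in the patch above. Local files then behave exactly as before, so only code that touches gs:// paths is exposed to the mode incompatibilities mentioned in the comment.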
