Does Google Cloud ML Engine trainer have to be explicitly aware of Google Cloud Storage URIs?

Question

I am trying to use an existing TensorFlow model, which I have so far run locally, with Google Cloud ML Engine.

The model currently obtains its training data by passing filenames such as my_model.train and my_model.eval into tf.data.TextLineDataset. These filenames are now hardcoded in the model's trainer, but I plan to refactor it such that it obtains them as training application parameters (along with --job-dir) on the command line instead; e.g. like so:

my_trainer.pl --job-dir job \
  --filename-train my_model.train --filename-eval my_model.eval
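A minimal sketch of how the trainer could accept these flags with argparse (the `parse_args` helper and its flag set are an assumption for illustration; only the flag names come from the command above):

```python
import argparse

def parse_args(argv=None):
    """Parse the training-application flags passed to the trainer."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--job-dir", required=True,
                        help="Output directory; a local path or gs:// URI.")
    parser.add_argument("--filename-train", required=True,
                        help="Training data; a local path or gs:// URI.")
    parser.add_argument("--filename-eval", required=True,
                        help="Evaluation data; a local path or gs:// URI.")
    # argparse maps --filename-train to args.filename_train, etc.
    return parser.parse_args(argv)
```

Because the flags are plain strings, the same parser works whether the values are local filenames or gs:// URIs.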

This should then also allow me to run the trainer with Cloud ML Engine locally:

gcloud ml-engine local train \
  --job-dir job \
  ...
  -- \
  --filename-train my_model.train \
  --filename-eval my_model.eval

Am I making correct assumptions so far, and could I also run the same trainer in Google's cloud (after uploading my dataset files into my_bucket) by replacing the local filenames with Google Cloud Storage gs: URIs, e.g. like so:

gcloud ml-engine local train \
  --job-dir job \
  ...
  -- \
  --filename-train gs://my_bucket/my_model.train \
  --filename-eval gs://my_bucket/my_model.eval

In other words, can tf.data.TextLineDataset handle gs: URIs as "filenames" transparently, or do I have to include special code in my trainer for processing such URIs beforehand?

Answer

Yes, tf.read_file, tf.TextLineReader, and tf.data.TextLineDataset all handle GCS implicitly. Just make sure you pass in GCS URLs of the form gs://my_bucket/path/to/data.csv as the "filename".
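To illustrate, here is a sketch of an input function where the same code path serves both cases (`make_dataset` is a hypothetical helper, not part of the asker's trainer; the bucket name is a placeholder):

```python
import tensorflow as tf

def make_dataset(filename, batch_size=32):
    # `filename` may be a local path ("my_model.train") or a GCS URI
    # ("gs://my_bucket/my_model.train"). tf.data.TextLineDataset resolves
    # the gs:// scheme through TensorFlow's file-system layer, so the
    # trainer needs no URI-specific handling of its own.
    dataset = tf.data.TextLineDataset(filename)
    return dataset.batch(batch_size)
```

Note that the file is only opened when the dataset is iterated, so building the pipeline itself is identical for local and GCS inputs.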

One thing to be careful about: always use os.path.join() to combine "directory" names and "file" names. While most Linux distributions handle paths like /some/path//somefile.txt by ignoring the repeated slash, GCS (being a key-value store) considers it different from /some/path/somefile.txt. So, use os.path.join to make sure you are not repeating directory separators.
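A quick sketch of the difference (the bucket and file names are placeholders; behavior shown is for POSIX systems, where the path separator is "/"):

```python
import os

# Naive concatenation with stray slashes yields a doubled separator;
# GCS treats "gs://my_bucket//my_model.train" as a different object key
# than "gs://my_bucket/my_model.train".
bad = "gs://my_bucket/" + "/" + "my_model.train"

# os.path.join only inserts a separator when one is missing, so both of
# these produce the same, correctly formed URI:
good = os.path.join("gs://my_bucket", "my_model.train")
also_good = os.path.join("gs://my_bucket/", "my_model.train")
```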
