How to package a vocabulary file for Cloud ML Engine


Question

I have a .txt file which contains a different label on each line. I use this file to create a label index lookup file, for example:

label_index = tf.contrib.lookup.index_table_from_file(vocabulary_file='labels.txt')

I am wondering how I should package the vocabulary file with my Cloud ML Engine trainer. The packaging suggestions are explicit about how to set up the .py files, but I am not sure where I should put the relevant .txt files. Should they just be hosted in a storage bucket (i.e. gs://) that the engine has access to, or can they be packaged with the trainer somehow?

Solution

You have multiple options. I think the most straightforward is to store labels.txt in a GCS location.
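
As a rough sketch of the GCS option, the trainer could take the vocabulary location as a command-line flag, so the same code runs against a local file or a gs:// URI. The flag name --vocab-file and the helper names are hypothetical. The lookup call is the one from the question, and it assumes a TensorFlow 1.x build that can read gs:// paths.

```python
import argparse


def parse_args(argv=None):
    """Parse trainer flags; --vocab-file is a hypothetical flag name."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--vocab-file',
        default='labels.txt',
        help='Local path or gs:// URI of the vocabulary file.')
    return parser.parse_args(argv)


def build_label_index(vocab_file):
    """Build the label lookup table from a local or GCS path."""
    # Imported lazily so that flag parsing can be exercised without TensorFlow.
    import tensorflow as tf
    return tf.contrib.lookup.index_table_from_file(vocabulary_file=vocab_file)
```

When submitting the job, the flag would go after the bare -- separator of the gcloud submit command, for example --vocab-file gs://my-bucket/labels.txt (the bucket name is a placeholder).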

However, if you prefer, you can also ship the file inside your trainer package by declaring it in setup.py. There are multiple ways to do this, so I'll refer you to the official setuptools documentation.

Let me walk through a quick example:

Create a setup.py in the parent directory of your training package, i.e. next to the package directory rather than inside it. The package is often called trainer in CloudML Engine's samples, so I will proceed as if your code is structured the same way as the samples, including using trainer as the package name. The following is based on the docs you referenced, with one important change: it uses the package_data argument instead of include_package_data:

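from setuptools import find_packages
from setuptools import setup

# PyPI dependencies of the trainer, if any (left empty in this example).
REQUIRED_PACKAGES = []

setup(
    name='my_model',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    # Ship labels.txt from inside the trainer package directory.
    package_data={'trainer': ['labels.txt']},
    description='My trainer application package.'
)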

If you run python setup.py sdist, you can see that trainer/labels.txt was copied into the tarball.
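
If you would rather check this programmatically than inspect the archive by hand, a minimal sketch using the standard tarfile module could look like the following. The helper name sdist_contains is made up for illustration, and the tarball path in the usage note assumes the name and version from the setup.py above.

```python
import tarfile


def sdist_contains(tarball_path, suffix):
    """Return True if any member of the .tar.gz archive ends with `suffix`."""
    with tarfile.open(tarball_path, 'r:gz') as tar:
        return any(name.endswith(suffix) for name in tar.getnames())
```

For example, sdist_contains('dist/my_model-0.1.tar.gz', 'trainer/labels.txt') should be true after a successful python setup.py sdist.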

Then in your code, you can access the file like this:

from pkg_resources import resource_filename
# Pass the importable package name; a Requirement would need the distribution name ('my_model').
labels_path = resource_filename('trainer', 'labels.txt')

Note that to run this code locally, you're going to have to install your package: python setup.py install [--user].
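
If installing the package for every local run is inconvenient, one possible workaround is to resolve the file relative to the module that needs it. This is only a sketch: the helper name data_file_path is hypothetical, and it assumes labels.txt sits next to the calling module and that the package is installed unzipped (not as a zipped egg).

```python
import os


def data_file_path(module_file, filename='labels.txt'):
    """Return the path of `filename` located in the same directory as `module_file`."""
    package_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(package_dir, filename)
```

Inside a module of the trainer package, this would be called as data_file_path(__file__).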

And that's the primary reason I think storing the file on GCS might be easier.
