Load numpy array in google-cloud-ml job


Question

In the model I want to launch, I have some variables which have to be initialized with specific values.

I currently store these variables into numpy arrays but I don't know how to adapt my code to make it work on a google-cloud-ml job.

Currently I initialize my variable like this:

my_variable = variables.model_variable('my_variable', shape=None, dtype=tf.float32, initializer=np.load('datasets/real/my_variable.npy'))

Can someone help me?

Recommended answer

First, you'll need to copy/store the data on GCS (using, e.g., gsutil) and ensure your training script has access to that bucket. The easiest way to do so is to copy the array to the same bucket as your data, since you'll likely already have configured that bucket for read access. If the bucket is in the same project as your training job and you followed these instructions (particularly, gcloud beta ml init-project), you should be set. If the data will be in another bucket, see these instructions.

Then you'll need to use a library capable of loading data from GCS. TensorFlow includes a module that can do this, although you're free to use any client library that can read from GCS. Here's an example of using TensorFlow's file_io module:

from StringIO import StringIO
import tensorflow as tf
import numpy as np
from tensorflow.python.lib.io import file_io

# Create a variable initialized to the value of a serialized numpy array
f = StringIO(file_io.read_file_to_string('gs://my-bucket/123.npy'))
my_variable = tf.Variable(initial_value=np.load(f), name='my_variable')

Note that we have to read the file into a string and use StringIO, since file_io.FileIO does not fully implement the seek function required by numpy.load.
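
The same "read everything into memory, then hand numpy a seekable buffer" idea can be sketched for Python 3 as follows. This is only an illustration under assumptions: load_npy_from_bytes is a hypothetical helper name, and the bytes are produced locally as a stand-in for what a GCS read would return (in a real job they would presumably come from something like file_io.read_file_to_string(path, binary_mode=True)).

```python
import io

import numpy as np


def load_npy_from_bytes(raw):
    """Deserialize a .npy payload that is already held in memory as bytes.

    io.BytesIO gives numpy a fully seekable file-like object, which is the
    property the remote file handle is said to lack.
    """
    return np.load(io.BytesIO(raw))


# Stand-in for the bytes a GCS read would return (assumption: no real bucket here).
buffer = io.BytesIO()
np.save(buffer, np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32))
raw_bytes = buffer.getvalue()

restored = load_npy_from_bytes(raw_bytes)
```

The restored array could then be passed as initial_value to tf.Variable exactly as in the snippet above.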

Bonus: in case it's useful, you can directly store a numpy array to GCS using the file_io module, e.g.:

np.save(file_io.FileIO('gs://my-bucket/123', 'w'), np.array([[1,2,3], [4,5,6]]))
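
A slightly more defensive variant of the one-liner above, as a sketch only: serialize the array into an in-memory buffer first, then write the bytes in one call, so the target file object only needs write(). The helper name save_npy and the open_fn parameter are hypothetical; the demo writes to a local temporary file, and passing file_io.FileIO as open_fn with a gs:// path is an assumption that is not tested here.

```python
import io
import os
import tempfile

import numpy as np


def save_npy(array, path, open_fn=open):
    """Serialize `array` to .npy bytes in memory, then write them to `path`.

    open_fn is any callable taking (path, mode) and returning a context
    manager with a write() method; the built-in open is the default.
    """
    buffer = io.BytesIO()
    np.save(buffer, array)
    with open_fn(path, 'wb') as f:
        f.write(buffer.getvalue())


# Local demonstration; a temporary directory stands in for the bucket.
demo_path = os.path.join(tempfile.mkdtemp(), '123.npy')
save_npy(np.array([[1, 2, 3], [4, 5, 6]]), demo_path)
reloaded = np.load(demo_path)
```

Note that np.save only appends the .npy extension when it is given a filename string, so with a file object the path should already carry the extension you intend to load later.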

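For Python 3, use from io import StringIO instead of from StringIO import StringIO. Note that np.load expects bytes rather than text, so on Python 3 you will likely need io.BytesIO together with file_io.read_file_to_string(..., binary_mode=True) instead.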
