Load numpy array in google-cloud-ml job

Question

In the model I want to launch, I have some variables which have to be initialized with specific values.

I currently store these variables into numpy arrays but I don't know how to adapt my code to make it work on a google-cloud-ml job.

Currently I initialize my variable like this:

my_variable = variables.model_variable('my_variable', shape=None, dtype=tf.float32, initializer=np.load('datasets/real/my_variable.npy'))

Can anyone help me?

Recommended answer

First, you'll need to copy/store the data on GCS (using, e.g., gsutil) and ensure your training script has access to that bucket. The easiest way to do so is to copy the array to the same bucket as your data, since you'll likely already have configured that bucket for read access. If the bucket is in the same project as your training job and you followed these instructions (particularly, gcloud beta ml init-project), you should be set. If the data will be in another bucket, see these instructions.

Then you'll need to use a library capable of loading data from GCS. TensorFlow includes a module that can do this, although you're free to use any client library that can read from GCS. Here's an example of using TensorFlow's file_io module:

from StringIO import StringIO  # Python 2 module
import tensorflow as tf
import numpy as np
from tensorflow.python.lib.io import file_io

# Read the whole serialized array from GCS into memory, then wrap it in a
# seekable in-memory buffer that np.load can consume.
f = StringIO(file_io.read_file_to_string('gs://my-bucket/123.npy'))
# Create a variable initialized to the value of the serialized numpy array
my_variable = tf.Variable(initial_value=np.load(f), name='my_variable')

Note that we have to read the file into a string and use StringIO, since file_io.FileIO does not fully implement the seek function required by numpy.load.
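
To illustrate why the in-memory buffer works, here is a minimal sketch that round-trips an array through raw bytes using only numpy, with no bucket involved. It uses io.BytesIO rather than StringIO (an assumption, chosen so the same code runs on Python 2 and 3), and load_npy_from_bytes is a hypothetical helper name, not part of TensorFlow or numpy.

```python
from io import BytesIO

import numpy as np


def load_npy_from_bytes(raw):
    """Deserialize a .npy payload that is already held in memory as bytes.

    Hypothetical helper for illustration only.
    """
    # BytesIO gives np.load the seekable file-like object it expects.
    return np.load(BytesIO(raw))


# Stand-in for the bytes a GCS read would return: serialize a small array
# into an in-memory buffer instead of touching any bucket.
buf = BytesIO()
np.save(buf, np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32))
raw = buf.getvalue()

restored = load_npy_from_bytes(raw)
print(restored.shape, restored.dtype)
```

In a real job, raw would presumably be whatever the GCS read returns; the rest of the pattern stays the same.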

Bonus: in case it's useful, you can directly store a numpy array to GCS using the file_io module, e.g.:

np.save(file_io.FileIO('gs://my-bucket/123', 'w'), np.array([[1,2,3], [4,5,6]]))
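
One detail worth keeping in mind when saving through a file object: np.save appends '.npy' only when it is given a filename string, not when it is given an open file. The sketch below shows this with a local temporary file standing in for the GCS object; the path and directory are hypothetical and chosen only for illustration.

```python
import os
import tempfile

import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6]])
tmp_dir = tempfile.mkdtemp()

# Hypothetical local path standing in for 'gs://my-bucket/123'.
path = os.path.join(tmp_dir, '123')

# Passing an open file object: np.save writes to it as-is and does not
# append a '.npy' extension to the name.
with open(path, 'wb') as f:
    np.save(f, array)

saved_names = sorted(os.listdir(tmp_dir))

# The file is read back under exactly the name it was written with.
with open(path, 'rb') as f:
    reloaded = np.load(f)

print(saved_names)
```

So whatever object name is used when writing is the name to use when reading the array back.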

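For Python 3, use from io import BytesIO instead of from StringIO import StringIO, and pass binary_mode=True to file_io.read_file_to_string (in TensorFlow versions where it accepts that argument) so that the file is read as bytes. numpy.load expects binary data, so io.StringIO will not work there.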
