Keras Google Cloud ML sample: IndexError


Question

I'm trying the Keras Cloud ML sample (https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/census/keras), but I can't get the cloud training to run. Local training, both with python and with gcloud, seems to work fine.

I've looked for a solution on Stack Exchange and Google, and read https://cloud.google.com/ml-engine/docs/how-tos/troubleshooting, but I seem to be the only one with this problem (usually a strong indication that the fault is entirely mine!). In addition to the environment below, I've tried Python 3.6 and TensorFlow 1.3, with no success.

I'm a noob, so I'm probably erring in some basic way, but I cannot spot it.

Thanks for any help,

:-)

yarc68000.

- environment -

(env1) $ python --version
Python 2.7.13 :: Continuum Analytics, Inc.
(env1) $ conda list | grep 'h5py\|keras\|pandas\|numexpr\|tensorflow'
h5py                      2.7.1                    py27_1    conda-forge
keras                     2.0.6                    py27_0    conda-forge
numexpr                   2.6.2                    py27_1    conda-forge
pandas                    0.20.3                   py27_0    anaconda
tensorflow                1.2.1                     <pip>
(env1) $ gcloud --version
Google Cloud SDK 172.0.1
alpha 2017.09.15
beta 2017.09.15
bq 2.0.26
core 2017.09.21
datalab 20170818
gcloud 
gsutil 4.27

----------- job --------

(env1) $ export BUCKET=gs://j170922census1
(env1) $ gsutil mb $BUCKET
Creating gs://j170922census1/...
(env1) $ export TRAIN_FILE=gs://cloudml-public/census/data/adult.data.csv
(env1) $ export EVAL_FILE=gs://cloudml-public/census/data/adult.test.csv
(env1) $ export JOB_NAME="census_keras_$$"
(env1) $ export TRAIN_STEPS=200
(env1) $ gcloud ml-engine jobs submit training $JOB_NAME --stream-logs --runtime-version 1.2 --job-dir $BUCKET --package-path trainer --module-name trainer.task --region us-central1 -- --train-files $TRAIN_FILE --eval-files $EVAL_FILE --train-steps $TRAIN_STEPS
Job [census_keras_7639] submitted successfully.
INFO    2017-09-22 19:56:56 +0200   service     Validating job requirements...
INFO    2017-09-22 19:56:57 +0200   service     Job creation request has been successfully validated.
INFO    2017-09-22 19:56:57 +0200   service     Job census_keras_7639 is queued.
INFO    2017-09-22 19:56:57 +0200   service     Waiting for job to be provisioned.
INFO    2017-09-22 20:01:39 +0200   service     Waiting for TensorFlow to start.
INFO    2017-09-22 20:02:55 +0200   master-replica-0        Running task with arguments: --cluster={"master": ["master-cc38d44a51-0:2222"]} --task={"type": "master", "index": 0} --job={
<..>
INFO    2017-09-22 20:04:00 +0200   master-replica-0        197/200 [============================>.] - ETA: 0s - loss: 0.6931 - acc: 0.7563
INFO    2017-09-22 20:04:00 +0200   master-replica-0        200/200 [==============================] - 1s - loss: 0.6931 - acc: 0.7600     
INFO    2017-09-22 20:04:00 +0200   master-replica-0        Epoch 10/20
ERROR   2017-09-22 20:04:02 +0200   master-replica-0        Traceback (most recent call last):
ERROR   2017-09-22 20:04:02 +0200   master-replica-0          File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
ERROR   2017-09-22 20:04:02 +0200   master-replica-0            "__main__", fname, loader, pkg_name)
ERROR   2017-09-22 20:04:02 +0200   master-replica-0          File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
ERROR   2017-09-22 20:04:02 +0200   master-replica-0            exec code in run_globals
ERROR   2017-09-22 20:04:02 +0200   master-replica-0          File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 199, in <module>
ERROR   2017-09-22 20:04:02 +0200   master-replica-0            dispatch(**parse_args.__dict__)
ERROR   2017-09-22 20:04:02 +0200   master-replica-0          File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 121, in dispatch
ERROR   2017-09-22 20:04:02 +0200   master-replica-0            callbacks=callbacks)
ERROR   2017-09-22 20:04:02 +0200   master-replica-0          File "/root/.local/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 88, in wrapper
ERROR   2017-09-22 20:04:02 +0200   master-replica-0            return func(*args, **kwargs)
ERROR   2017-09-22 20:04:02 +0200   master-replica-0          File "/root/.local/lib/python2.7/site-packages/keras/models.py", line 1110, in fit_generator
ERROR   2017-09-22 20:04:02 +0200   master-replica-0            initial_epoch=initial_epoch)
ERROR   2017-09-22 20:04:02 +0200   master-replica-0          File "/root/.local/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 88, in wrapper
ERROR   2017-09-22 20:04:02 +0200   master-replica-0            return func(*args, **kwargs)
ERROR   2017-09-22 20:04:02 +0200   master-replica-0          File "/root/.local/lib/python2.7/site-packages/keras/engine/training.py", line 1849, in fit_generator
ERROR   2017-09-22 20:04:02 +0200   master-replica-0            callbacks.on_epoch_begin(epoch)
ERROR   2017-09-22 20:04:02 +0200   master-replica-0          File "/root/.local/lib/python2.7/site-packages/keras/callbacks.py", line 63, in on_epoch_begin
ERROR   2017-09-22 20:04:02 +0200   master-replica-0            callback.on_epoch_begin(epoch, logs)
ERROR   2017-09-22 20:04:02 +0200   master-replica-0          File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 57, in on_epoch_begin
ERROR   2017-09-22 20:04:02 +0200   master-replica-0            census_model = load_model(checkpoints[-1])
ERROR   2017-09-22 20:04:02 +0200   master-replica-0        IndexError: list index out of range
<..>
INFO    2017-09-22 20:04:53 +0200   service     Finished tearing down TensorFlow.
INFO    2017-09-22 20:05:49 +0200   service     Job failed.

Recommended answer

There was in fact a bug when running this sample on Cloud ML Engine: checkpoints are disabled for now when the job directory is on GCS, because Keras can't natively write checkpoints to GCS. With no checkpoint files written, the list the evaluation callback indexes with checkpoints[-1] is presumably empty, which matches the IndexError at the end of your traceback. See this PR for the immediate fix for the issue you are facing. Also take a look at the pending PR, which fixes the checkpoint issue and makes the files available on GCS (a workaround for Keras's inability to write to GCS).
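
The traceback ends in load_model(checkpoints[-1]), which can only raise "list index out of range" when the checkpoint list is empty. As an illustration only (this is not the change made in either PR), a guard along the following lines would let a callback skip evaluation until a checkpoint exists. The helper name latest_checkpoint, its arguments, and the checkpoint.*.hdf5 file pattern are assumptions made for this sketch, and glob only sees local paths, not gs:// URLs.

```python
import glob
import os


def latest_checkpoint(checkpoint_dir, pattern="checkpoint.*.hdf5"):
    """Return the most recently modified checkpoint path, or None.

    Hypothetical helper: the name, the default pattern and the
    local-only lookup are assumptions, not code from the sample.
    """
    candidates = glob.glob(os.path.join(checkpoint_dir, pattern))
    if not candidates:
        # An empty list is exactly the case where [-1] raises IndexError.
        return None
    # Pick the file with the newest modification time.
    return max(candidates, key=os.path.getmtime)


# Possible use inside a callback's on_epoch_begin (sketch only):
#     path = latest_checkpoint(job_dir)
#     if path is None:
#         return  # nothing saved yet, so skip evaluation this epoch
#     census_model = load_model(path)
```

Returning None instead of raising leaves the decision to the caller, so training can continue even when no checkpoint could be written.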
