Tensorflow on ML Engine: The replica master 0 exited with a non-zero status of 1

Problem description

I launch a TensorFlow task on ML Engine, and after about 2 minutes I keep getting the error message "The replica master 0 exited with a non-zero status of 1."

(Incidentally, the task runs fine with ml-engine local.)

Question: Is there any place or log file where I can see further information on what happened?

The logs viewer just gives the following:

{
 insertId:  "ibal72g1rxhr63"  
 logName:  "projects/**-***-ml/logs/ml.googleapis.com%2Fcnn180322_170649"  
 receiveTimestamp:  "2018-03-22T17:08:38.344282172Z"  
 resource: {
  labels: {
   job_id:  "cnn180322_170649"    
   project_id:  "**-***-ml"    
   task_name:  "service"    
  }
  type:  "ml_job"   
 }
 severity:  "ERROR"  
 textPayload:  "The replica master 0 exited with a non-zero status of 1."  
 timestamp:  "2018-03-22T17:08:38.344282172Z"  
}

Thanks in advance for any pointers!

Recommended answer

The apparent lack of log files turned out to be caused by a missing permission to write to the logs.

Under IAM & admin, adding the Logs Writer role to the account cloud-ml-service@<project_id>.iam.gserviceaccount.com solved the problem and enabled the master and workers to write log messages to Stackdriver as expected.
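
The same role grant can presumably be made from the command line instead of the console. The following is only a sketch: it assembles a `gcloud projects add-iam-policy-binding` call without running it. The helper names and the `my-project` placeholder are hypothetical and not part of the original answer; `roles/logging.logWriter` is the role ID that corresponds to the Logs Writer label.

```python
from typing import List

# Role ID behind the "Logs Writer" label shown in the console.
LOGS_WRITER_ROLE = "roles/logging.logWriter"


def cloud_ml_service_account(project_id: str) -> str:
    """Return the account named in the answer, with <project_id> filled in."""
    return f"cloud-ml-service@{project_id}.iam.gserviceaccount.com"


def grant_logs_writer_command(project_id: str) -> List[str]:
    """Build (but do not run) the gcloud call that adds the role binding."""
    member = "serviceAccount:" + cloud_ml_service_account(project_id)
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        "--member", member,
        "--role", LOGS_WRITER_ROLE,
    ]


if __name__ == "__main__":
    # "my-project" is a placeholder project ID, not taken from the question.
    print(" ".join(grant_logs_writer_command("my-project")))
```

Printing the command rather than executing it leaves the decision to change IAM policy with whoever has the rights to do so; the argument list could also be passed to `subprocess.run` if that is preferred.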

For a similar discussion and some additional information, see "Stackdriver logs not available for Cloud ML jobs since migration to V2".
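
Once the master and workers can write to Stackdriver, a filter on the resource type and job id from the log entry in the question should narrow the output to a single job. The sketch below builds such a filter and a matching `gcloud logging read` call. The function names, the `my-project` placeholder, and the default limit of 50 are assumptions for illustration only.

```python
import shlex
from typing import List


def job_log_filter(job_id: str) -> str:
    """Filter matching the resource type and job_id label seen in the log entry."""
    return f'resource.type="ml_job" AND resource.labels.job_id="{job_id}"'


def read_job_logs_command(project_id: str, job_id: str, limit: int = 50) -> List[str]:
    """Build (but do not run) a gcloud call that reads the job's log entries."""
    return [
        "gcloud", "logging", "read", job_log_filter(job_id),
        "--project", project_id,
        "--limit", str(limit),
    ]


if __name__ == "__main__":
    # "my-project" is a placeholder; the job id is the one from the question.
    cmd = read_job_logs_command("my-project", "cnn180322_170649")
    print(" ".join(shlex.quote(part) for part in cmd))
```

The same filter string can be pasted into the logs viewer's advanced filter box.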

Thanks, everyone, for your input!
