Can't train my Tensorflow detector model in Google Cloud
Problem description
I'm trying to train my own detector model based on the Tensorflow sample and this post. I did succeed in training locally on a MacBook Pro. The problem is that I don't have a GPU, and doing it on the CPU is too slow (about 25s per iteration).
This way, I'm trying to run on Google Cloud ML Engine following the tutorial, but I can't make it run properly.
My folder structure is as follows:
+ data
  - train.record
  - test.record
+ models
  + train
  + eval
+ training
  - ssd_mobilenet_v1_coco
My steps to change from local training to Google Cloud training were:
- Create a bucket in Google Cloud storage and copy my local folder structure with files;
- Edit my pipeline.config file and change all paths from Users/dev/detector/ to gcc://bucketname/;
- Create a YAML file with the default configuration provided in the tutorial;
- Run:
gcloud ml-engine jobs submit training object_detection_`date +%s` \
    --job-dir=gs://bucketname/models/train \
    --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
    --module-name object_detection.train \
    --region us-east1 \
    --config /Users/dev/detector/training/cloud.yml \
    -- \
    --train_dir=gs://bucketname/models/train \
    --pipeline_config_path=gs://bucketname/data/pipeline.config
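For reference, the cloud.yml mentioned above follows the tutorial's default configuration; a sketch of what it contains (machine types and worker counts are the tutorial's defaults, not values I tuned myself):

```yaml
trainingInput:
  scaleTier: CUSTOM
  masterType: standard_gpu
  workerCount: 5
  workerType: standard_gpu
  parameterServerCount: 3
  parameterServerType: standard
```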
Doing so gives me the following error message from the ML units:
The replica ps 0 exited with a non-zero status of 1. Termination reason: Error.
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/object_detection/train.py", in <module>
    from object_detection import trainer
  File "/root/.local/lib/python2.7/site-packages/object_detection/trainer.py", line 27, in <module>
    from object_detection.builders import preprocessor_builder
  File "/root/.local/lib/python2.7/site-packages/object_detection/builders/preprocessor_builder.py", line 21, in <module>
    from object_detection.protos import preprocessor_pb2
  File "/root/.local/lib/python2.7/site-packages/object_detection/protos/preprocessor_pb2.py", line 71, in <module>
    options=None, file=DESCRIPTOR)
TypeError: __new__() got an unexpected keyword argument 'file'
Thank you.
Recommended answer
The issue is the protobuf version. You probably installed the latest protoc via brew, and protobuf added the file field in version 3.5.0: https://github.com/google/protobuf/blob/9f80df026933901883da1d556b38292e14836612/CHANGES.txt#L74
So, on top of the changes above, set the protobuf version in REQUIRED_PACKAGES to 'protobuf>=3.5.1'.
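Concretely, that means editing the REQUIRED_PACKAGES list in the object_detection package's setup.py before rebuilding the tar.gz that gets submitted to ML Engine; a minimal sketch (the Pillow entry is illustrative, only the protobuf pin matters here):

```python
# Sketch of the relevant part of object_detection's setup.py:
# pin protobuf so ML Engine installs a release whose generated
# *_pb2 descriptors accept the 'file' keyword argument.
REQUIRED_PACKAGES = ['Pillow>=1.0', 'protobuf>=3.5.1']

# This list is passed to setuptools.setup(install_requires=REQUIRED_PACKAGES),
# after which the package is rebuilt (python setup.py sdist) and resubmitted.
pin = next(p for p in REQUIRED_PACKAGES if p.startswith('protobuf'))
assert pin == 'protobuf>=3.5.1'
```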