TensorFlow object detection training error with TPU


Problem description

I'm following along with Google's object detection on a TPU post and have hit a wall when it comes to training.

Looking at the job logs, I can see that ml-engine runs a ton of pip installs for various packages, provisions a TPU, and then submits the following:

Running command: python -m object_detection.model_tpu_main 
--model_dir=gs://{MY_BUCKET}/train --tpu_zone us-central1 
--pipeline_config_path=gs://{MY_BUCKET}/data/pipeline.config 
--job-dir gs://{MY_BUCKET}/train

It then errors with:

message:  "Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/root/.local/lib/python2.7/site-packages/object_detection/model_tpu_main.py", line 30, in <module>
from object_detection import model_lib
File "/root/.local/lib/python2.7/site-packages/object_detection/model_lib.py", line 26, in <module>
from object_detection import eval_util
File "/root/.local/lib/python2.7/site-packages/object_detection/eval_util.py", line 28, in <module>
from object_detection.metrics import coco_evaluation
File "/root/.local/lib/python2.7/site-packages/object_detection/metrics/coco_evaluation.py", line 20, in <module>
from object_detection.metrics import coco_tools
File "/root/.local/lib/python2.7/site-packages/object_detection/metrics/coco_tools.py", line 47, in <module>
from pycocotools import coco
File "/root/.local/lib/python2.7/site-packages/pycocotools/coco.py", line 49
import matplotlibnmatplotlib.use('Agg')nimport matplotlib.pyplot as plt
                                ^
SyntaxError: invalid syntax
"   

This is my first time using ml-engine and I'm stuck. I find it odd that the error references Python 2.7, since I submitted the job from my laptop in a Python 3.6 environment.

Any ideas on where to go from here or what to do?

Solution

Based on the stack trace, three separate lines of code somehow ended up on the same line (line 49). I believe I ran into the same problem recently while working with the new TensorFlow object detection API. In my case the cause was in models/research/object_detection/dataset_tools/create_pycocotools_package.sh, specifically the following line:

sed "s/import matplotlib\.pyplot as plt/import matplotlib\nmatplotlib\.use\(\'Agg\'\)\nimport matplotlib\.pyplot as plt/g" pycocotools/coco.py > coco.py.updated
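To check whether a generated coco.py.updated was affected, one option is to scan it for the fused line that shows up in the traceback. This is only a minimal sketch: the helper name find_fused_import is hypothetical (it is not part of the object detection API), and it assumes the damage looks exactly like the line in the error message.

```python
def find_fused_import(source):
    # Return (line_number, text) pairs for lines where the three statements
    # were glued together with a plain "n" instead of a newline.
    marker = "import matplotlibnmatplotlib"
    hits = []
    for number, text in enumerate(source.splitlines(), start=1):
        if marker in text:
            hits.append((number, text))
    return hits


# Example input modeled on the line reported in the traceback.
broken = (
    "import json\n"
    "import matplotlibnmatplotlib.use('Agg')nimport matplotlib.pyplot as plt\n"
)
print(find_fused_import(broken))
```

An empty list means the marker was not found; it does not prove the file is otherwise valid.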

The problem for me was that the \n escapes in the replacement weren't recognized as newlines, so all three statements ended up on one line. I solved it by using literal newlines in the sed replacement, like the following:

sed "s/import matplotlib\.pyplot as plt/import matplotlib\\
matplotlib\.use\(\'Agg\'\)\\
import matplotlib\.pyplot as plt/g" pycocotools/coco.py > coco.py.updated
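Since sed implementations differ in how they treat \n inside a replacement, the same patch could also be applied with a few lines of Python, which sidesteps the escaping question entirely. This is a sketch rather than what the script actually does: patch_coco_source and patch_file are hypothetical names, and it assumes the pyplot import in coco.py sits at the top level with no indentation.

```python
ORIGINAL_IMPORT = "import matplotlib.pyplot as plt"
PATCHED_IMPORT = (
    "import matplotlib\n"
    "matplotlib.use('Agg')\n"
    "import matplotlib.pyplot as plt"
)


def patch_coco_source(source):
    # Select the non-interactive 'Agg' backend before pyplot is imported,
    # with each statement on its own line.
    if "matplotlib.use('Agg')" in source:
        return source  # already patched, leave unchanged
    return source.replace(ORIGINAL_IMPORT, PATCHED_IMPORT)


def patch_file(src_path, dst_path):
    # Read the original file, patch it in memory, write the result.
    with open(src_path) as src:
        patched = patch_coco_source(src.read())
    with open(dst_path, "w") as dst:
        dst.write(patched)
```

Calling patch_file("pycocotools/coco.py", "coco.py.updated") would then stand in for the sed step, with the same input and output paths as the command above.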

Hope this helps.
