TensorFlow object detection training error with TPU
I'm following along with Google's object detection on a TPU post and have hit a wall when it comes to training.
Looking at the job logs, I can see that ml-engine runs a ton of pip installs for various packages, provisions a TPU, and then submits the following:
Running command: python -m object_detection.model_tpu_main
--model_dir=gs://{MY_BUCKET}/train --tpu_zone us-central1
--pipeline_config_path=gs://{MY_BUCKET}/data/pipeline.config
--job-dir gs://{MY_BUCKET}/train
It then errors with:
message: "Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/root/.local/lib/python2.7/site-packages/object_detection/model_tpu_main.py", line 30, in <module>
from object_detection import model_lib
File "/root/.local/lib/python2.7/site-packages/object_detection/model_lib.py", line 26, in <module>
from object_detection import eval_util
File "/root/.local/lib/python2.7/site-packages/object_detection/eval_util.py", line 28, in <module>
from object_detection.metrics import coco_evaluation
File "/root/.local/lib/python2.7/site-packages/object_detection/metrics/coco_evaluation.py", line 20, in <module>
from object_detection.metrics import coco_tools
File "/root/.local/lib/python2.7/site-packages/object_detection/metrics/coco_tools.py", line 47, in <module>
from pycocotools import coco
File "/root/.local/lib/python2.7/site-packages/pycocotools/coco.py", line 49
import matplotlibnmatplotlib.use('Agg')nimport matplotlib.pyplot as plt
^
SyntaxError: invalid syntax
"
This is my first time using ml-engine and I'm stuck. I find it odd that the error references python2.7, as I submitted the job from my laptop in a python3.6 environment.
Any ideas on where to go from here or what to do?
Based on the stack trace, three separate lines of code ended up on a single line (line 49 of coco.py). I believe I ran into the same problem recently while playing with the new TensorFlow object detection API; the culprit was models/research/object_detection/dataset_tools/create_pycocotools_package.sh, specifically the following line:
sed "s/import matplotlib\.pyplot as plt/import matplotlib\nmatplotlib\.use\(\'Agg\'\)\nimport matplotlib\.pyplot as plt/g" pycocotools/coco.py > coco.py.updated
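The problem for me was that sed didn't interpret the \n escapes in the replacement text as newlines (your traceback shows a literal n where each line break should be). I solved it by putting literal newlines, each preceded by a backslash, in the replacement, like this: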
sed "s/import matplotlib\.pyplot as plt/import matplotlib\\
matplotlib\.use\(\'Agg\'\)\\
import matplotlib\.pyplot as plt/g" pycocotools/coco.py > coco.py.updated
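If you'd rather not depend on how a particular sed handles \n, the same patch can be applied from Python. This is only a minimal sketch and not part of the original script: the names patch_coco_source and is_valid_python and the sample string are made up for illustration, and it assumes the import appears in coco.py exactly as the sed pattern expects.

```python
import ast

# The line the original sed command looks for, and what it should become.
OLD_IMPORT = "import matplotlib.pyplot as plt"
NEW_IMPORT = (
    "import matplotlib\n"
    "matplotlib.use('Agg')\n"
    "import matplotlib.pyplot as plt"
)


def patch_coco_source(source: str) -> str:
    """Select the Agg backend before pyplot is imported (hypothetical helper)."""
    if "matplotlib.use('Agg')" in source:
        return source  # backend call already present; leave the text untouched
    return source.replace(OLD_IMPORT, NEW_IMPORT)


def is_valid_python(source: str) -> bool:
    """Return True if the text parses; nothing is imported or executed."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    # Stand-in for the contents of pycocotools/coco.py.
    sample = "import json\nimport matplotlib.pyplot as plt\n"
    patched = patch_coco_source(sample)
    print(patched)
    print(is_valid_python(patched))

    # The line from the traceback, where each newline became a literal n.
    broken = "import matplotlibnmatplotlib.use('Agg')nimport matplotlib.pyplot as plt\n"
    print(is_valid_python(broken))
```

To patch the real file, read pycocotools/coco.py, pass its contents through patch_coco_source, and write the result to coco.py.updated, the same output file the sed command produces. Run it on the unpatched original, since a file that already contains the broken line is returned unchanged.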
Hope this helps.