Google Cloud Dataflow Stuck


Problem Description

Recently I've been getting this error when running Dataflow jobs written in Python. The thing is, it used to work and no code has changed, so I'm thinking it has something to do with the environment.

Error syncing pod d557f64660a131e09d2acb9478fad42f (""), skipping: failed to "StartContainer" for "python" with CrashLoopBackOff: "Back-off 20s restarting failed container=python pod=dataflow-)

Can anyone help me?

Recommended Answer

In my case, I was using Apache Beam SDK version 2.9.0 and had the same problem.

I used a setup.py in which the "install_requires" field was filled dynamically by loading the contents of the requirements.txt file. That is fine if you're using DirectRunner, but DataflowRunner is too sensitive to dependencies on local files, so abandoning that technique and hard-coding the dependencies from requirements.txt into "install_requires" solved the issue for me.
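For illustration, here is a minimal sketch of both approaches in a setup.py; the package name, version, and pinned dependencies are hypothetical placeholders, and only the hard-coded variant is reliable with DataflowRunner:

```python
# setup.py -- a minimal sketch; package name, version, and the pinned
# dependency list are hypothetical placeholders.
import setuptools

# Problematic approach (works under DirectRunner only): filling
# install_requires dynamically from requirements.txt. The file may not be
# available when DataflowRunner builds the package for its workers, so the
# worker container fails to install it and crash-loops.
#
#   with open("requirements.txt") as f:
#       install_requires = f.read().splitlines()

setuptools.setup(
    name="my-dataflow-pipeline",  # hypothetical name
    version="0.1.0",
    packages=setuptools.find_packages(),
    # Fix: hard-code the dependencies copied from requirements.txt.
    install_requires=[
        "apache-beam[gcp]==2.9.0",
        # ...remaining pinned dependencies from requirements.txt
    ],
)
```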

If you're stuck on that, try to investigate your dependencies and minimize them as much as you can. Please refer to the Managing Python Pipeline Dependencies documentation topic for help. Avoid complex or nested code structures and dependencies on the local filesystem.
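For reference, here is a minimal sketch of launching a pipeline with the --setup_file option (described in that documentation topic) so that DataflowRunner builds and installs the package above on its workers; the project, region, and bucket values are hypothetical placeholders:

```python
# A minimal sketch of running a pipeline against the setup.py above.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",                # hypothetical
    region="us-central1",                # hypothetical
    temp_location="gs://my-bucket/tmp",  # hypothetical
    # Tells Dataflow to build the local package and install it on workers.
    setup_file="./setup.py",
)

with beam.Pipeline(options=options) as p:
    (
        p
        | "Create" >> beam.Create(["hello", "world"])
        | "Print" >> beam.Map(print)
    )
```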
