Why did I encounter an "Error syncing pod" with Dataflow pipeline?


Problem description

I'm experiencing a weird error with my Dataflow pipeline when I try to use a specific library from PyPI.

I need jsonschema in a ParDo, so, in my requirements.txt file, I added jsonschema==3.2.0. I launch my pipeline with the command line below:

python -m gcs_to_all \
    --runner DataflowRunner \
    --project <my-project-id> \
    --region europe-west1 \
    --temp_location gs://<my-bucket-name>/temp/ \
    --input_topic "projects/<my-project-id>/topics/<my-topic>" \
    --network=<my-network> \
    --subnetwork=<my-subnet> \
    --requirements_file=requirements.txt \
    --experiments=allow_non_updatable_job \
    --streaming

In the terminal, all seems to be good:

INFO:root:2020-01-03T09:18:35.569Z: JOB_MESSAGE_BASIC: Worker configuration: n1-standard-4 in europe-west1-b.
INFO:root:2020-01-03T09:18:35.806Z: JOB_MESSAGE_WARNING: The network default doesn't have rules that open TCP ports 12345-12346 for internal connection with other VMs. Only rules with a target tag 'dataflow' or empty target tags set apply. If you don't specify such a rule, any pipeline with more than one worker that shuffles data will hang. Causes: Firewall rules associated with your network don't open TCP ports 12345-12346 for Dataflow instances. If a firewall rule opens connection in these ports, ensure target tags aren't specified, or that the rule includes the tag 'dataflow'.
INFO:root:2020-01-03T09:18:48.549Z: JOB_MESSAGE_DETAILED: Workers have started successfully.

There's no error in the log tab on the Dataflow webpage, but in Stackdriver:

message: "Error syncing pod 6515c378c6bed37a2c0eec1fcfea300c ("<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)"), skipping: [failed to "StartContainer" for "sdk0" with CrashLoopBackOff: "Back-off 10s restarting failed container=sdk0 pod=<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)""
message: ", failed to "StartContainer" for "sdk1" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=sdk1 pod=<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)"" 
message: ", failed to "StartContainer" for "sdk2" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=sdk2 pod=<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)"" 
message: ", failed to "StartContainer" for "sdk3" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=sdk3 pod=<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)"" 

I also found this error (at info level):

Collecting jsonschema (from -r /var/opt/google/staged/requirements.txt (line 1))
  Installing build dependencies: started
Looking in links: /var/opt/google/staged
  Installing build dependencies: started
Collecting jsonschema (from -r /var/opt/google/staged/requirements.txt (line 1))
  Installing build dependencies: started
Looking in links: /var/opt/google/staged
Collecting jsonschema (from -r /var/opt/google/staged/requirements.txt (line 1))
  Installing build dependencies: started
  Installing build dependencies: finished with status 'error'
  ERROR: Command errored out with exit status 1:
   command: /usr/local/bin/python3 /usr/local/lib/python3.7/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-mdurhav9/overlay --no-warn-script-location --no-binary :none: --only-binary :none: --no-index --find-links /var/opt/google/staged -- 'setuptools>=40.6.0' wheel
       cwd: None
  Complete output (5 lines):
  Looking in links: /var/opt/google/staged
  Collecting setuptools>=40.6.0
  Collecting wheel
    ERROR: Could not find a version that satisfies the requirement wheel (from versions: none)
  ERROR: No matching distribution found for wheel

But I don't know why it can't get this dependency...

Do you have any idea how I can debug this, or why I'm encountering this error?

Thanks

Answer

When Dataflow workers start, they execute several steps:

  1. Install packages from requirements.txt
  2. Install packages specified as extra_packages
  3. Install the workflow tarball and execute the actions provided in setup.py.

An "Error syncing pod" with a CrashLoopBackOff message can be related to a dependency conflict. You need to verify that there are no conflicts among the libraries and versions used for the job. Please refer to the documentation for staging required dependencies of the pipeline.

Also, take a look at the preinstalled dependencies and this StackOverflow thread.
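To debug locally, you can approximate the staging step Beam performs for --requirements_file. This is a sketch under assumptions: ./staged is an assumed scratch directory, and the exact flags the stager uses may differ across SDK versions:

```shell
# Create a scratch staging dir and a requirements file matching the question.
mkdir -p ./staged
printf 'jsonschema==3.2.0\n' > requirements.txt

# Download the requirements as source distributions, roughly as the stager does.
pip download --dest ./staged -r requirements.txt --no-binary :all:

# The worker's isolated build environment runs pip with --no-index and
# --find-links pointing at the staged dir, so build-time dependencies such as
# setuptools and wheel must also be present there.
pip download --dest ./staged 'setuptools>=40.6.0' wheel
```

If the second download step is what fails locally, that points at the same missing build dependencies seen in the worker's pip log above.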

What you can try is to change the version of jsonschema and run it again. If that doesn't help, please provide your requirements.txt file.
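If changing the jsonschema version alone doesn't help, one workaround sometimes used for this kind of staging failure (an assumption to verify, not a confirmed fix) is to list the build-time dependencies explicitly in requirements.txt so that they get staged alongside the package:

```text
jsonschema==3.2.0
setuptools>=40.6.0
wheel
```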

Hope it helps.
