Dataflow with Python flex template - launcher timeout


Question

I'm trying to run my Python Dataflow job with a flex template. The job works fine locally when I run it with the direct runner (without a flex template). However, when I try to run it with a flex template, the job is stuck in the "Queued" status for a while and then fails with a timeout.

Here are some of the logs I found in the GCE console:

INFO:apache_beam.runners.portability.stager:Executing command: ['/usr/local/bin/python', '-m', 'pip', 'download', '--dest', '/tmp/dataflow-requirements-cache', '-r', '/dataflow/template/requirements.txt', '--exists-action', 'i', '--no-binary', ':all:'

Shutting down the GCE instance, launcher-202011121540156428385273524285797, used for launching.

Timeout in polling result file: gs://my_bucket/staging/template_launches/2020-11-12_15_40_15-6428385273524285797/operation_result.
Possible causes are:
1. Your launch takes too long time to finish. Please check the logs on stackdriver.
2. Service my_service_account@developer.gserviceaccount.com may not have enough permissions to pull container image gcr.io/indigo-computer-272415/samples/dataflow/streaming-beam-py:latest or create new objects in gs://my_bucket/staging/template_launches/2020-11-12_15_40_15-6428385273524285797/operation_result.
3. Transient errors occurred, please try again.

For 1, I see no useful logs. For 2, the service account is the default service account, so it should have all permissions.

How can I debug this further?

Here is my Dockerfile:

FROM gcr.io/dataflow-templates-base/python3-template-launcher-base

ARG WORKDIR=/dataflow/template
RUN mkdir -p ${WORKDIR}
WORKDIR ${WORKDIR}

ADD localdeps localdeps
COPY requirements.txt .
COPY main.py .
COPY setup.py .
COPY bq_field_pb2.py .
COPY bq_table_pb2.py .
COPY core_pb2.py .

ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="${WORKDIR}/requirements.txt"
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/main.py"
ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE="${WORKDIR}/setup.py"

RUN pip install -U  --no-cache-dir -r ./requirements.txt

I'm following this guide: https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates

Recommended answer

A possible cause of this issue is in the requirements.txt file. If requirements.txt installs apache-beam, the flex template runs into exactly the problem you describe: jobs stay in the Queued state for some time and finally fail with "Timeout in polling result".

The reason is that flex templates are affected by a known issue. It affects only flex templates; the same jobs run properly locally or with standard templates.

The solution is to install apache-beam separately in the Dockerfile rather than through requirements.txt:

RUN pip install -U apache-beam==<your desired version>
RUN pip install -U -r ./requirements.txt
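
Before rebuilding the image, it can help to confirm that requirements.txt no longer lists apache-beam. The following is only a minimal sketch of such a check, not part of Beam or the Dataflow tooling: the function names find_beam_requirements and check_requirements_file are made up for illustration, and the parsing is deliberately simplified (it treats everything after a "#" as a comment and does not follow nested "-r" includes).

```python
import re

# Matches a requirement line that names the apache-beam package itself, with
# optional extras and an optional version/marker/URL part, for example
# "apache-beam[gcp]==2.25.0". Packages such as "apache-beam-extras" do not match.
_BEAM_LINE = re.compile(
    r"^\s*apache[-_.]beam(\[[^\]]*\])?\s*([=<>!~;@].*)?$",
    re.IGNORECASE,
)


def find_beam_requirements(requirements_text):
    """Return the requirement lines in the given text that install apache-beam.

    Hypothetical helper: simplified parsing, comments are cut at the first '#'.
    """
    hits = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if line and _BEAM_LINE.match(line):
            hits.append(line)
    return hits


def check_requirements_file(path):
    """Read a requirements file and return any apache-beam lines found in it."""
    with open(path, encoding="utf-8") as handle:
        return find_beam_requirements(handle.read())
```

Calling check_requirements_file("requirements.txt") from the build directory returns an empty list when the file is clean; any returned lines are the ones to move into the separate pip install step shown above.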
