Running Apache Beam Python pipelines in Kubernetes

Posted: 2021/4/7 20:55:06 | Tags: python, kubernetes, apache-flink, apache-beam

Problem description

I am trying to run an Apache Beam Python pipeline using Flink on an offline instance of Kubernetes. However, since my user code has external dependencies, I am using the Python SDK harness as an external service, which is causing errors (described below).

The Kubernetes manifest I use to launch the Beam Python SDK harness:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: beam-sdk
spec:
  replicas: 1
  selector:
    matchLabels:
      app: beam
      component: python-beam-sdk
  template:
    metadata:
      labels:
        app: beam
        component: python-beam-sdk
    spec:
      hostNetwork: True
      containers:
      - name: python-beam-sdk
        image: apachebeam/python3.7_sdk:latest
        imagePullPolicy: "Never"
        command: ["/opt/apache/beam/boot", "--worker_pool"]
        ports:
        - containerPort: 50000
          name: yay
---
apiVersion: v1
kind: Service
metadata:
  name: beam-python-service
spec:
  type: NodePort
  ports:
  - name: yay
    port: 50000
    targetPort: 50000
  selector:
    app: beam
    component: python-beam-sdk
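The Service only routes traffic to pods whose labels contain every key/value pair in its `selector`. As a stdlib-only sketch of that equality-based matching rule, applied to the labels in this manifest (the helper `selector_matches` is hypothetical, not a Kubernetes API):

```python
from typing import Mapping


def selector_matches(selector: Mapping[str, str], labels: Mapping[str, str]) -> bool:
    """Hypothetical helper: True if every selector key/value pair is in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


# Pod template labels from the beam-sdk Deployment.
pod_labels = {"app": "beam", "component": "python-beam-sdk"}

# Selector from the beam-python-service Service.
service_selector = {"app": "beam", "component": "python-beam-sdk"}

if __name__ == "__main__":
    print(selector_matches(service_selector, pod_labels))  # True
    # A selector requiring a label the pod lacks would match nothing.
    print(selector_matches({**service_selector, "tier": "worker"}, pod_labels))  # False
```

Here the selector and the pod template labels are identical, so the Service does reach the `beam-sdk` pod.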

When I launch my pipeline with the following options:

from apache_beam.options.pipeline_options import PipelineOptions

beam_options = PipelineOptions([
    "--runner=FlinkRunner",
    "--flink_version=1.9",
    "--flink_master=10.101.28.28:8081",
    "--environment_type=EXTERNAL",
    "--environment_config=10.97.176.105:50000",
    "--setup_file=./setup.py"
])
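One quick diagnostic is to confirm that the worker pool address passed as `--environment_config` accepts TCP connections from where the Flink task managers run. A stdlib-only sketch, assuming IPv4 or hostname addresses (IPv6 literals in brackets are not handled); the helpers `parse_host_port` and `can_connect` and the script name below are hypothetical:

```python
import socket
import sys
from typing import Tuple


def parse_host_port(address: str) -> Tuple[str, int]:
    """Hypothetical helper: split a 'host:port' string like the --environment_config value."""
    host, _, port = address.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {address!r}")
    return host, int(port)


def can_connect(address: str, timeout: float = 2.0) -> bool:
    """Hypothetical helper: True if a TCP connection to host:port succeeds within timeout."""
    host, port = parse_host_port(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out.
        return False


if __name__ == "__main__":
    # Pass the address to check on the command line, e.g. 10.97.176.105:50000.
    if len(sys.argv) > 1:
        print(can_connect(sys.argv[1]))
```

For example, `python check_worker_pool.py 10.97.176.105:50000` prints `True` if the port is reachable. A reachable worker pool port alone does not show that the worker can dial back to the endpoints it is given, which is where the error below occurs.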

I get the following error message (in the Python SDK service's logs):

NAME                                 READY   STATUS    RESTARTS   AGE
beam-sdk-666779599c-w65g5            1/1     Running   1          4d20h
flink-jobmanager-74d444cccf-m4g8k    1/1     Running   1          4d20h
flink-taskmanager-5487cc9bc9-fsbts   1/1     Running   2          4d20h
flink-taskmanager-5487cc9bc9-zmnv7   1/1     Running   2          4d20h
(base) [~]$ sudo kubectl logs -f beam-sdk-666779599c-w65g5
2020/02/26 07:56:44 Starting worker pool 1: python -m apache_beam.runners.worker.worker_pool_main --service_port=50000 --container_executable=/opt/apache/beam/boot
Starting worker with command ['/opt/apache/beam/boot', '--id=1-1', '--logging_endpoint=localhost:39283', '--artifact_endpoint=localhost:41533', '--provision_endpoint=localhost:42233', '--control_endpoint=localhost:44977']
2020/02/26 09:09:07 Initializing python harness: /opt/apache/beam/boot --id=1-1 --logging_endpoint=localhost:39283 --artifact_endpoint=localhost:41533 --provision_endpoint=localhost:42233 --control_endpoint=localhost:44977
2020/02/26 09:11:07 Failed to obtain provisioning information: failed to dial server at localhost:42233
    caused by:
context deadline exceeded
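The `Initializing python harness` line lists the addresses the worker is told to dial; judging by the flag names, these are the logging, artifact, provisioning and control services it needs to reach. A stdlib-only sketch that pulls the `--*_endpoint` flags out of that command line (the helper `extract_endpoints` is hypothetical) makes it easy to see that every one of them points at `localhost`, including the `provision_endpoint` that fails to dial:

```python
import shlex


def extract_endpoints(command_line: str) -> dict:
    """Hypothetical helper: map each --*_endpoint flag to its host:port value."""
    endpoints = {}
    for token in shlex.split(command_line):
        if token.startswith("--") and "=" in token:
            flag, _, value = token[2:].partition("=")
            if flag.endswith("_endpoint"):
                endpoints[flag] = value
    return endpoints


# Command copied from the "Initializing python harness" log line above.
log_line = (
    "/opt/apache/beam/boot --id=1-1 --logging_endpoint=localhost:39283 "
    "--artifact_endpoint=localhost:41533 --provision_endpoint=localhost:42233 "
    "--control_endpoint=localhost:44977"
)

if __name__ == "__main__":
    for name, address in extract_endpoints(log_line).items():
        print(f"{name:20} {address}")
```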

I have no idea what the logging or artifact endpoints (etc.) are. From inspecting the source code, the endpoints seem to be hard-coded to localhost.
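A `localhost` address is only reachable from inside the network namespace of the process that opened the port: a pod's own namespace, or the node's when `hostNetwork: True` is set as in the manifest above. A stdlib-only sketch for flagging such endpoints, assuming that only the literal name `localhost` and loopback IP literals count (other hostnames are not resolved via DNS); the helper `is_loopback` is hypothetical:

```python
import ipaddress


def is_loopback(host: str) -> bool:
    """Hypothetical helper: True for 'localhost' or a loopback IP literal."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal; treated as non-loopback without DNS resolution.
        return False


if __name__ == "__main__":
    for endpoint in ["localhost:42233", "127.0.0.1:42233", "10.97.176.105:50000"]:
        host = endpoint.rpartition(":")[0]
        print(endpoint, is_loopback(host))  # True, True, False respectively
```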