Python + Beam + Flink

Problem description

I've been trying to get the Apache Beam Portability Framework to work with Python and Apache Flink, but I can't seem to find a complete set of instructions to get the environment working. Are there any references with a complete list of prerequisites and steps to get a simple Python pipeline working?

Solution

For the local portable runner (ULR), see the wiki. Quoting from there:

Run a Python-SDK Pipeline:

  1. Compile the container as a local build: ./gradlew :beam-sdks-python-container:docker
  2. Start the ULR job server, for example: ./gradlew :beam-runners-reference-job-server:run -PlogLevel=debug -PvendorLogLevel=warning. For details, see the Java section in the above link.
  3. Set up the Python environment properly. More details can be found here.
  4. Run the pipeline with the following command (under the folder sdks/python).

Example:

python -m apache_beam.examples.wordcount \
  --input=gs://dataflow-samples/shakespeare/kinglear.txt \
  --output=/tmp/output \
  --runner=PortableRunner \
  --job_endpoint=localhost:8099 \
  --experiments beam_fn_api
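
If the endpoint or paths change often (for example when switching between the ULR and Flink job servers), the same flags can be assembled from a small script. This is only a sketch using the standard library; the helper name `build_wordcount_command` is hypothetical, and actually running the resulting command assumes apache_beam is installed and a job server is already listening on the endpoint.

```python
import shlex
import sys


def build_wordcount_command(input_path, output_path, job_endpoint="localhost:8099"):
    """Assemble the wordcount invocation shown above (hypothetical helper)."""
    return [
        sys.executable, "-m", "apache_beam.examples.wordcount",
        f"--input={input_path}",
        f"--output={output_path}",
        "--runner=PortableRunner",
        f"--job_endpoint={job_endpoint}",
        "--experiments", "beam_fn_api",
    ]


if __name__ == "__main__":
    cmd = build_wordcount_command(
        "gs://dataflow-samples/shakespeare/kinglear.txt", "/tmp/output"
    )
    # Only print the command; pass `cmd` to subprocess.run(cmd, check=True)
    # to launch it once the job server is up.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems if a path contains spaces.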

For Flink, you need to use a different job server: ./gradlew beam-runners-flink_2.11-job-server:runShadow. The host:port is localhost:8099, so the same --job_endpoint value applies.
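
Before submitting a pipeline, it can help to confirm that something is actually listening on the job endpoint. The following is a rough sketch with hypothetical helper names (`parse_endpoint`, `job_server_reachable`); a successful TCP connection only shows that the port is open, not that the listener is a Beam job server.

```python
import socket


def parse_endpoint(endpoint):
    """Split a 'host:port' string into (host, port)."""
    host, _, port = endpoint.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {endpoint!r}")
    return host, int(port)


def job_server_reachable(endpoint="localhost:8099", timeout=2.0):
    """Return True if a TCP connection to the endpoint succeeds."""
    host, port = parse_endpoint(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    print("job server reachable:", job_server_reachable())
```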

Relevant email discussions: one, two.

Possibly worth looking at some code: one, two.
