Python + Beam + Flink [英] Python + Beam + Flink

查看:353
本文介绍了Python + Beam + Flink的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在尝试使Apache Beam可移植性框架与Python和Apache Flink一起使用,但我似乎找不到完整的说明来使环境正常运行.是否有参考资料完整列出了前提条件和步骤,以使简单的python管道正常工作?

I've been trying to get the Apache Beam Portability Framework to work with Python and Apache Flink and I can't seem to find a complete set of instructions to get the environment working. Are there any references with complete list of prerequisites and steps to get a simple python pipeline working?

推荐答案

总体上,对于本地便携式运行器(ULR),

Overall, for local portable runner (ULR), see the wiki, quote from there:

运行Python-SDK管道:

Run a Python-SDK Pipeline:

  1. 将容器编译为本地版本:./gradlew :beam-sdks-python-container:docker
  2. 启动ULR作业服务器,例如:./gradlew :beam-runners-reference-job-server:run -PlogLevel=debug -PvendorLogLevel=warning.有关详细信息,请参见上面链接中的Java部分. 3正确设置python环境.可以在此处找到的更多详细信息.
  3. 使用以下命令(在sdk/python文件夹下)运行管道
  1. Compile container as a local build: ./gradlew :beam-sdks-python-container:docker
  2. Start ULR job server, for example: ./gradlew :beam-runners-reference-job-server:run -PlogLevel=debug -PvendorLogLevel=warning . For details see the Java section in the above link. 3 Set up python environment properly. More details can be found here.
  3. Run pipeline by using following (under folder sdk/python),

示例:

python -m apache_beam.examples.wordcount\
  --input=gs://dataflow-samples/shakespeare/kinglear.txt \
  --output=/tmp/output \
  --runner=PortableRunner \
  --job_endpoint=localhost:8099 \
  --experiments beam_fn_api

对于Flink,您需要使用其他作业服务器:./gradlew beam-runners-flink_2.11-job-server:runShadow. host:port是localhost:8099

For Flink you need to use a different job server: ./gradlew beam-runners-flink_2.11-job-server:runShadow. The host:port is localhost:8099,

相关电子邮件讨论: one 两个.

Relevant email discussions: one, two.

可能值得看一些代码:两个.

Possibly worth looking at some code: one, two.

这篇关于Python + Beam + Flink的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆