Python + Beam + Flink
Question
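I've been trying to get the Apache Beam Portability Framework to work with Python and Apache Flink, and I can't seem to find a complete set of instructions to get the environment working. Are there any references with a complete list of prerequisites and steps to get a simple Python pipeline working?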
Answer
Overall, for the local portable runner (ULR), see the wiki. Quoting from there:
Run a Python-SDK pipeline:
1. Compile the container as a local build:
   ./gradlew :beam-sdks-python-container:docker
2. Start the ULR job server, for example:
   ./gradlew :beam-runners-reference-job-server:run -PlogLevel=debug -PvendorLogLevel=warning
   For details, see the Java section in the link above.
3. Set up the Python environment properly. More details can be found here.
4. Run the pipeline with the following command (from the sdks/python folder), for example:
   python -m apache_beam.examples.wordcount \
     --input=gs://dataflow-samples/shakespeare/kinglear.txt \
     --output=/tmp/output \
     --runner=PortableRunner \
     --job_endpoint=localhost:8099 \
     --experiments beam_fn_api
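Step 4 fails if the job server from step 2 is not yet listening on the --job_endpoint. A quick pre-flight check can be sketched with the Python standard library alone. The helpers parse_endpoint and job_server_reachable are hypothetical, not part of Beam, and the check only confirms that something accepts TCP connections on the port, not that it is actually a Beam job server.

```python
import socket
from typing import Tuple


def parse_endpoint(endpoint: str) -> Tuple[str, int]:
    """Split 'host:port' (e.g. the --job_endpoint value) into its parts."""
    host, sep, port = endpoint.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {endpoint!r}")
    return host, int(port)


def job_server_reachable(endpoint: str = "localhost:8099",
                         timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint succeeds (hypothetical helper)."""
    host, port = parse_endpoint(endpoint)
    try:
        # create_connection resolves the host and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # connection refused, timed out, or host name not resolvable
        return False


if __name__ == "__main__":
    ep = "localhost:8099"
    print(ep, "reachable" if job_server_reachable(ep) else "not reachable")
```

Running this before step 4, or in a short retry loop while the job server starts up, tends to give a clearer signal than a failed pipeline submission.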
For Flink, you need to use a different job server:
   ./gradlew beam-runners-flink_2.11-job-server:runShadow
The host:port is localhost:8099, the same endpoint as in the command above.
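Since the two setups differ only in which job server is started, the commands quoted above can be collected in one place. The following is a minimal sketch, assuming a Beam source checkout from the time of this answer so that ./gradlew and the quoted task names resolve; job_server_command and pipeline_command are hypothetical helpers. It only builds and prints the argument lists and launches nothing.

```python
import shlex
from typing import List

# Gradle task names as quoted in the answer; later Beam releases may rename them.
CONTAINER_TASK = ":beam-sdks-python-container:docker"
JOB_SERVER_TASKS = {
    # ULR (reference) job server, step 2
    "reference": [":beam-runners-reference-job-server:run",
                  "-PlogLevel=debug", "-PvendorLogLevel=warning"],
    # Flink job server
    "flink": ["beam-runners-flink_2.11-job-server:runShadow"],
}


def job_server_command(runner: str = "reference") -> List[str]:
    """Gradle invocation that starts the job server (run from the repo root)."""
    return ["./gradlew", *JOB_SERVER_TASKS[runner]]


def pipeline_command(job_endpoint: str = "localhost:8099",
                     input_path: str = "gs://dataflow-samples/shakespeare/kinglear.txt",
                     output_path: str = "/tmp/output") -> List[str]:
    """Wordcount invocation from step 4 (run from sdks/python)."""
    return [
        "python", "-m", "apache_beam.examples.wordcount",
        f"--input={input_path}",
        f"--output={output_path}",
        "--runner=PortableRunner",
        f"--job_endpoint={job_endpoint}",
        "--experiments", "beam_fn_api",
    ]


if __name__ == "__main__":
    for runner in ("reference", "flink"):
        print(f"# {runner}")
        print(shlex.join(["./gradlew", CONTAINER_TASK]))
        print(shlex.join(job_server_command(runner)))
        print(shlex.join(pipeline_command()))
```

Printing keeps the sketch free of side effects. To actually run the commands, note that the job server task blocks until it is stopped, so it would need its own terminal (or subprocess.Popen) while the pipeline command is run separately.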
Relevant email discussions: one, two. Possibly worth looking at some code: one, two.