Python + Beam + Flink [英] Python + Beam + Flink
问题描述
我一直在尝试使Apache Beam可移植性框架与Python和Apache Flink一起使用,但我似乎找不到完整的说明来使环境正常运行.是否有参考资料完整列出了前提条件和步骤,以使简单的python管道正常工作?
I've been trying to get the Apache Beam Portability Framework to work with Python and Apache Flink and I can't seem to find a complete set of instructions to get the environment working. Are there any references with complete list of prerequisites and steps to get a simple python pipeline working?
推荐答案
Overall, for local portable runner (ULR), see the wiki, quote from there:
运行Python-SDK管道:
Run a Python-SDK Pipeline:
- 将容器编译为本地版本:
./gradlew :beam-sdks-python-container:docker
- 启动ULR作业服务器,例如:
./gradlew :beam-runners-reference-job-server:run -PlogLevel=debug -PvendorLogLevel=warning
.有关详细信息,请参见上面链接中的Java部分. 3正确设置python环境.可以在此处找到的更多详细信息. - 使用以下命令(在sdk/python文件夹下)运行管道
- Compile container as a local build:
./gradlew :beam-sdks-python-container:docker
- Start ULR job server, for example:
./gradlew :beam-runners-reference-job-server:run -PlogLevel=debug -PvendorLogLevel=warning
. For details see the Java section in the above link. 3 Set up python environment properly. More details can be found here. - Run pipeline by using following (under folder sdk/python),
示例:
python -m apache_beam.examples.wordcount\
--input=gs://dataflow-samples/shakespeare/kinglear.txt \
--output=/tmp/output \
--runner=PortableRunner \
--job_endpoint=localhost:8099 \
--experiments beam_fn_api
对于Flink,您需要使用其他作业服务器:./gradlew beam-runners-flink_2.11-job-server:runShadow
. host:port是localhost:8099
,
For Flink you need to use a different job server: ./gradlew beam-runners-flink_2.11-job-server:runShadow
. The host:port is localhost:8099
,
Relevant email discussions: one, two.
可能值得看一些代码:两个.
Possibly worth looking at some code: one, two.
这篇关于Python + Beam + Flink的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!