Workflow orchestration for Google Dataflow


Question

We are using Google Dataflow for batch data processing and are looking for a workflow orchestration tool, something similar to what Azkaban does for Hadoop.

The key things we are looking for are:

  • Configuring workflows

  • Scheduling workflows

  • Monitoring and alerting on failed workflows

  • Ability to rerun failed jobs

We have evaluated Pentaho, but these features are only available in its Enterprise edition, which is expensive. We are currently evaluating Azkaban because it supports the javaprocess job type. However, Azkaban was created primarily for Hadoop jobs, so it integrates more deeply with Hadoop infrastructure than with plain javaprocesses.

We would appreciate suggestions for open-source or very low-cost solutions.

Solution

It sounds like Apache Airflow (https://github.com/apache/incubator-airflow) should meet your needs, and it now has a Dataflow operator (https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/dataflow_operator.py).
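To illustrate how Airflow covers the requirements above (scheduling, alerting, and automatic reruns) together with the contrib Dataflow operator, here is a minimal DAG sketch. The jar path, GCP project, bucket, and alert email address are hypothetical placeholders, and the exact operator parameters may differ between Airflow versions:

```python
# Sketch of an Airflow DAG that launches a Dataflow batch job daily,
# with retries and failure alerting. Paths and names below are
# hypothetical examples, not values from the original question.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.contrib.operators.dataflow_operator import DataFlowJavaOperator

default_args = {
    "retries": 2,                           # rerun failed jobs automatically
    "retry_delay": timedelta(minutes=10),
    "email_on_failure": True,               # alert on failed workflows
    "email": ["alerts@example.com"],        # hypothetical address
}

with DAG(
    dag_id="dataflow_batch_pipeline",
    start_date=datetime(2017, 1, 1),
    schedule_interval="@daily",             # schedule the workflow
    default_args=default_args,
) as dag:
    run_pipeline = DataFlowJavaOperator(
        task_id="run_dataflow_job",
        jar="/path/to/your-pipeline-bundled.jar",   # hypothetical jar
        options={
            "project": "your-gcp-project",          # hypothetical project
            "stagingLocation": "gs://your-bucket/staging",
        },
    )
```

Failed runs can also be re-triggered manually from the Airflow web UI, which addresses the "rerun failed jobs" requirement beyond the automatic retries configured above.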



