Tools/ways to schedule Amazon Elastic MapReduce jobs


Question

I use EMR to create new instances, process jobs, and then shut the instances down.

I need to schedule jobs to run periodically. One easy implementation would be to use Quartz to trigger the EMR jobs. In the longer run, though, I would prefer an out-of-the-box MapReduce scheduling solution. Is there any out-of-the-box scheduling feature in EMR or the AWS SDK that I can use for this? I can see that Auto Scaling supports scheduling, but I want to schedule an EMR job flow instead.

Recommended answer

There is Apache Oozie, a workflow scheduler for Hadoop, for this.

Oozie is a workflow scheduler system to manage Apache Hadoop jobs.

Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions.

Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability.
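To make "triggered by time (frequency) and data availability" concrete, here is a minimal Python sketch of the idea. It is not Oozie's API or its exact semantics: the function names, the half-open time window, and the `available` callback are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def nominal_times(start, end, frequency):
    """Yield each scheduled run time in [start, end) at a fixed frequency.

    The half-open interval is this sketch's choice, not a statement
    about how Oozie treats the end time.
    """
    if frequency <= timedelta(0):
        raise ValueError("frequency must be positive")
    current = start
    while current < end:
        yield current
        current += frequency


def ready_runs(times, available):
    """Keep only the run times whose input data is reported as available.

    `available` is a hypothetical callback: it takes a run time and
    returns True when that run's input data exists.
    """
    return [t for t in times if available(t)]


if __name__ == "__main__":
    runs = nominal_times(
        datetime(2024, 1, 1), datetime(2024, 1, 2), timedelta(hours=6)
    )
    # Pretend the 12:00 input has not arrived yet.
    for run in ready_runs(runs, lambda t: t.hour != 12):
        print(run.isoformat())
```

A real coordinator would keep waiting for the missing input rather than skipping that run; the filter here only shows that a run needs both its time and its data.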

Oozie is integrated with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system-specific jobs (such as Java programs and shell scripts).

Oozie is a scalable, reliable and extensible system.

Here is a simple example of Elastic MapReduce bootstrap actions for configuring Apache Oozie: https://github.com/lila/emr-oozie-sample

Be aware, though, that Oozie is somewhat complicated. It is worth adopting only if you have many jobs to schedule, monitor, and maintain; if you have just two or three jobs to run periodically, a handful of cron jobs will do.
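For the cron route, a minimal sketch of a launcher script that cron could invoke is shown below. It assumes the boto3 library and its EMR client's `run_job_flow` call. The release label, instance types, IAM role names, bucket paths, and job arguments are placeholders, not values from the question. The cluster is configured to terminate once its steps finish, matching the create/process/shutdown pattern described in the question.

```python
import json
import sys
from datetime import datetime, timezone


def build_job_flow_request(run_time, log_uri, jar_uri, jar_args):
    """Build keyword arguments for boto3's EMR `run_job_flow` call.

    All concrete values below (release label, instance types, roles)
    are assumptions; replace them with your own.
    """
    return {
        "Name": f"scheduled-job-{run_time:%Y%m%dT%H%M}",
        "LogUri": log_uri,
        "ReleaseLabel": "emr-6.15.0",  # assumed release
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumed instance type
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            # Shut the cluster down once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "main-step",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {"Jar": jar_uri, "Args": list(jar_args)},
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default roles
        "ServiceRole": "EMR_DefaultRole",
    }


def submit(request, region="us-east-1"):
    """Start the cluster and return its job flow id (needs AWS credentials)."""
    import boto3  # imported lazily so a dry run works without boto3

    client = boto3.client("emr", region_name=region)
    return client.run_job_flow(**request)["JobFlowId"]


if __name__ == "__main__":
    request = build_job_flow_request(
        datetime.now(timezone.utc),
        log_uri="s3://example-bucket/logs/",      # hypothetical bucket
        jar_uri="s3://example-bucket/my-job.jar",  # hypothetical jar
        jar_args=["s3://example-bucket/in/", "s3://example-bucket/out/"],
    )
    if "--submit" in sys.argv:
        print(submit(request))
    else:
        # Dry run: show what would be sent without calling AWS.
        print(json.dumps(request, indent=2))
```

A crontab entry such as `0 2 * * * python3 /path/to/launch_emr.py --submit` (the path is hypothetical) would then start the job flow every day at 02:00. Running the script without `--submit` only prints the request, which is useful for checking the configuration before scheduling it.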

You can also look into and explore Simple Workflow from Amazon.
