Periodic Hadoop jobs running (best practice)


Problem description

Customers can upload URLs to the database at any time, and the application should process those URLs as soon as possible. So I need to run Hadoop jobs periodically, or trigger a Hadoop job automatically from another application (some script that detects newly added links, generates the input data for the Hadoop job, and runs it). For a PHP or Python script I could set up a cron job, but what is the best practice for running periodic Hadoop jobs (prepare the data for Hadoop, upload it, run the Hadoop job, and move the results back to the database)?
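The cron-based approach described above boils down to one step that is easy to get wrong: deciding which links are new since the last run. Below is a minimal Python sketch of that step; the row layout and the Hadoop commands in the comments are illustrative assumptions, not part of any real schema.

```python
from datetime import datetime

# Minimal sketch of the "detect newly added links" step from the question.
# A row here is (id, url, added_at); your actual table layout will differ.

def new_urls(rows, last_run):
    """Return (id, url) pairs for rows added after the previous run."""
    return [(i, u) for (i, u, added) in rows if added > last_run]

# In a cron-driven pipeline you would then (roughly):
#   1. write the result of new_urls(...) to a local input file
#   2. upload it to HDFS (e.g. `hadoop fs -put`)
#   3. launch the job (e.g. `hadoop jar ...`)
#   4. fetch the output (e.g. `hadoop fs -getmerge`) and load it back
#      into the database, then record the new "last run" timestamp
```

For example, with two rows and a last-run timestamp of noon on Jan 1, only the row added on Jan 2 is selected:

```python
rows = [(1, "http://a.example", datetime(2012, 1, 1)),
        (2, "http://b.example", datetime(2012, 1, 2))]
new_urls(rows, datetime(2012, 1, 1, 12, 0))  # → [(2, "http://b.example")]
```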

Solution

Take a look at Oozie, the new workflow system from Y! (Yahoo!), which can run jobs based on different triggers. A good overview is presented by Alejandro here: http://www.slideshare.net/ydn/5-oozie-hadoopsummit2010
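For the time-based trigger the question asks about, Oozie's coordinator is the relevant piece: it launches a workflow on a schedule. A minimal coordinator definition looks roughly like the sketch below; the app name, frequency, dates, and `${workflowAppPath}` are illustrative placeholders, not values from the question.

```xml
<!-- Hypothetical Oozie coordinator: run the URL-processing workflow
     every 15 minutes. All names and dates here are placeholders. -->
<coordinator-app name="url-processing-coord"
                 frequency="${coord:minutes(15)}"
                 start="2012-01-01T00:00Z" end="2013-01-01T00:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
  <action>
    <workflow>
      <!-- HDFS path of the workflow app that prepares input,
           runs the Hadoop job, and exports results -->
      <app-path>${workflowAppPath}</app-path>
    </workflow>
  </action>
</coordinator-app>
```

Compared with plain cron, the coordinator runs inside the Hadoop ecosystem, so scheduling, retries, and job status all live in one place.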

