Chaining multiple mapreduce tasks in Hadoop streaming

Question

I am in a scenario where I have two mapreduce jobs. I am more comfortable with Python and plan to use it to write the mapreduce scripts, running them with Hadoop streaming. Is there a convenient way to chain both jobs in the following form when Hadoop streaming is used?

Map1 -> Reduce1 -> Map2 -> Reduce2

I've heard of many ways to accomplish this in Java, but I need something that works with Hadoop streaming.

Solution

Here is a great blog post on how to use Cascading and Streaming. http://www.xcombinator.com/2009/11/18/how-to-use-cascading-with-hadoop-streaming/

The value here is that you can mix Java (Cascading query flows) with your custom streaming operations in the same app. I find this much less brittle than other methods.

Note that the Cascade object in Cascading allows you to chain multiple Flows (following the blog post above, your Streaming job would become a MapReduceFlow).

Disclaimer: I'm the author of Cascading
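
For comparison, the Map1 -> Reduce1 -> Map2 -> Reduce2 ordering from the question can also be sketched without Cascading, as a plain Python driver that launches two streaming jobs back to back and feeds the first job's output directory to the second. This is not the approach from the blog post, only a minimal illustration. The jar path, script names, and stage directories below are hypothetical placeholders, and the scripts are assumed to be executable with a Python shebang line.

```python
import subprocess

# Hypothetical path: the streaming jar location depends on the Hadoop install.
STREAMING_JAR = "/path/to/hadoop-streaming.jar"


def streaming_command(mapper, reducer, input_path, output_path, jar=STREAMING_JAR):
    """Build the argument list for one Hadoop streaming job."""
    return [
        "hadoop", "jar", jar,
        # Generic options such as -files must come before the streaming options.
        "-files", f"{mapper},{reducer}",
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
    ]


def chain_commands(stages, input_path, work_dir):
    """Build one command per (mapper, reducer) stage.

    Each stage reads the output directory written by the previous stage.
    """
    commands = []
    current_input = input_path
    for index, (mapper, reducer) in enumerate(stages, start=1):
        output_path = f"{work_dir}/stage{index}"
        commands.append(streaming_command(mapper, reducer, current_input, output_path))
        current_input = output_path
    return commands


def run_chain(commands):
    """Run the jobs in order, stopping at the first failure."""
    for command in commands:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a later stage never starts on incomplete input.
        subprocess.run(command, check=True)
```

Calling run_chain(chain_commands([("map1.py", "reduce1.py"), ("map2.py", "reduce2.py")], "input", "work")) would run the two jobs in sequence. The trade-off is that intermediate directories and failure handling are managed by hand, which is the kind of brittleness the answer above attributes to other methods.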
