Map Reduce: ChainMapper and ChainReducer


Problem Description

I need to split my Map Reduce jar file into two jobs in order to get two different output files, one from the reducer of each of the two jobs.

I mean that the first job has to produce an output file that will be the input for the second job in the chain.

I have read something about ChainMapper and ChainReducer in Hadoop version 0.20 (currently I am using 0.18): could those be good for my needs?

Can anybody suggest some links where I can find examples of how to use those methods? Or is there another way to solve my problem?

Thanks,

Luca

Recommended Answer

There are many ways to do it.

  1. Cascading jobs

Create the JobConf object "job1" for the first job and set all the parameters, with "input" as the input directory and "temp" as the output directory. Execute this job: JobClient.runJob(job1).

Immediately below it, create the JobConf object "job2" for the second job and set all the parameters, with "temp" as the input directory and "output" as the output directory. Execute this job: JobClient.runJob(job2).
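The cascading approach above can be sketched as follows, using the old `org.apache.hadoop.mapred` API available in 0.18. `FirstMapper`, `FirstReducer`, `SecondMapper`, and `SecondReducer` are hypothetical job classes standing in for your own implementations:

```java
// Sketch of cascading two jobs: job1 writes to "temp", job2 reads "temp".
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class Cascade {
    public static void main(String[] args) throws Exception {
        // Job 1: reads "input", writes its reducer output to "temp"
        JobConf job1 = new JobConf(Cascade.class);
        job1.setJobName("job1");
        job1.setMapperClass(FirstMapper.class);       // hypothetical
        job1.setReducerClass(FirstReducer.class);     // hypothetical
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(job1, new Path("input"));
        FileOutputFormat.setOutputPath(job1, new Path("temp"));
        JobClient.runJob(job1); // blocks until job1 completes

        // Job 2: reads "temp", writes its reducer output to "output"
        JobConf job2 = new JobConf(Cascade.class);
        job2.setJobName("job2");
        job2.setMapperClass(SecondMapper.class);      // hypothetical
        job2.setReducerClass(SecondReducer.class);    // hypothetical
        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(job2, new Path("temp"));
        FileOutputFormat.setOutputPath(job2, new Path("output"));
        JobClient.runJob(job2);
    }
}
```

Because `JobClient.runJob` blocks until the job finishes, the "temp" directory is guaranteed to be complete before job2 starts reading it.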

  2. Two JobConf objects

Create two JobConf objects and set all the parameters in them just like in (1), except that you don't call JobClient.runJob.

Then create two Job objects with the jobconfs as parameters:

Job job1 = new Job(jobconf1);
Job job2 = new Job(jobconf2);

Using a JobControl object, you specify the job dependencies and then run the jobs:

JobControl jbcntrl = new JobControl("jbcntrl");
jbcntrl.addJob(job1);
jbcntrl.addJob(job2);
job2.addDependingJob(job1); // job2 starts only after job1 completes successfully
jbcntrl.run();

  3. ChainMapper and ChainReducer

    If you need a structure somewhat like Map+ | Reduce | Map*, you can use the ChainMapper and ChainReducer classes that come with Hadoop version 0.19 and onwards. Note that in this case you can use only one reducer, but any number of mappers before or after it.
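A minimal sketch of the chained form, following the pattern of the `ChainMapper`/`ChainReducer` classes in `org.apache.hadoop.mapred.lib` (Hadoop 0.19+). `AMap`, `BMap`, and `CReduce` are hypothetical Mapper/Reducer implementations; the key/value classes shown are assumptions that must match your own types:

```java
// Sketch of a [MAP+ / REDUCE / MAP*] chain in one job:
// AMap -> BMap -> CReduce, all running inside a single Map Reduce job.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.ChainMapper;
import org.apache.hadoop.mapred.lib.ChainReducer;

public class ChainExample {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(ChainExample.class);
        conf.setJobName("chain");
        FileInputFormat.setInputPaths(conf, new Path("input"));
        FileOutputFormat.setOutputPath(conf, new Path("output"));

        // First mapper in the chain (hypothetical AMap class)
        JobConf aMapConf = new JobConf(false);
        ChainMapper.addMapper(conf, AMap.class,
                LongWritable.class, Text.class, Text.class, Text.class,
                true, aMapConf);

        // Second mapper, fed by AMap's output (hypothetical BMap class)
        JobConf bMapConf = new JobConf(false);
        ChainMapper.addMapper(conf, BMap.class,
                Text.class, Text.class, LongWritable.class, Text.class,
                true, bMapConf);

        // The single reducer allowed in the chain (hypothetical CReduce class)
        JobConf reduceConf = new JobConf(false);
        ChainReducer.setReducer(conf, CReduce.class,
                LongWritable.class, Text.class, Text.class, Text.class,
                true, reduceConf);

        // Mappers after the reducer would be added with ChainReducer.addMapper(...)
        JobClient.runJob(conf);
    }
}
```

The chained mappers pass records directly from one to the next inside the same task, so no intermediate files are written between the stages of the chain.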

