Map Reduce: ChainMapper and ChainReducer


Problem Description




I need to split my Map Reduce jar file into two jobs in order to get two different output files, one from each job's reducer.

I mean that the first job has to produce an output file that will be the input for the second job in the chain.

I read something about ChainMapper and ChainReducer in Hadoop version 0.20 (currently I am using 0.18): could those be a good fit for my needs?

Can anybody suggest some links where I can find examples of using those classes? Or is there another way to achieve this?

Thank you,

Luca

Solution

There are many ways you can do it.

  1. Cascading jobs

    Create the JobConf object "job1" for the first job and set all its parameters, with "input" as the input directory and "temp" as the output directory. Execute this job: JobClient.runJob(job1).

    Immediately after it, create the JobConf object "job2" for the second job and set all its parameters, with "temp" as the input directory and "output" as the output directory. Execute this job: JobClient.runJob(job2).
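
    A minimal sketch of this pattern with the old mapred API (ChainDriver, FirstMapper, FirstReducer, SecondMapper and SecondReducer are hypothetical placeholders for your own classes):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class ChainDriver {
      public static void main(String[] args) throws Exception {
        // First job: reads "input", writes intermediate results to "temp".
        JobConf job1 = new JobConf(ChainDriver.class);
        job1.setJobName("job1");
        job1.setMapperClass(FirstMapper.class);
        job1.setReducerClass(FirstReducer.class);
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(job1, new Path("input"));
        FileOutputFormat.setOutputPath(job1, new Path("temp"));
        JobClient.runJob(job1);  // blocks until job1 finishes

        // Second job: reads "temp", writes the final result to "output".
        JobConf job2 = new JobConf(ChainDriver.class);
        job2.setJobName("job2");
        job2.setMapperClass(SecondMapper.class);
        job2.setReducerClass(SecondReducer.class);
        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(job2, new Path("temp"));
        FileOutputFormat.setOutputPath(job2, new Path("output"));
        JobClient.runJob(job2);
      }
    }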

  2. Two JobConf objects

    Create two JobConf objects and set all the parameters in them just as in (1), except that you don't call JobClient.runJob.

    Then create two Job objects (org.apache.hadoop.mapred.jobcontrol.Job), passing the JobConfs as parameters:

    Job job1 = new Job(jobconf1);
    Job job2 = new Job(jobconf2);

    Using a JobControl object, you specify the job dependencies and then run the jobs:

    JobControl jbcntrl = new JobControl("jbcntrl");
    jbcntrl.addJob(job1);
    jbcntrl.addJob(job2);
    job2.addDependingJob(job1);  // job2 will not start until job1 completes
    jbcntrl.run();
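
    Note that in the old mapred API, JobControl.run() polls the job states in a loop and does not return on its own once the jobs finish, so it is commonly run in a separate thread that is stopped when everything is done. A sketch, reusing the jbcntrl object from above:

    Thread controlThread = new Thread(jbcntrl);  // JobControl implements Runnable
    controlThread.start();
    while (!jbcntrl.allFinished()) {
        Thread.sleep(500);  // wait for both jobs to complete
    }
    jbcntrl.stop();  // ends the polling loop inside run()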
    

  3. ChainMapper and ChainReducer

    If you need a structure like Map+ | Reduce | Map* (one or more mappers, then a single reducer, then zero or more mappers), you can use the ChainMapper and ChainReducer classes that come with Hadoop version 0.19 and onwards. Note that in this case you can use only one reducer, but any number of mappers before or after it.
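
    A rough sketch of such a chain with the old mapred API, following the pattern from the ChainMapper/ChainReducer javadoc (AMap, BMap, CReduce and DMap are hypothetical Mapper/Reducer implementations):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.ChainMapper;
    import org.apache.hadoop.mapred.lib.ChainReducer;

    public class ChainExample {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(ChainExample.class);
        conf.setJobName("chain");
        FileInputFormat.setInputPaths(conf, new Path("input"));
        FileOutputFormat.setOutputPath(conf, new Path("output"));

        // Map+ : any number of mappers before the reducer
        ChainMapper.addMapper(conf, AMap.class,
            LongWritable.class, Text.class, Text.class, Text.class,
            true, new JobConf(false));
        ChainMapper.addMapper(conf, BMap.class,
            Text.class, Text.class, Text.class, Text.class,
            true, new JobConf(false));

        // Reduce : the single reducer allowed in the chain
        ChainReducer.setReducer(conf, CReduce.class,
            Text.class, Text.class, Text.class, Text.class,
            true, new JobConf(false));

        // Map* : any number of mappers after the reducer
        ChainReducer.addMapper(conf, DMap.class,
            Text.class, Text.class, Text.class, Text.class,
            true, new JobConf(false));

        JobClient.runJob(conf);
      }
    }

    The whole chain still runs as a single MapReduce job, so the records passed between the chained mappers are not written to HDFS in between.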

