Multiple mappers in Hadoop


Problem description


I am trying to run two independent mappers on the same input file in a Hadoop program, using a single job, and I want the output of both mappers to go into a single reducer. I am using the MultipleInputs class. It was working fine with both mappers running, but yesterday I noticed that only one map function runs: the second MultipleInputs statement seems to overwrite the first one. I can't find any change to the code that would explain this sudden difference in behavior :( Please help me with this. The main function is:

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "mapper accepting whole file at once");
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        job.setJarByClass(TestMultipleInputs.class);
        job.setMapperClass(Map2.class);
        job.setMapperClass(Map1.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(NLinesInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapOutputKeyClass(IntWritable.class);
        // These are the two statements in question: both mappers are
        // registered for the same input path "Rec".
        MultipleInputs.addInputPath(job, new Path("Rec"), NLinesInputFormat.class, Map1.class);
        MultipleInputs.addInputPath(job, new Path("Rec"), NLinesInputFormat.class, Map2.class);
        FileOutputFormat.setOutputPath(job, new Path("testMulinput"));
        job.waitForCompletion(true);
    }

Whichever Map class is used in the last MultipleInputs statement gets executed; here, that means only Map2.class runs.

Solution

The two mappers can't read the same file at the same time.

Solution (workaround): create a duplicate of the input file (in this case, let the duplicate of Rec be Rec1), then feed Map1 with Rec and Map2 with Rec1.
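A minimal sketch of how the driver might look with this workaround, assuming the copy of Rec is named Rec1 and reusing NLinesInputFormat, Map1, Map2 and Reduce from the question; since MultipleInputs supplies an input format and mapper per path, the job-wide setMapperClass()/setInputFormatClass() calls are dropped here:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class TestMultipleInputs {

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "two mappers feeding one reducer");
            job.setJarByClass(TestMultipleInputs.class);

            // Each input path gets its own mapper and input format, so no
            // job-wide setMapperClass()/setInputFormatClass() calls are needed.
            // "Rec1" is the assumed name of the duplicated input file.
            MultipleInputs.addInputPath(job, new Path("Rec"), NLinesInputFormat.class, Map1.class);
            MultipleInputs.addInputPath(job, new Path("Rec1"), NLinesInputFormat.class, Map2.class);

            job.setReducerClass(Reduce.class);
            job.setMapOutputKeyClass(IntWritable.class);
            job.setMapOutputValueClass(IntWritable.class);
            job.setOutputKeyClass(IntWritable.class);
            job.setOutputValueClass(IntWritable.class);
            job.setOutputFormatClass(TextOutputFormat.class);

            FileOutputFormat.setOutputPath(job, new Path("testMulinput"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }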

Both mappers are executed in parallel, so you don't need to worry about the reducer output: the output of both mappers is shuffled, and equal keys from both files go to the same reducer.
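For illustration, a minimal Reduce sketch assuming IntWritable keys and values as in the driver above; the question doesn't show the actual reduce logic, so the summing here is only a placeholder:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.mapreduce.Reducer;

    public class Reduce extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {

        @Override
        protected void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Values emitted by Map1 and Map2 for the same key arrive here
            // together; at this point it no longer matters which input file
            // (Rec or Rec1) each value came from.
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }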

So the output is what you want.

Hope this helps others who are facing a similar issue.
