How do multiple reducers output only one part-file in Hadoop?

Problem description

In my map-reduce job, I use 4 reducers. Because of this, the final output is split into 4 part-files: part-0000 part-0001 part-0002 part-0003

My question is: how can I configure Hadoop to output only one part-file, even though it uses 4 reducers?
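For reference, a minimal sketch of the setup described above, assuming the old mapred API (the class name, paths, and driver structure are illustrative, not from the original question):

 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.mapred.FileInputFormat;
 import org.apache.hadoop.mapred.FileOutputFormat;
 import org.apache.hadoop.mapred.JobClient;
 import org.apache.hadoop.mapred.JobConf;

 public class FourReducerJob {
   public static void main(String[] args) throws Exception {
     JobConf conf = new JobConf(FourReducerJob.class);

     // Four reduce tasks: the job writes four output files,
     // part-00000 through part-00003, into the output directory.
     conf.setNumReduceTasks(4);

     FileInputFormat.setInputPaths(conf, new Path(args[0]));
     FileOutputFormat.setOutputPath(conf, new Path(args[1]));

     JobClient.runJob(conf);
   }
 }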

Solution

This isn't the default behaviour of Hadoop, but you can use MultipleOutputs to your advantage here. Create one named output and use it in all your reducers so that the final output ends up in one file. Its own javadoc suggests the following:

 JobConf conf = new JobConf();

 conf.setInputPath(inDir);
 FileOutputFormat.setOutputPath(conf, outDir);

 conf.setMapperClass(MOMap.class);
 conf.setReducerClass(MOReduce.class);
 ...

 // Defines additional single text based output 'text' for the job
 MultipleOutputs.addNamedOutput(conf, "text", TextOutputFormat.class,
     LongWritable.class, Text.class);
 ...

 JobClient jc = new JobClient();
 RunningJob job = jc.submitJob(conf);

 ...

The usage pattern inside the reducer is:

 public class MOReduce implements
     Reducer<WritableComparable, Writable, WritableComparable, Writable> {

   private MultipleOutputs mos;

   public void configure(JobConf conf) {
     ...
     mos = new MultipleOutputs(conf);
   }

   public void reduce(WritableComparable key, Iterator<Writable> values,
       OutputCollector<WritableComparable, Writable> output, Reporter reporter)
       throws IOException {
     ...
     // Write through the named output 'text' instead of the default collector
     mos.getCollector("text", reporter).collect(key, new Text("Hello"));
     ...
   }

   public void close() throws IOException {
     mos.close();
     ...
   }
 }

If you are using the new mapreduce API, see the MultipleOutputs class in the org.apache.hadoop.mapreduce.lib.output package.
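For reference, a minimal sketch of the same pattern with the new org.apache.hadoop.mapreduce API might look like the following (the class name and the key/value types are illustrative, not taken from the original answer):

 import java.io.IOException;

 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Reducer;
 import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

 public class MONewApiReduce
     extends Reducer<LongWritable, Text, LongWritable, Text> {

   private MultipleOutputs<LongWritable, Text> mos;

   @Override
   protected void setup(Context context) {
     // Create the MultipleOutputs helper once per reduce task
     mos = new MultipleOutputs<LongWritable, Text>(context);
   }

   @Override
   protected void reduce(LongWritable key, Iterable<Text> values, Context context)
       throws IOException, InterruptedException {
     for (Text value : values) {
       // Write to the named output 'text' registered in the driver with
       // MultipleOutputs.addNamedOutput(job, "text", TextOutputFormat.class,
       //     LongWritable.class, Text.class);
       mos.write("text", key, value);
     }
   }

   @Override
   protected void cleanup(Context context)
       throws IOException, InterruptedException {
     // Close MultipleOutputs so its record writers are flushed and committed
     mos.close();
   }
 }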
