Separate output files in hadoop mapreduce
My question has probably been asked before, but I cannot find a clear answer.
My MapReduce is a basic WordCount. My current output file is:
// filename : 'part-r-00000'
789 a
755 #c
456 d
123 #b
How can I change the output filename?
Then, is it possible to have two output files:
// First output file
789 a
456 d
// Second output file
123 #b
755 #c
Here's my reduce class (note that reduce must take an Iterable of values, and @Override guards against a signature mismatch — with the wrong signature the method silently never runs):
public static class SortReducer extends Reducer<IntWritable, Text, IntWritable, Text> {
    @Override
    public void reduce(IntWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        for (Text value : values) {
            context.write(key, value);
        }
    }
}
Here's my partitioner class:
public class TweetPartitionner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text a_key, IntWritable a_value, int a_nbPartitions) {
        if (a_key.toString().startsWith("#"))
            return 1;
        return 0;
    }
}
Thanks a lot !
Solution: In your job driver, set
job.setNumReduceTasks(2);
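A minimal driver sketch showing where that call goes, assuming the Hadoop 2.x mapreduce API is on the classpath; WordCountMapper is a hypothetical mapper class (not from this thread), while SortReducer and TweetPartitionner are the classes from the question:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TweetCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "tweet count");
        job.setJarByClass(TweetCountDriver.class);
        job.setMapperClass(WordCountMapper.class);        // hypothetical mapper, emits (word, count)
        job.setReducerClass(SortReducer.class);           // reducer from the question
        job.setPartitionerClass(TweetPartitionner.class); // routes hashtag words to reducer 1
        job.setNumReduceTasks(2);                         // two reducers -> two part-r-* files
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With two reduce tasks, the job writes part-r-00000 and part-r-00001; the partitioner decides which of the two files each key lands in.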
From the mapper, emit:
a 789
#c 755
d 456
#b 123
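A sketch of such a mapper, assuming the input lines are the first job's output in the form "count<TAB>word" (the class name and input format are assumptions, not the poster's actual code):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Input lines look like "789\ta"; emit (word, count) so the
        // partitioner sees the word and can route hashtags separately.
        String[] parts = line.toString().split("\t");
        if (parts.length == 2) {
            context.write(new Text(parts[1]),
                          new IntWritable(Integer.parseInt(parts[0])));
        }
    }
}
```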
Write a partitioner and add it to the job configuration. In the partitioner, check whether the key starts with #: return 1 if it does, otherwise 0.
In the reducer, swap the key and value before writing.
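The two rules together — route by a leading '#', then swap key and value — can be simulated in plain Java without the Hadoop framework (a hypothetical illustration of the data flow, not framework code):

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionSwapDemo {
    // Same rule as TweetPartitionner: hashtag keys go to partition 1.
    static int partitionFor(String key) {
        return key.startsWith("#") ? 1 : 0;
    }

    // Simulates the reducer's swap: (word, count) -> "count word".
    static String swap(String word, int count) {
        return count + " " + word;
    }

    public static void main(String[] args) {
        // The (word, count) pairs emitted by the mapper in the answer.
        String[][] pairs = { {"a", "789"}, {"#c", "755"}, {"d", "456"}, {"#b", "123"} };
        List<String> part0 = new ArrayList<>();
        List<String> part1 = new ArrayList<>();
        for (String[] p : pairs) {
            String out = swap(p[0], Integer.parseInt(p[1]));
            if (partitionFor(p[0]) == 0) part0.add(out); else part1.add(out);
        }
        System.out.println("part-r-00000: " + part0); // words without '#'
        System.out.println("part-r-00001: " + part1); // hashtag words
    }
}
```

Partition 0 ends up with "789 a" and "456 d", partition 1 with "755 #c" and "123 #b" — exactly the two output files asked for in the question.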