MapReduce WordCount Program - output is the same as the input file
Problem Description
The output I expect is the count of every word in the input file, but my output is the whole input file, unchanged.

I am using extends Mapper<LongWritable, Text, Text, IntWritable> for the mapper class and Reducer<Text, IntWritable, Text, IntWritable> for the reducer class.

Here is my code:
driver.java
public class driver extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "wordcount");
        job.setMapperClass(mapper.class);
        job.setReducerClass(reducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
        //JobClient.runJob((JobConf) conf);
        //System.exit(job.waitForCompletion(true) ? 0 : 1);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        //int res = ToolRunner.run(new Configuration(), new driver(), args);
        int res = ToolRunner.run(new Configuration(), new driver(), args);
        long stop = System.currentTimeMillis();
        System.out.println("Time: " + (stop - start));
        System.exit(res);
    }
}
mapper.java
public class mapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    //hadoop supported data types
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    //map method that performs the tokenizer job and framing the initial key value pairs
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}
reducer.java
public class reducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    //reduce method accepts the Key Value pairs from mappers, do the aggregation based on keys and produce the final output
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}
You have mixed up the new & old APIs of MapReduce. I think you tried to write the WordCount program against the new API, but took snippets from the old API (an old blog post, perhaps). You can find the problem yourself if you just add the @Override
annotation to both the map & reduce methods.
See what happened to their signatures as the API evolved:
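For reference, here are the two sets of signatures side by side (the old-style interface methods from org.apache.hadoop.mapred, versus what the new-API base classes org.apache.hadoop.mapreduce.Mapper and org.apache.hadoop.mapreduce.Reducer actually declare). Your map and reduce match the first pair, so they never override the second:

    // Old API (org.apache.hadoop.mapred) — the shape your methods have:
    void map(K1 key, V1 value, OutputCollector<K2, V2> output, Reporter reporter)
            throws IOException;
    void reduce(K2 key, Iterator<V2> values, OutputCollector<K3, V3> output, Reporter reporter)
            throws IOException;

    // New API (org.apache.hadoop.mapreduce) — what Mapper/Reducer declare and call:
    protected void map(KEYIN key, VALUEIN value, Context context)
            throws IOException, InterruptedException;
    protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context)
            throws IOException, InterruptedException;

With @Override on your versions, the compiler would immediately report that they override nothing.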
You just wrote two new methods with the older signatures, so they don't override anything and are never called. The methods the framework actually invokes are the default implementations in the Mapper and Reducer base classes, and those defaults are identity operations: they simply write every input key/value pair straight through. That is exactly why your output is the input file, unchanged.
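A minimal corrected sketch of the two classes against the new API (class names kept as in the question; in a real project each public class goes in its own file, and the usual org.apache.hadoop.io / org.apache.hadoop.mapreduce imports are assumed). Note that @Override now compiles, which is the proof that the signatures finally match:

    public class mapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);   // new API emits through Context, not OutputCollector
            }
        }
    }

    public class reducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {   // Iterable, not Iterator, in the new API
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }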
And in any case, you should follow basic Java coding conventions: class names like driver, mapper and reducer should start with an uppercase letter (Driver, WordCountMapper, WordCountReducer).
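One more thing to expect: once your methods really do override and get called, two mismatches in the driver will surface. The job declares Text as the output value class while the reducer emits IntWritable, and KeyValueTextInputFormat feeds Text keys to a mapper that declares LongWritable (byte offsets). A sketch of the corrected driver settings, assuming you want plain line-oriented input:

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);      // reducer emits IntWritable, not Text
    job.setInputFormatClass(TextInputFormat.class);  // gives (LongWritable offset, Text line),
                                                     // matching the mapper's declared input key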