MapReduce WordCount Program - output is the same as the input file
Problem Description
The output I expect is the count of every word in the input file, but my output is the whole input file, unchanged.

I am using extends Mapper<LongWritable, Text, Text, IntWritable> for the mapper class and Reducer<Text, IntWritable, Text, IntWritable> for the reducer class.

Here is my code:
driver.java
public class driver extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "wordcount");
        job.setMapperClass(mapper.class);
        job.setReducerClass(reducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
        //JobClient.runJob((JobConf) conf);
        //System.exit(job.waitForCompletion(true) ? 0 : 1);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        //int res = ToolRunner.run(new Configuration(), new driver(), args);
        int res = ToolRunner.run(new Configuration(), new driver(), args);
        long stop = System.currentTimeMillis();
        System.out.println("Time: " + (stop - start));
        System.exit(res);
    }
}
mapper.java
public class mapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    //hadoop supported data types
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    //map method that performs the tokenizer job and framing the initial key value pairs
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}
reducer.java
public class reducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    //reduce method accepts the Key Value pairs from mappers, do the aggregation based on keys and produce the final output
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}
You have mixed up the new & old APIs of MapReduce. I think you tried to write the WordCount program against the new API, but took snippets from the old API (an old blog post, perhaps). You can find the problem yourself if you just add the @Override
annotation to both the map & reduce methods.
See what happened to their signatures as the API evolved:
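For reference, here are the two sets of signatures side by side (the old-style interface methods from org.apache.hadoop.mapred, versus what the new-API base classes org.apache.hadoop.mapreduce.Mapper and org.apache.hadoop.mapreduce.Reducer actually declare). Your map and reduce match the first pair, so they never override the second:

    // Old API (org.apache.hadoop.mapred) — the shape your methods have:
    void map(K1 key, V1 value, OutputCollector<K2, V2> output, Reporter reporter)
            throws IOException;
    void reduce(K2 key, Iterator<V2> values, OutputCollector<K3, V3> output, Reporter reporter)
            throws IOException;

    // New API (org.apache.hadoop.mapreduce) — what Mapper/Reducer declare and call:
    protected void map(KEYIN key, VALUEIN value, Context context)
            throws IOException, InterruptedException;
    protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context)
            throws IOException, InterruptedException;

With @Override on your versions, the compiler would immediately report that they override nothing.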
You just wrote two new methods with the older signatures, so they don't override anything and are never called. The methods the framework actually invokes are the default implementations in the Mapper and Reducer base classes, and those defaults are identity operations: they simply write every input key/value pair straight through. That is exactly why your output is the input file, unchanged.
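A minimal corrected sketch of the two classes against the new API (class names kept as in the question; in a real project each public class goes in its own file, and the usual org.apache.hadoop.io / org.apache.hadoop.mapreduce imports are assumed). Note that @Override now compiles, which is the proof that the signatures finally match:

    public class mapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);   // new API emits through Context, not OutputCollector
            }
        }
    }

    public class reducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {   // Iterable, not Iterator, in the new API
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }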
And in any case, you should follow basic Java coding conventions: class names like driver, mapper and reducer should start with an uppercase letter (Driver, WordCountMapper, WordCountReducer).
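One more thing to expect: once your methods really do override and get called, two mismatches in the driver will surface. The job declares Text as the output value class while the reducer emits IntWritable, and KeyValueTextInputFormat feeds Text keys to a mapper that declares LongWritable (byte offsets). A sketch of the corrected driver settings, assuming you want plain line-oriented input:

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);      // reducer emits IntWritable, not Text
    job.setInputFormatClass(TextInputFormat.class);  // gives (LongWritable offset, Text line),
                                                     // matching the mapper's declared input key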