Empty output file generated after running Hadoop job


Problem description

I have a MapReduce program as below:

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;

public class Sample {

    // Mapper: splits the comma-separated value list and emits one
    // (key, token) pair per token.
    public static class SampleMapper extends MapReduceBase implements
            Mapper<Text, Text, Text, Text> {

        private final Text word = new Text();

        @Override
        public void map(Text key, Text value,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            StringTokenizer itr = new StringTokenizer(value.toString(), ",");
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                output.collect(key, word);
            }
        }
    }

    // Reducer: joins all values seen for a key with "|".
    public static class SampleReducer extends MapReduceBase implements
            Reducer<Text, Text, Text, Text> {

        private final Text result = new Text();

        @Override
        public void reduce(Text key, Iterator<Text> values,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            StringBuilder aggregation = new StringBuilder();
            while (values.hasNext()) {
                if (aggregation.length() > 0) {
                    // Separator only between values, not before the first,
                    // so the output matches the expected "a|b|c" form.
                    aggregation.append("|");
                }
                aggregation.append(values.next().toString());
            }
            result.set(aggregation.toString());
            output.collect(key, result);
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(Sample.class);
        conf.setJobName("Sample");

        conf.setMapperClass(SampleMapper.class);
        conf.setReducerClass(SampleReducer.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        conf.setInputFormat(KeyValueTextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}

I've built the jar and have been trying to get the output, but the output file that gets created is empty.

I'm using the following command to run the job:

hadoop jar mapreduce.jar Sample /tmp/input tmp/output

mapreduce.jar is the jar I packaged, and my input file looks like:

1 a,b,c
2 e,f
1 x,y,z
2 g

Expected output:

1 a|b|c|x|y|z
2 e|f|g


Answer

I'm guessing that since you're using KeyValueTextInputFormat as the input format, it isn't finding a separator byte and is therefore using the entire line as the key (the value is ""). That would mean the iteration in your mapper never enters its loop and nothing is written out. Set the property mapreduce.input.keyvaluelinerecordreader.key.value.separator in the configuration to hold " " as the separator byte.
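A minimal sketch of that fix inside the question's main() method, before JobClient.runJob(conf). One assumption to flag: the old org.apache.hadoop.mapred.KeyValueTextInputFormat used by the question's code reads the property key.value.separator.in.input.line, while the mapreduce.input.keyvaluelinerecordreader.key.value.separator name belongs to the newer org.apache.hadoop.mapreduce API, so setting both is the safe option:

```java
// Property name read by the old mapred API's KeyValueLineRecordReader
// (the API the question's code uses); default separator is tab.
conf.set("key.value.separator.in.input.line", " ");
// Property name used by the newer mapreduce API, set as well in case
// the job is later migrated to the new input format classes.
conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", " ");
```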

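To see why the default separator leaves the value empty, here is a small stdlib-only simulation of the record reader's line splitting. SeparatorDemo and its split helper are hypothetical names for illustration, not part of Hadoop:

```java
// Simulates how KeyValueTextInputFormat splits a line: the key is everything
// before the first occurrence of the separator character, the value everything
// after it. If the separator never occurs, the whole line becomes the key and
// the value is empty -- so the mapper's tokenizer loop produces nothing.
public class SeparatorDemo {

    public static String[] split(String line, char sep) {
        int pos = line.indexOf(sep);
        if (pos == -1) {
            // No separator found: whole line is the key, value is empty.
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        String line = "1 a,b,c";
        String[] withTab = split(line, '\t');   // the default separator
        String[] withSpace = split(line, ' ');  // the configured separator
        System.out.println("tab   -> key=\"" + withTab[0] + "\" value=\"" + withTab[1] + "\"");
        System.out.println("space -> key=\"" + withSpace[0] + "\" value=\"" + withSpace[1] + "\"");
    }
}
```

With the space-separated sample input, only the second call recovers "1" as the key and "a,b,c" as the value for the mapper to tokenize.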