Multiple output path (Java - Hadoop - MapReduce)


Problem description

I run two MapReduce jobs, and I want the second job to write its results into two different files, in two different directories. I would like something similar to FileInputFormat.addInputPath(..), which accepts multiple input paths, but for the output.

I'm completely new to MapReduce, and I am constrained to writing my code for Hadoop 0.21.0. I use context.write(..) in my Reduce step, but I don't see how to control multiple output paths...

Thanks for your time!

Here is the reducer from my first job, to show the only way I know how to output: everything goes into a /../part* file. What I would like now is to specify two precise files for different outputs, depending on the key:

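import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Reducer;

// Nested inside the job's driver class, hence "static".
public static class NormalizeReducer extends Reducer<LongWritable, NetflixRating, LongWritable, NetflixUser> {
    @Override
    public void reduce(LongWritable key, Iterable<NetflixRating> values, Context context) throws IOException, InterruptedException {
        NetflixUser user = new NetflixUser(key.get());
        // Hadoop reuses the value object between iterations, so store a copy.
        for (NetflixRating r : values) {
            user.addRating(new NetflixRating(r));
        }
        user.normalizeRatings();
        user.reduceRatings();
        // Written to the job's single default output (the part* file).
        context.write(key, user);
    }
}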


Solution

I used the approach from the last comment, as you suggested, Amar. I don't know yet whether it works, since I have other problems with my HDFS, but before I forget, here is what I found, for the sake of civilization:

http://archive.cloudera.com/cdh/3/hadoop-0.20.2+228/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html

  • MultipleOutputs does NOT replace FileOutputFormat. You still define one output path with FileOutputFormat, and MultipleOutputs then lets you add as many further outputs as you need.
  • addNamedOutput method: the String namedOutput argument is only a descriptive label for the output, not a path.
  • The path is actually defined in the write method, through the String baseOutputPath argument.
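For anyone landing here, a minimal sketch of how those three points could fit together with the new-API MultipleOutputs class linked above. It is untested against a real cluster, and several things are assumptions for illustration only: the class name MultiPathExample, the "ratings" label, the even/odd routing rule, and the use of Text values in place of the NetflixUser type from the question. The base path is expected to be resolved relative to the job's output directory.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MultiPathExample {

    // Hypothetical label for the named output (letters and digits only).
    static final String NAMED_OUTPUT = "ratings";

    // Hypothetical routing rule: pick a base output path from the key.
    static String basePathFor(long key) {
        return (key % 2 == 0) ? "even/part" : "odd/part";
    }

    // Driver-side setup for a Job that is otherwise already configured.
    static void configureOutputs(Job job, Path outputDir) {
        // The single regular output path is still set through FileOutputFormat.
        FileOutputFormat.setOutputPath(job, outputDir);
        // Register the label together with its format and key/value types.
        MultipleOutputs.addNamedOutput(job, NAMED_OUTPUT,
                TextOutputFormat.class, LongWritable.class, Text.class);
    }

    public static class RoutingReducer
            extends Reducer<LongWritable, Text, LongWritable, Text> {

        private MultipleOutputs<LongWritable, Text> outputs;

        @Override
        protected void setup(Context context) {
            outputs = new MultipleOutputs<LongWritable, Text>(context);
        }

        @Override
        protected void reduce(LongWritable key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // The actual destination is chosen here, per record, via baseOutputPath.
            String basePath = basePathFor(key.get());
            for (Text value : values) {
                outputs.write(NAMED_OUTPUT, key, value, basePath);
            }
        }

        @Override
        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            // Close the extra writers so their output is flushed.
            outputs.close();
        }
    }
}
```

With this setup, records with even keys should end up in files under an even/ subdirectory of the job output directory and the others under odd/, while anything written with plain context.write(..) still goes to the usual part* files.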
